The Basics of JavaScript Framework SEO in AngularJS

Who is this guide for? Web developers and technical SEOs faced with, or curious about, optimising websites built in Angular and React for organic search.

Web application development technologies such as Angular, React and Angular 2.0 are gradually proliferating throughout the front end web development landscape. In our opinion, a core understanding of these technologies should be a mandatory part of the advanced SEO skillset.

For SEO Consultants

We’ve written this article as an introductory guide for the technically savvy SEO who may be faced with answering questions about the “SEO friendliness” of websites built in Angular and React. If you’re comfortable with a standard level of technical site auditing, then you’ll probably be surprised by how easy it is to get up to speed with JS frameworks.

For Web Developers

If you’re a developer, you may find you’d like to get into the detail more quickly. In each of our developer guides we advocate server side rendering for search engines. Thankfully, React and Angular 2.0 both support this feature natively, leaving no further need for the “AJAX Crawl Directive” or third-party pre-rendering services. For the most up-to-date guidance on writing SEO friendly JS apps, take a look at this technical guide on Angular 2.0 / Universal server side rendering or this guide for ReactJS.

Inevitably, of course, it may be that you inherit a legacy platform that uses some sort of third-party rendering, or a custom solution relying on the “?_escaped_fragment_=” parameter and the AJAX Crawl Directive. If that’s the case and you’re not likely to migrate any time soon, read on.


Frameworks such as Angular and React are gaining in popularity across Builtwith’s Top 100k sites.

Image Source: AngularJS Usage (Builtwith) / ReactJS Usage (Builtwith)

Fundamentals: What We Know About GoogleBot and JavaScript

Over the past year, Google have made progress in improving their JavaScript crawling capabilities. They’re essentially rendering the page as if in a browser, as alluded to by Paul Haahr in his SMX Advanced presentation. Through our own work in JS framework content development, and some useful input from elsewhere, we’re able to set out some useful new facts about Google’s approach to JS and, subsequently, derive some insight for our technical SEO strategy:

Googlebot follows JavaScript redirects

In fact, Google treats them very much like a 301 from an indexing standpoint. Indexed URLs are replaced in the index by the redirected-to URL.
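For reference, a JavaScript redirect of the kind we’re describing is usually nothing more exotic than the following. This is a minimal, illustrative snippet; the destination URL is just an example.

    // Minimal, illustrative JavaScript redirect. Googlebot treats this much
    // like a 301: the redirected-to URL replaces the old one in the index.
    window.location.replace('https://www.example.com/new-url/');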

Much of what we learn in React and Angular Development comes from test projects like this one or our own software development.


Googlebot will follow links in JavaScript.

This includes JavaScript links coded with:
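The original list of examples isn’t reproduced here, but the kinds of JavaScript links we mean typically look something like these (illustrative patterns only; the function names are ours, not from the tests):

    <!-- Illustrative examples of links whose URLs are produced or handled by JavaScript -->
    <a href="javascript:goTo('/events/')">Events</a>
    <a href="#" onclick="window.location.href = '/events/'; return false;">Events</a>
    <a href="/events/" onclick="loadView('/events/'); return false;">Events</a>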

Despite Google’s apparent improvements at crawling JS, they still require everything that manipulates a URL to be a link, essentially in an “a” container. In tests on pagination, we’d supported paging using click events on spans, which moved forward and backward through the list of events, and a pagination list at the bottom of the app using similar logic.

Interestingly, though, whilst Google can render JS reasonably well, it’s not yet trying to understand the context of that JS.

When we checked the server logs, we were able to see that they weren’t attempting to poke the pagination list, even though it looks like pagination (albeit, at that point, without links). What we’ve been able to infer from further testing is that while Google understands context from layout and common elements, it doesn’t yet try to fire JS events to see what will happen.

Anything that manipulates URLs still needs to be dealt with using links, with the destination URL in the href of the anchor.

Anything else risks Google not assigning weight, or not crawling to the right page at all. This is potentially a situation that an inexperienced SEO might not spot, and it’s a far more common solution for a JS developer to reach for than an SEO might prefer.
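To make that concrete, here’s a rough illustration of the difference (hypothetical markup; the class names and the ng-click handler are ours, not from the test project):

    <!-- Crawlable: the destination URL sits in the href of an anchor -->
    <a href="/events/?page=2" rel="next">Next</a>

    <!-- Risky: there's no URL for Googlebot to follow, and it won't fire the click event -->
    <span class="next" ng-click="nextPage()">Next</span>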

Googlebot has a maximum wait time for JS rendering that you should not ignore.

In our tests we found that content that took longer than 4 seconds to render, or content that would not render until an event was fired by a link not in an “a” container, did not get indexed by Google.

Deprecating the Ajax Crawl Directive

Late in 2015, Google announced they were deprecating their “AJAX Crawl Directive”. We’ve since found that despite this announcement, Google still respect the directive (more on the implementation later in the article). Our position is that if you’re working on a site that uses the “?_escaped_fragment_=” parameter, then making sure the rendering functionality is still working well is an important first step.

Google can render and crawl JS sites. This, however, is no guarantee that the outcome will be an SEO friendly, perfectly optimised site. Technical SEO expertise is needed, especially during testing, to ensure that you have an SEO friendly site architecture targeting the right keywords, and a site that continues to rank despite the changes made in Google’s crawl.

In my opinion, it’s better to retain tighter control of what’s being rendered by serving Google a pre-rendered, flat version of your site. This way, all the classic rules of SEO apply and it should be easier to detect and diagnose potential SEO issues during testing.

If you’re building a site in Angular and you’d like it to be SEO friendly

    If you’re building in AngularJS and you’re concerned about what to do to make the site SEO friendly, you have a few choices.
  1. Follow the advice contained in this post on generating snapshots of your pages using Phantom.js and a custom cache layer. Make sure that each page has a friendly URL and serve a list of all of your URLs in a sitemap.xml file. Instead of serving the snapshots when the “?_escaped_fragment_=” parameter is included in the requested URL, serve the snapshot when the page is requested at the canonical URL by a known search engine user agent such as Googlebot (there’s a rough sketch of this approach after this list). Provide a sitemap.xml file with all canonical URLs and submit it to Google’s Search Console.
  2. Stick with the “?_escaped_fragment_=” directive. As of the 6th June 2016 Googlebot is still processing requests for this parameter. Provide a sitemap.xml file with all canonical URLs and submit to Google’s Search Console.
  3. Allow your AngularJS to be rendered by Google without pre-rendering and see what happens. Use the HTML5 History API to update the visible URL in the browser, avoiding the #! if you possibly can, and provide a sitemap.xml file with all canonical URLs and submit it to Google’s Search Console. Most developers agree that hashbangs are not ideal, and they certainly add complexity to the SEO of a site.
  4. Don’t build anything new in Angular 1.x. Switch to Angular 2.0. Angular 2.0 works on the front end a lot more like React with regards to how it allows for server side rendering. You can use dependency injection to talk to your components, allowing for things like logged-in states on the server, and to inject data for Open Graph and search purposes. This is our recommended approach if you absolutely must build in Angular.
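As a rough illustration of option 1, serving snapshots by user agent might look something like this in an Express / Node.js setup. This is a minimal sketch, assuming a Phantom.js process has already written snapshots to a directory; the bot pattern, paths and port are illustrative assumptions rather than a production implementation.

    // Minimal sketch: serve pre-rendered snapshots to known search engine
    // user agents, and the normal Angular app to everyone else.
    // The bot pattern, snapshot directory and port are illustrative assumptions.
    var express = require('express');
    var fs = require('fs');
    var path = require('path');

    var app = express();
    var BOTS = /googlebot|bingbot|yandex|baiduspider/i;

    app.use(function (req, res, next) {
      var ua = req.headers['user-agent'] || '';
      if (BOTS.test(ua)) {
        // e.g. a request for /events/london looks up snapshots/events/london/index.html
        var snapshot = path.join(__dirname, 'snapshots', req.path, 'index.html');
        if (fs.existsSync(snapshot)) {
          return res.sendFile(snapshot);
        }
      }
      next(); // everyone else gets the client-rendered Angular application
    });

    app.use(express.static(path.join(__dirname, 'public')));
    app.listen(3000);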

If you’re building a site in ReactJS and you’d like it to be SEO friendly

We’ve written about ReactJS SEO here. The choices remain pretty much the same: serve the same content to a search engine as you would to a user, in a pre-rendered format. Use webpack or Browserify to bundle your JS so that the same code can run on the server and the client. The alternative is to allow Googlebot to index the site on its own.
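For context, the server side rendering approach boils down to something like the sketch below. It’s a minimal illustration using Express and ReactDOMServer; the App component, bundle path and port are assumptions, not a recipe.

    // Minimal sketch of React server side rendering with Express.
    // The App component, bundle path and port are illustrative assumptions.
    var express = require('express');
    var React = require('react');
    var ReactDOMServer = require('react-dom/server');
    var App = require('./App'); // the same root component the browser renders

    var app = express();

    app.get('*', function (req, res) {
      // Render the same component tree the browser would render
      var markup = ReactDOMServer.renderToString(React.createElement(App, { url: req.url }));
      res.send(
        '<!doctype html><html><body>' +
        '<div id="root">' + markup + '</div>' +
        '<script src="/bundle.js"></script>' + // client bundle takes over in the browser
        '</body></html>'
      );
    });

    app.listen(3000);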

The Basics of JavaScript Framework SEO with AngularJS

Angular uses clever JavaScript rendering on the client. It features lots of ideas that are entirely new to us SEO folk (like “bi-directional data binding”), and it’s really powerful stuff for constructing web applications, fast.

Angular.js is a templating language that leaves content rendering almost entirely down to the client (your browser). Take a look at this example:



The website on the left in a non-JS enabled browser. On the right, JS enabled reveals all of the content.

The classic advice has always been to make your content accessible to search engines by avoiding JavaScript-only views. That view has changed, of course: search engines can now render JS heavy websites. The results, however, can be unreliable.

Think about an HTML web page (like the one you’re reading now). The HTML you’re viewing is a template, constructed and customised by the output of a few PHP files and a database lookup. The HTML itself is compiled on the host web server whenever you request it, and then it’s served over HTTP. Of course, if someone else has requested this page before, the chances are you’re reading a cached copy, built long before you knew this article even existed.

Right now you’re reading a web page that is, in essence, an HTML file that’s been served by a web server. It was delivered after you’d asked for it via an HTTP GET request, and now the deal is done. If you want to see another web page, you can ask for that too and our web server will happily let you have it. If you want to interact with it, maybe you’ll complete a form and send a POST request. This is how the internet works.

That’s not quite what happens when you land on a web page built with a JS framework like Angular. Essentially, when you make a request to an Angular site, the content you see is the result of JavaScript manipulating the DOM. Sure, there’s a lot of back and forth over HTTP (using Angular’s $http service), but the client is doing most of the heavy lifting. The page rendering, the asynchronous exchange of data, content updates without a browser refresh, HTML template construction: it’s all very clever.
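To make that concrete, a typical AngularJS pattern looks something like the sketch below: the template ships with no content, and $http fetches the data that Angular then renders into the DOM. The module, controller and endpoint names are illustrative assumptions.

    <!-- The template contains no content until Angular renders the data -->
    <div ng-controller="EventListController as vm">
      <ul>
        <li ng-repeat="event in vm.events">{{ event.title }}</li>
      </ul>
    </div>

and the controller that feeds it:

    // Fetch data over HTTP ($http) and hand it to the template above.
    // Module, controller and endpoint names are illustrative assumptions.
    angular.module('eventsApp', [])
      .controller('EventListController', ['$http', function ($http) {
        var vm = this;
        vm.events = [];
        $http.get('/api/events').then(function (response) {
          vm.events = response.data;
        });
      }]);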

Because of this, Angular belongs to a stack that JS developers love to work with: it’s fast and relatively easy to prototype an application. Side note: this stack is called MEAN (MongoDB, Express, Angular, Node).

Bizarrely, some web developers insist on building websites in AngularJS when they don’t need to (they’re actually building a website, not a web application), or they find themselves building their brochureware (FAQs, landing pages, about pages etc.) in the same technology as the web application they’re hosting. In any case, if you build a website in a JavaScript framework, your SEO is going to suck out of the box, and you’re going to need to win friends in the engineering team to stand a chance of ranking for anything, ever.

Why?

Take a look at this AngularJS site. The content you can see is rendered in JavaScript, and because of that, if you “view source” you’ll see there’s an unusually small amount of HTML (much less than you’d expect, which I’ll try to explain in a moment). Here’s an example:
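The original screenshot isn’t reproduced here, but the “view source” of a typical AngularJS app shell looks something like this (illustrative markup):

    <!doctype html>
    <html ng-app="myApp">
      <head>
        <script src="angular.js"></script>
        <script src="app.js"></script>
      </head>
      <body>
        <!-- Practically no content in the source: Angular fills this container at runtime -->
        <div ng-view></div>
      </body>
    </html>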

Yes, that’s sometimes *all* of the content you’re going to get when you make a request to a site built in Angular.

That’s why you get the blank page when you visit the site with JS disabled in your browser. “ng-view” is a binding directive that makes the magic happen: it instructs Angular to begin manipulating the DOM with whatever content view it’s been bound to. It fills the container with content, basically.

So how does this affect your SEO? It’s pretty obvious you need a solution. Fortunately, there is one!

Making Heavily JS Dependent Websites Crawlable

It’s really important for us to understand how we can help search engines crawl JavaScript dependent websites. If you get this, you’re pretty much at the top of your technical SEO game.

Google and Bing support a directive that allows web developers to serve HTML snapshots of JS heavy content via a modified URL structure. Specifically, the “escaped fragment”: a parameter that replaces the hashbang (#!) in a web application URL.

That parameter looks like this:
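    ?_escaped_fragment_=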

So, imagine you’ve got a bunch of hashbangs in your Angular site:
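    http://www.example.com/#!/events
    http://www.example.com/#!/events/london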

A search crawler should recognise the #! and automatically request a new URL with our parameter included:
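    http://www.example.com/?_escaped_fragment_=/events
    http://www.example.com/?_escaped_fragment_=/events/london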

And provided your web server knows what to do with the modified request (i.e. serve an HTML rendered page), it’s all good.

Here’s how Google explains the process in its “Making AJAX Applications Crawlable” documentation:

[Image: Google’s crawler/server diagram illustrating the escaped fragment request flow]

Obviously you’ll need to ensure that HTML pre-rendering is in place on your web server: this doesn’t just happen by accident!

Most engineers prefer to devise their own solution, most frequently using phantom.js (in my experience).

If you’re able to pre-render your site as HTML, the only other thing you need to do before testing is make sure that requests containing the escaped fragment parameter are routed to your HTML cache directory on your web server. That’s a trivial challenge for a good web developer; a good example can be found in this excellent article on Year of Moo.
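How that routing looks depends entirely on your stack, but as a minimal sketch it might slot into an Express app like the one outlined earlier (the snapshots directory and the fragment-to-filename mapping are illustrative assumptions):

    // Minimal sketch: map ?_escaped_fragment_= requests onto pre-rendered
    // HTML snapshots. Assumes the same Express app and 'path' module as
    // the earlier sketch; directory layout and mapping are assumptions.
    app.use(function (req, res, next) {
      var fragment = req.query._escaped_fragment_;
      if (fragment !== undefined) {
        // e.g. /?_escaped_fragment_=/events/london -> snapshots/events/london.html
        var file = fragment === '' ? 'index.html' : fragment.replace(/^\//, '') + '.html';
        return res.sendFile(path.join(__dirname, 'snapshots', file));
      }
      next();
    });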

Not Using Hashbangs in Your Angular Site?

If you look in the <head> of this website, here’s what you’ll get:

You’ll either see it straight away or you won’t:
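The line to look for is the meta fragment declaration from Google’s AJAX crawling scheme, which looks like this:

    <meta name="fragment" content="!">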

It’s tempting to jump to the conclusion that the developer of this project has a long way to go before their SEO works: the double curly brace notation {{ }} can be extremely misleading when you’re working through the source. That’s obviously not the case; this developer is very smart: http://www.redbullsoundselect.com/?_escaped_fragment_=  – see how the source code makes sense now?

This website doesn’t use hashbangs in the URL structure, so the “meta fragment” declaration made in the <head> is an instruction to search engines to request the URL with the ?_escaped_fragment_= parameter.

Side note: hashbangs *are* stupid. Don’t use them. Not if you don’t have to. Use Angular’s $location service to construct URLs without the #! via the HTML5 History API and save yourself a lifetime of misery. Trust me.
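For reference, switching AngularJS to HTML5 History API URLs is a small configuration change. This is a minimal sketch; note that you’ll also typically need a <base href="/"> tag and server side URL rewriting so that deep URLs resolve to your app.

    // Minimal sketch: enable HTML5 History API URLs (no #!) in AngularJS.
    // The module name is an illustrative assumption.
    angular.module('myApp', [])
      .config(['$locationProvider', function ($locationProvider) {
        $locationProvider.html5Mode(true);
      }]);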

Testing Your Implementation

Once you’ve managed to test the pre-rendering service with a crawler, you’ll immediately be able to switch to “classic SEO” mode, in that you’re looking out for the exact same types of problem you’d encounter on a standard, more typical HTML website.

Until recently there were few ways to actually crawl test an Angular site, though thankfully the team at Screaming Frog have built an AJAX crawler into their tool. The AJAX crawl respects the AJAX crawl directive, meaning that the crawler will automatically request each URL with the additional escaped fragment parameter included.

To crawl a site with Screaming Frog, start by downloading your sitemap.xml file and opening the application. Switch to “List mode” and upload your XML file:

[Image: Screaming Frog in list mode, uploading the sitemap.xml file]

The tool will automatically parse the URLs from the sitemap, ready to crawl:

[Image: the URLs parsed from the sitemap, ready to crawl]

From there you can run the standard crawl, being sure to spot any potential issues. At the time of writing, the main issue with this site is that the escaped fragment URLs all return a 504 error:

[Image: Screaming Frog in AJAX crawl mode, showing 504 responses for the escaped fragment URLs]

Like I said, the main issue with these types of sites is maintaining the pre-rendering functionality. Whether you serve via the escaped fragment parameter or choose a different way to serve pre-rendered HTML files, there is much, much more infrastructure to manage. That’s why we advocate the use of ReactJS or Angular 2.0 Universal, where much of the pre-rendering capability required is now baked in.

Resources / Further Reading

SEO for Universal Angular 2.0 – https://builtvisible.com/universal-angular-2-server-side-rendering-seo-crawl-friendliness/

Building Search Friendly Javascript Applications with React.js – https://builtvisible.com/react-js-seo/

Deprecating our AJAX Crawling Scheme (Google Webmaster Central Blog) – https://webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html

AngularJS Developer Guide – http://docs.angularjs.org/guide

AngularJS and SEO (Year of Moo) – http://www.yearofmoo.com/2012/11/angularjs-and-seo.html

Getting Started with Angular – http://lostechies.com/gabrielschenker/2013/12/05/angularjspart-1/

PhantomJS Documentation – http://phantomjs.org/documentation/


Richard Baxter

Richard is the Founding member of digital agency, Builtvisible.com

Richard founded Builtvisible in September 2009. Since then he has grown a one man consulting startup agency into a 35 strong, award winning digital agency based in the heart of trendy Shoreditch, London.

Throughout the years Builtvisible has grown an excellent, industry accredited reputation for technical search (SEO), audience research and content marketing. Today, Builtvisible's services include a complete creative services capability offering web design, development and branding.

More about Richard

Richard brings extensive experience in digital marketing, having begun his career in search in 2003. Having worked in highly competitive verticals including travel, finance and retail throughout his career, Richard understands what is required to perform well in a crowded, competitive marketplace.

Away from Builtvisible, Richard is an investor in several technology agency and retail orientated startup companies.

With his remaining time, Richard races sports prototype cars, competing in the 2016 Radical SR1 Cup. If you want to follow his amateur racing career, find @richBracing on Twitter.

