
Preserving link equity and managing crawl efficiency for better SEO

17th June 2017

When conducting link audits, people usually look for risks rather than opportunities. By paying attention to what happens when Googlebot crawls external links to your site, you can often find multiple ways to retain link equity and rank better as a result.

This post should be useful for anyone working with big websites or old websites – especially those that have been subject to any of the following:

  • Migrations and replatforms
  • Internationalisation
  • Domain consolidation
  • Redirected subdomains
  • URL structure changes
  • A change to https
  • Successive redirection rules

Many businesses spend a great deal of time producing something (anything) in order to gain links to their websites. Return varies with each website – but if the site has real world stores, or is big enough to be considered a brand in its own right, then the time you spend on optimising the crawl path of historic external links is likely to have greater return than multiple successful creative projects in terms of link equity. Cost is typically low, while benefit is typically high (and for some sites, staggering). The only real requirements are the ability to:

  • Change redirect paths
  • Update internal links
  • Implement & consolidate redirect rules

This is typically within the ability of most development teams, though you may need to get your hands dirty mapping redirects.

A few entrenched (fairly safe) assumptions before we begin:

  • 301 redirects, while great, are a ‘lossy’ way to transfer link equity. This loss compounds.
  • Googlebot loses interest after five redirects.

Although we’re considering this mostly from a link equity standpoint, there are knock-on advantages to be had from tightening up crawl budget, so you should think broadly in terms of application. This highly accurate, completely scientific table illustrates the point:

Even an optimistic view makes “externally linked to chained redirects” worth resolving. I’m a pessimist. To summarise:

  • One redirect: Good.
  • More redirects: Bad.
  • No redirects: Best.

Example:

A client had previously migrated from .co.uk to .com. Following a replatform and a URL structure change, about half of the external links to the site from before 2011 were being redirected four times on average. The rest were resulting in server errors (and thus not passing any value). This same client had historically used search pages as impromptu category pages. These acquired links. Their CMS provider had later implemented a useful feature to automatically redirect any search for an exact product name to the appropriate product page, via 302. Knowing these ancient links were not passing value allowed us to update this rule to a 301 redirect, causing the associated products to rank better as a result.
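As a sketch of what that rule change looks like in logic (the product names and URL patterns here are hypothetical, not the client's actual CMS), the only material difference is the status code returned for an exact-match search:

```python
# Hypothetical sketch of the CMS rule: a search query that exactly matches
# a product name should redirect to the product page. Changing the status
# from 302 (temporary) to 301 (permanent) lets the old links to search
# URLs pass their equity on to the product page.
PRODUCTS = {"blue widget": "/products/blue-widget"}  # invented catalogue

def search_redirect(query):
    """Return (status_code, location) for a search request."""
    product_url = PRODUCTS.get(query.strip().lower())
    if product_url:
        return (301, product_url)  # was 302 before the fix
    return (200, f"/search?q={query}")  # fall through to normal results

print(search_redirect("Blue Widget"))  # -> (301, '/products/blue-widget')
```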

The older a site is, the more of these inefficiencies it is likely to have acquired, and the bigger the impact of addressing them will be.

Method:

To find the problems, you’ll need a list of all the links to your website, and all the linking destinations. You can do this with standard exports from most link data providers, but you may need to do some crawling yourself if you want a more complete picture (recommended).

When you have this, you’ll want a list-mode Screaming Frog crawl of all linking destinations on the client site or any former properties. Make sure Screaming Frog is set to always follow redirects and to ignore robots.txt. You should then make use of the recently added redirect chains report. In the good old days we had to run the same report over and over.

You can do this all in SEOtools for Excel (for free), but I’d strongly recommend using Screaming Frog in this instance as it’s faster and less fiddly for this set of tasks.
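If you’d rather script the chain-following yourself, the core loop is small. This is a minimal sketch: the `fetch` function is injected (in practice you’d wrap something like `requests.head(url, allow_redirects=False)`), and the stubbed responses below are invented for illustration:

```python
# Follow each linked URL's redirects by hand, recording every hop,
# so chain length, loops, and final status codes are all visible.

def follow_chain(url, fetch, max_hops=10):
    """Return the list of (url, status) hops, stopping on loops or non-redirects."""
    hops, seen = [], set()
    while url not in seen and len(hops) < max_hops:
        seen.add(url)
        status, location = fetch(url)
        hops.append((url, status))
        if status not in (301, 302, 307, 308) or not location:
            break
        url = location
    return hops

# Stubbed example: a two-redirect chain ending on a live page.
responses = {
    "http://a/": (301, "https://a/"),
    "https://a/": (301, "https://a/page/"),
    "https://a/page/": (200, None),
}
print(follow_chain("http://a/", lambda u: responses[u]))
```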

What exactly are we looking for?

What we are trying to do is ensure that Googlebot can crawl unhindered from external link to canonical page in as few steps as possible. To do this, we look for anything that interrupts crawl or link equity. To start:

  • Links to soft 404 pages.
  • Links that result in 4xx or 5xx responses.
  • Other kinds of redirection (meta refresh etc.).
  • External links resulting in redirection to non-canonical pages.
  • Chains of 301 redirects.
  • Chains of more than five 301 redirects.
  • Links resulting in 302 redirects, or redirect chains containing a 302.
  • Redirects to or via resources blocked in robots.txt.
  • Redirect loops.
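
Most of these checks can be expressed as a simple classifier over the hops of a crawled chain. A sketch (status codes only; soft 404s and robots.txt blocks need extra data, so they’re left out here):

```python
# Flag audit issues for a single chain, given the (url, status) hops
# recorded while following a link's redirects.

def chain_issues(hops):
    issues = []
    statuses = [status for _, status in hops]
    if 400 <= statuses[-1] < 600:
        issues.append("ends in error")
    if 302 in statuses:
        issues.append("contains 302")
    if sum(1 for s in statuses if s in (301, 302, 307, 308)) > 5:
        issues.append("more than five redirects")
    if len({url for url, _ in hops}) < len(hops):
        issues.append("redirect loop")
    return issues

print(chain_issues([("http://a/", 301), ("http://b/", 302), ("http://c/", 404)]))
# -> ['ends in error', 'contains 302']
```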

Think about general guidelines for migrations, URL structures and so on – they tend towards ‘make sure the current structure is redirected to the new structure in a single hop’. This is fine in isolation, but every time new recommendations are put in place without consideration of past migrations, you’re guaranteeing link equity loss. Very often you’ll find that a number of structural changes have happened in the past without being handled quite correctly, leaving an abundance of inefficiencies.

Canonicals:

Treat canonicals as though they were an additional hop for link equity purposes. As such, you’ll want to ensure all the final destinations contain self-referencing canonicals. If a redirect chain results in a live non-canonical page, you should strongly consider updating it to refer directly to the canonical. To do this, use SEOtools for Excel’s “=htmlcanonical” or a Screaming Frog crawl, followed by “=exact”, to determine whether any of the final destinations are non-canonical resources.
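
To illustrate the check itself (a rough sketch that assumes well-formed markup with `rel` appearing before `href` in the tag), extract the canonical URL and compare it to the URL the chain landed on:

```python
import re

def canonical_of(html):
    """Pull the canonical URL out of a page's HTML, or None if absent."""
    match = re.search(
        r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)', html, re.I
    )
    return match.group(1) if match else None

page = '<head><link rel="canonical" href="https://example.com/page/"></head>'
final_url = "https://example.com/page/"  # where the redirect chain ended

# Self-referencing canonical: no extra "hop" for link equity.
print(canonical_of(page) == final_url)  # True
```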

Links to Blocked Resources:

Unblock them, create exceptions, or redirect them where appropriate. The crawl efficiency/link equity tradeoff is something you’ll need to calculate case-by-case (and probably the topic for a future post).

Redirect Chains:

Solving this can be as simple as updating the first URL in a chain to the destination of the last. From that redirect chain report you can remove the intermediate columns to provide a basic initial mapping. To be thorough, you should redirect every step in the chain to the destination of the last step in the chain.

As a rule of thumb: Remove any intermediate steps so that only a single redirect is used.
Generally the biggest gains will be had in the rules that generate redirect chains, rather than individual mappings.
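The “every step to the final destination” mapping is easy to script from the chain report. A sketch, using an invented chain for illustration:

```python
# From each redirect chain, point every URL in the chain (not just the
# first) straight at the chain's final destination, producing a set of
# single-hop redirect mappings.

def collapse_chains(chains):
    mapping = {}
    for chain in chains:
        final = chain[-1]
        for url in chain[:-1]:
            mapping[url] = final
    return mapping

report = [
    ["http://old.example/page", "https://old.example/page",
     "https://new.example/page", "https://new.example/page/"],
]
print(collapse_chains(report))
```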

Rule Consolidation:

We like to recommend nice consistent URL structures, complete with redirect rules to consolidate link equity, catch duplicate content, look pretty, and make our lives a little easier. You may use something like the following:

  • Enforce lowercase
  • Enforce trailing slash
  • Enforce www./non-www.
  • Enforce https

The link equity issue can arise when these rules are applied sequentially rather than all at once. Let’s say the Guardian links to my website (thanks), but I’ve since set up a number of redirect rules to ensure my URLs look just how I want them:

  • Link          http://www.ohgm.co.uk/BLOG-POST
  • Redirect #1   http://www.ohgm.co.uk/blog-post
  • Redirect #2   http://www.ohgm.co.uk/blog-post/
  • Redirect #3   http://ohgm.co.uk/blog-post/
  • Redirect #4   https://ohgm.co.uk/blog-post/

How much link equity (if any) do you think makes it through that series of rules?

This is one of the main reasons people might see lacklustre returns following a ‘by-the-book’ URL structure change. Yes, this can be a pain to fix – it might require taking three simple rules and transforming them into several complex rules, but this is where the biggest returns are to be had.
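
To see why it matters, here’s a toy simulation (the four rules below are hypothetical implementations of the bullets above): applying one rule per redirect produces the four-hop chain, while a consolidated rule computes the final URL before issuing a single redirect:

```python
# Hypothetical normalisation rules, applied in order:
RULES = [
    lambda u: u.lower(),                          # enforce lowercase
    lambda u: u if u.endswith("/") else u + "/",  # enforce trailing slash
    lambda u: u.replace("://www.", "://"),        # enforce non-www.
    lambda u: u.replace("http://", "https://"),   # enforce https
]

def sequential_hops(url):
    """One 301 each time any single rule changes the URL (rule-per-redirect)."""
    hops, changed = [], True
    while changed:
        changed = False
        for rule in RULES:
            new = rule(url)
            if new != url:
                hops.append(new)
                url, changed = new, True
    return hops

def consolidated(url):
    """Apply every rule in memory, then issue a single redirect."""
    for rule in RULES:
        url = rule(url)
    return url

start = "http://www.ohgm.co.uk/BLOG-POST"
print(len(sequential_hops(start)))  # 4 separate redirects
print(consolidated(start))          # same destination, one redirect
```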

Avoid Blanket Redirection:

Once you have a list of all the links and destinations that are failing to pass value, you have the ability to be selective in which you reinstate or optimise. If an older version of the site used to be rather aggressive with certain anchor texts or paid placements, you may decide against reinstating those links. If a site has previously had link-based penalties, you should tread carefully. Some pages may not be passing value for a reason, so please keep this in mind before blanket redirecting every 4xx page.

There’s some healthy debate to be had around this issue (whether chained 301s can protect you from penalties whilst passing value from bad links), but if there’s a time to be selective, it’s now.

Realignment:

If the best state of affairs is zero redirects, then you may think realignment is the way to go. Maybe. However, you won’t successfully realign many links if there are redirects already in place. The webmaster you’re asking to do the realignment will click the link, see it works, and decide you don’t know what you’re talking about.

If you first kill any redirects in place, respond with a 404 page, and then contact the linking webmaster to see if they’ll realign the link, you’ll have more success. You might feel like you need to wash afterwards, though. I don’t tend to believe link realignment is a good use of time initially. Redirect now, realign later.

Repeat:

As periodic maintenance for sites with history, this work is invaluable, and it makes your domain more resistant to equity loss from further changes in the future. Recommended.
