Link Equity and Crawl Efficiency Maintenance

When conducting link audits, people usually look for risks rather than opportunities. By paying attention to what happens when Googlebot crawls the external links pointing to your site, you can usually find several ways to retain link equity and, as a result, rank better.

This post should be useful for anyone working with big websites or old websites – especially those that have been through domain migrations, replatforms, or URL structure changes.

Many businesses spend a great deal of time producing something (anything) in order to gain links to their websites. Return varies with each website – but if the site has real world stores, or is big enough to be considered a brand in its own right, then the time spent optimising the crawl path of historic external links is likely to return more link equity than multiple successful creative projects. Cost is typically low, while benefit is typically high (and for some sites, staggering). The only real requirements are the ability to:

  • Change redirect paths
  • Update internal links
  • Implement & consolidate redirect rules

This is typically within the ability of most development teams, though you may need to get your hands dirty mapping redirects.

A few entrenched (fairly safe) assumptions before we begin:

  • 301 redirects pass most, but not all, link equity.
  • Each additional hop in a chain loses a little more.
  • Anything that breaks the crawl (errors, blocks, loops) passes nothing.

Although we’re considering this mostly from a link equity standpoint, there are knock-on advantages to be had from tightening up crawl budget, so you should think broadly in terms of application. This highly accurate, completely scientific table illustrates the point:

Even an optimistic view makes “externally linked to chained redirects” worth resolving. I’m a pessimist. To summarise:

  • One redirect: Good.
  • More redirects: Bad.
  • No redirects: Best.

Example:

A client had previously migrated from .co.uk to .com. Following a replatform and a URL structure change, about half of the external links to the site from before 2011 were being redirected four times on average. The rest were resulting in server errors (and thus not passing any value). This same client had historically used search pages as impromptu category pages, and these had acquired links. Their CMS provider had later implemented a useful feature to automatically redirect any search for an exact product name to the appropriate product page, via 302. Realising these ancient links were not passing value, we updated this rule to a 301 redirect, and the associated products ranked better as a result.

The older a site is, the more of these inefficiencies it is likely to have acquired, and the bigger the impact of addressing them will be.

Method:

To find the problems, you’ll need a list of all the links to your website, and all the linking destinations. You can get this from standard exports from most link data providers, but you may need to do some crawling yourself if you want a more complete picture (recommended).

When you have this, run a list mode Screaming Frog crawl of all linking destinations on the client site or any former properties. Make sure Screaming Frog is set to always follow redirects and to ignore robots.txt. You should then make use of the recently added redirect chains report. In the good old days we had to run the same report over and over.
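If you’d rather script this step (or sanity-check the export), a minimal sketch in Python can trace each chain hop by hop. This assumes a hypothetical linked_urls.txt containing one linked-to URL per line and the third-party requests library – it’s an illustration, not a replacement for a proper crawler:

```python
# Trace each linked-to URL hop by hop and record the full redirect chain.
import csv
from urllib.parse import urljoin

import requests

MAX_HOPS = 10  # anything longer needs fixing anyway

def trace_chain(url):
    """Return a list of (url, status_code) hops, ending at the final response."""
    hops = []
    current = url
    for _ in range(MAX_HOPS):
        resp = requests.head(current, allow_redirects=False, timeout=10)
        hops.append((current, resp.status_code))
        location = resp.headers.get("Location")
        if resp.status_code in (301, 302, 303, 307, 308) and location:
            current = urljoin(current, location)  # resolve relative Location headers
        else:
            return hops
    hops.append((current, "too many hops"))
    return hops

if __name__ == "__main__":
    with open("linked_urls.txt") as f, open("chains.csv", "w", newline="") as out:
        writer = csv.writer(out)
        for line in f:
            if line.strip():
                writer.writerow([item for hop in trace_chain(line.strip()) for item in hop])
```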

You can do this all in SEOtools for Excel (for free), but I’d strongly recommend using Screaming Frog in this instance as it’s faster and less fiddly for this set of tasks.

What exactly are we looking for?

What we are trying to do is ensure that Googlebot can crawl unhindered from external link to canonical page in as few steps as possible. To do this, we look for anything that interrupts crawl or link equity. To start (a rough sketch for flagging several of these follows the list):

  • Links to soft 404 pages.
  • Links that result in 4xx or 5xx responses.
  • Other kinds of redirection (meta refresh etc.).
  • External links resulting in redirection to non-canonical pages.
  • Chains of 301 redirects.
  • Chains of 301 redirects longer than 5 hops.
  • Links resulting in 302 redirects, or redirect chains containing a 302.
  • Redirects to or via resources blocked in robots.txt.
  • Redirect loops.
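Several of these can be flagged automatically from the chains traced above. The classifier below is only a sketch built on the hypothetical trace_chain() output – soft 404s, meta refreshes and robots.txt blocks need separate content-level checks:

```python
# Classify a traced redirect chain against the checklist above.
# `chain` is a list of (url, status_code) hops as produced by trace_chain().

REDIRECT_CODES = (301, 302, 303, 307, 308)

def classify(chain):
    issues = []
    statuses = [status for _, status in chain]
    final_status = statuses[-1]
    redirects = [s for s in statuses if s in REDIRECT_CODES]

    if isinstance(final_status, int) and 400 <= final_status < 600:
        issues.append("ends in a %d response" % final_status)
    if any(s in (302, 303, 307) for s in statuses):
        issues.append("temporary redirect in chain")
    if len(redirects) > 1:
        issues.append("chain of %d redirects" % len(redirects))
    if len(redirects) > 5:
        issues.append("chain longer than 5 hops")
    if len({url for url, _ in chain}) < len(chain):
        issues.append("possible redirect loop")
    return issues or ["ok"]
```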

Think about general guidelines for migrations and URL structures etc – they tend towards ‘make sure current structure is redirected to the new structure in a single hop’. This is fine in isolation, but every time new recommendations are put in place without consideration of past migrations, you’re guaranteeing link equity loss. Very often you’ll find that a number of structural changes have happened in the past without being handled quite correctly, leading to an abundance of inefficiencies.

Canonicals:

Treat canonicals as though they were an additional hop for link equity purposes. As such, you’ll want to ensure all the final destinations contain self-referencing canonicals. If a redirect chain results in a live non-canonical page, you should strongly consider updating it to point directly at the canonical. To check, use SEOtools for Excel’s “=htmlcanonical” or a Screaming Frog crawl, followed by Excel’s “=exact” to compare each crawled URL with its canonical and determine whether any of the final destinations are non-canonical resources.
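If you’d rather do this check outside Excel, a small sketch with the third-party requests and beautifulsoup4 libraries (the list of final destinations here is a hypothetical placeholder) could flag the non-canonical destinations:

```python
# Fetch each final destination and compare its rel=canonical with the URL we crawled.
import requests
from bs4 import BeautifulSoup

def canonical_of(url):
    """Return the rel=canonical href of a page, or None if it has no canonical tag."""
    resp = requests.get(url, timeout=10)
    link = BeautifulSoup(resp.text, "html.parser").find("link", rel="canonical")
    return link.get("href") if link else None

final_destinations = ["https://example.com/blog-post/"]  # hypothetical input

for url in final_destinations:
    canonical = canonical_of(url)
    if canonical and canonical.rstrip("/") != url.rstrip("/"):
        print(url, "canonicalises elsewhere:", canonical)
```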

Links to Blocked Resources:

Unblock them, create exceptions, or redirect them where appropriate. The crawl efficiency/link equity tradeoff is something you’ll need to calculate case-by-case (and probably the topic for a future post).
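A quick way to see which redirect targets are affected before making that call is the standard library’s urllib.robotparser – again just a sketch, with a hypothetical list of URLs, and it checks your robots.txt rules rather than what Google has actually crawled:

```python
# Flag URLs that a Googlebot crawl would be blocked from reaching by robots.txt.
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

urls = ["https://example.com/search?q=widgets"]  # hypothetical redirect targets
parsers = {}  # one parser per host, so multiple domains can be checked in one pass

for url in urls:
    root = "{0.scheme}://{0.netloc}".format(urlparse(url))
    if root not in parsers:
        parser = RobotFileParser(root + "/robots.txt")
        parser.read()
        parsers[root] = parser
    if not parsers[root].can_fetch("Googlebot", url):
        print("blocked by robots.txt:", url)
```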

Redirect Chains:

Solving this can be as simple as updating the first URL in a chain to redirect straight to the destination of the last. From the redirect chains report you can remove the intermediate columns to provide a basic initial mapping. To be thorough, you should redirect every step in the chain to the final destination of the chain.
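As an illustration, once you have the traced chains this flattening is a simple transformation (building again on the hypothetical trace_chain() output):

```python
# Collapse every step of every chain into a direct, single-hop mapping.
def flatten(chains):
    """chains: list of [(url, status), ...] lists; returns {source_url: final_url}."""
    mapping = {}
    for chain in chains:
        final_url = chain[-1][0]
        for url, _ in chain[:-1]:
            # every earlier step should 301 straight to the end of the chain
            mapping.setdefault(url, final_url)
    return mapping
```

Each entry in the resulting dictionary is one redirect to implement; where thousands of URLs share a pattern, it’s usually better expressed as a rule than as individual mappings.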

As a rule of thumb: Remove any intermediate steps so that only a single redirect is used.

Generally the biggest gains will be had in the rules that generate redirect chains, rather than individual mappings.

Rule Consolidation:

We like to recommend nice consistent URL structures, complete with redirect rules to consolidate link equity, catch duplicate content, look pretty, and make our lives a little easier. You may use something like the following:

  • Enforce lowercase
  • Enforce trailing slash
  • Enforce www./non-www.
  • Enforce https

The link equity issue can arise when these rules are applied sequentially rather than all at once. Let’s say the Guardian links to my website (thanks), but I’ve since set up a number of redirect rules to ensure my URLs look just how I want them:

  • Link: http://www.ohgm.co.uk/BLOG-POST
  • Redirect #1: http://www.ohgm.co.uk/blog-post
  • Redirect #2: http://www.ohgm.co.uk/blog-post/
  • Redirect #3: http://ohgm.co.uk/blog-post/
  • Redirect #4: https://ohgm.co.uk/blog-post/

How much link equity (if any) do you think makes it through that series of rules?

This is one of the main reasons people might see lacklustre returns following a ‘by-the-book’ URL structure change. Yes, this can be a pain to fix – it might mean taking a few simple rules and transforming them into several more complex ones – but this is where the biggest returns are to be had.
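The exact fix depends on your server, but the principle is to compute the final URL in one pass and issue a single 301 to it, rather than letting each rule fire in turn. A hypothetical sketch of that normalisation logic (mirroring the four rules above) in Python:

```python
# Normalise a requested URL in one pass, so at most one 301 is ever needed.
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url):
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]                 # enforce non-www
    path = parts.path.lower()           # enforce lowercase (assumes case-insensitive slugs)
    if not path.endswith("/"):
        path += "/"                     # enforce trailing slash
    return urlunsplit(("https", host, path, parts.query, parts.fragment))  # enforce https

# The chained example above collapses into a single hop:
assert canonical_url("http://www.ohgm.co.uk/BLOG-POST") == "https://ohgm.co.uk/blog-post/"
```

If the requested URL already equals its canonical form, no redirect fires at all; otherwise the server responds with a single 301 to that value.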

Avoid Blanket Redirection:

Once you have a list of all the links & destinations that are failing to pass value, you have the ability to be selective in which you reinstate or optimise. If an older version of the site used to be rather aggressive with certain anchor texts or paid placements, you may decide against reinstating those links. If a site has previously had link-based penalties you should tread carefully. Some pages may not be passing value for a reason, so please keep this in mind before blanket redirecting every 4xx page.

There’s some healthy debate to be had around this issue (whether chained 301s can protect you from penalties whilst passing value from bad links), but if there’s a time to be selective, it’s now.

Realignment:

If the best state of affairs is zero redirects, then you may think realignment is the way to go. Maybe. However, you won’t successfully realign many links if there are redirects already in place. The webmaster you’re asking to do the realignment will click the link, see that it works, and decide you don’t know what you’re talking about.

If you first kill any redirects in place, respond with a 404 page and then contact the linking webmaster to see if they’ll realign the link, you’ll have more success. You might feel like you need to wash afterwards, though. I don’t tend to believe link realignment is a good use of time initially. Redirect now, realign later.

Repeat:

As a kind of periodic maintenance for sites with history, I think this work is invaluable, and it makes your domain more resistant to equity loss from future changes. Recommended.




7 thoughts on “Link Equity and Crawl Efficiency Maintenance”

  1. Jon says:

    Excellent post mate. Although

    http://www.avant8.com.au/seo-vidoes/what-percentage-of-pagerank-is-lost-through-a-301-redirect/

    Cutts came out recently to say there is no equity lost. Despite saying a while ago (with Eric Engel I think) that there was some loss.

  2. Jim says:

    @jon

    How on earth can you watch that video and come away with, “Cutts came out recently to say there is no equity lost [as a result of 301 redirects].” That is NOT AT ALL what Matt said in the video. He said there is no more PageRank lost as a result of a 301 than is lost as a result of a link (more specifically, a single link). In fact he said that currently, EXACTLY the same amount of PageRank is lost as a result of a 301 as is lost on a single link.

    If you have ever studied The Anatomy of a Large-Scale Hypertextual Web Search Engine (http://infolab.stanford.edu/~backrub/google.html)… i.e. Larry Page and Sergey Brin’s Stanford whitepaper that essentially became the blueprint for Google (or their original PageRank patents), you’ll know that section 2.1 of the whitepaper explains the original PageRank formula. There they explain how the “damping factor” works, which results in the 10-15% loss that Matt refers to in the video you referenced.
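    For reference, the formula as given in section 2.1 (restated here, with $d$ as the damping factor and $C(T_i)$ the number of outbound links on page $T_i$) is:

    $$PR(A) = (1 - d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$$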

    So essentially, if the damping factor is 15% and the total PageRank that a page has accumulated through its inbound links is X, then only (0.85 * X) PageRank is passed out of the page. There is, in this case, a 15% loss. So if the page has 5 outbound links (and all of those links are followed) then each outbound link would be passed ((0.85 * X) / 5) PageRank. If that page has only one outbound link then the amount of PageRank passed out on that one link is ((0.85 * X) / 1) or simply (0.85 * X).

    If you think about what a redirect is, it is essentially a page with one outbound link. If PageA links to PageB and PageB 301 redirects to PageC then the amount of PageRank lost is EXACTLY the same as if PageA links to PageB and PageB has a single outbound link on it pointing to PageC.

    And what he said in the current video is EXACTLY what he alluded to back in 2010 when Eric Enge interviewed him (actually not in the interview itself, because he wanted to double check with the developers… but he confirmed it in a follow up email to Eric).

    It is the Damping Factor in the PageRank formula that is responsible for BOTH the PageRank decay that results with each link AND the PageRank decay that results from a 301 redirect.

  3. Andrea Moro says:

    Nice piece, but can you please be more precise when you say “Screaming Frog crawl, followed by “=match””. Where should that “match” be added?

  4. Hey buddy, you know it’s in Excel :D

    To clarify, =match() is a function in Excel that reports a true/false when comparing two points of data in any two cells:

    http://office.microsoft.com/en-gb/excel-help/match-function-HP010062414.aspx

  5. Oliver Mason says:

    Hi Andrea – thanks for the comment:

    “=match(canonicalURL,actualURL)” should be used in Excel to check whether the URL encountered at crawl is the same as the canonical URL.

  6. Keith F says:

    Really liked that you pointed out the issue regarding capitalized letters in the URL. I continually run into this issue.

    I also liked your comment Jim. I have seen traffic drops of 20%-35% from doing redirects of pages to new URLs. That 15% number would have been great!

  7. Oliver Mason says:

    Updated –
    I’d put “=match” in the post, when I meant “=exact”.
