How to identify lost links

Published on 31st October 2011

Most people are naturally loss-averse. The common thinking is that we would rather avoid a loss than spend energy in pursuit of a gain. With links, the practicalities are such that reliable monitoring is no easy task. Using tools like Buzzstream, we can keep an eye on the links we’ve built, but what about the legacy stuff that you simply assume will stick around?

Over the past six months, we’ve worked hard to rescue a client’s site that fell victim to a vile SEO tactic. Their former agency owned a large network of sites and had been acquiring sites and domains specifically to build links to the client’s domain. Guess what happened when the agency was fired? A dramatic reduction in linking IP C-blocks and linking root domains over the following months.

SEOs – you’ve got to keep an eye on your link data all of the time, and especially when you’re working with a new client.

Looking at link data is looking at the past

Astrophysicists marvel at catching photons in their telescopes that left a star many, many millions of years before they were born. SEO people need the freshest data to make decisions, but frequently forget that while they’re trawling through link data, they’re looking back at an internet from the past.

How do you know the link you’re observing in the data is still there? You have to check manually, build a tool, or use a free tool or script. One thing’s for sure – pages get lost and links decay. They can be pulled from right under your feet, or errors occur:

Httparchive’s 17k site crawl generated from top sites on Quantcast and Alexa data (amongst other sources)

44 billion page webcrawl from SEOmoz

At least 6% of Linkscape’s web crawl was 404 or unreachable back in late 2009 and the more recent updates show around a 9% decay.

Checking that your links from Majestic and Linkscape data are still live

For such a “quick tip” I’ve probably gone off on a bit of a tangent, but you’re still reading, right? At this stage I’ll give SEOdoctor a shout for this great post, and a tip found in the comments after a tip from Weip to use – and now, I’ll carry on.

For my tip to work you’ll need to have Niels’ SEO Tools extension installed. Frankly, if you’re not using it you should really start. It’s amazing.

Check a link can be found on a page with XPath and an IF statement

There’s a function in SEO Tools that allows the use of XPath in an Excel formula (take that, GDocs users!). It’s called XPathOnUrl, and it’s beautiful. If you can write an XPath expression that fetches every href attribute matching a certain domain name from a page, you can check whether a link is still live with a simple IF statement.

Here’s one I made earlier:

Here’s how:

Just make sure you’re looking for the right domain (in this case, and that your cell reference for the inbound link (in this case, C2) is correct. That’s about it!

Just a note on XPathOnUrl

XPathOnUrl returns a blank cell when there is nothing to return. That’s why I’ve used two quotation marks (an empty string) in the comparison: if the result of my query is blank, the condition in my IF statement is met and the formula returns “not found”.
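For anyone working outside Excel, the same found / not found logic can be sketched in Python. This is a rough equivalent rather than the SEO Tools formula: it swaps XPath for the standard library’s HTML parser, and the names HrefCollector, link_status and fetch_html are made up for illustration. It assumes a link counts as live if any anchor on the page points at the target domain or one of its subdomains.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlopen


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_status(page_html, target_domain):
    """Return "found" if any link points at target_domain, else "not found"."""
    collector = HrefCollector()
    collector.feed(page_html)
    target = target_domain.lower()
    for href in collector.hrefs:
        # Relative links and mailto: links have no hostname and are skipped.
        host = (urlparse(href).hostname or "").lower()
        if host == target or host.endswith("." + target):
            return "found"
    # Mirrors the blank-cell case: nothing matched, so report "not found".
    return "not found"


def fetch_html(url, timeout=10):
    """Download a page as text (assumes UTF-8; undecodable bytes are replaced)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling link_status(fetch_html(linking_page), "example.com") for each row of a link export gives the same column of results as the spreadsheet. A page that fails to load raises an error instead, which you would probably want to record as a lost link too.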

Save your historic data

Save your OSE / Linkscape downloads! Save them every month. If you’re not backing up your link data, you’re going to become dependent on the oldest and most infrequently updated data sources. That’s OK, but it’s always more work to clean up. I tend to prefer directly comparing like with like (Linkscape to Linkscape) rather than scratching my head over Google WMT vs Majestic, or Linkscape vs Majestic. If you do mix sources, my best advice is to create a master data table and de-dupe it into one big long list of all of your inbound links (IBLs). Then, get your analysis skills rocking.
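As a minimal sketch of that master list, and of comparing one month’s download against the next, something like the following could work in Python. The function names and the normalisation rules are assumptions for illustration: here two URLs are treated as the same link source if they share a host, path and query string, ignoring the scheme, letter case in the host, trailing slashes and fragments. Your own data may need stricter or looser rules.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Build a comparison key: lowercase host + path (no trailing slash) + query."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def master_list(*sources):
    """Merge several lists of linking URLs, keeping the first spelling of each."""
    seen = {}
    for source in sources:
        for url in source:
            seen.setdefault(normalise(url), url)
    return list(seen.values())


def lost_links(previous, current):
    """Return URLs present in the previous download but missing from the current one."""
    current_keys = {normalise(url) for url in current}
    return [url for url in previous if normalise(url) not in current_keys]
```

Feeding last month’s and this month’s saved downloads into lost_links gives a shortlist of linking pages worth checking by hand. A URL dropping out of a data source does not prove the link is gone, only that the crawler no longer reports it.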
