A few weeks ago Eric Enge from Stone Temple invited me to participate in a “Majestic SEO vs Open Site Explorer” shootout. The questions and approach were solid and interesting. From the perspective of a battle-hardened SEO, how would each tool rate on update frequency, correlation with rankings, the quality of their APIs, etc?
Based on recent observation of some of SEOmoz’s challenges with scaling and subsequent index update delays, I rated them particularly low in the update frequency stakes. The perception, of course, would be that Majestic Fresh would be superior. I also *completely* guessed on the request to rate: “Percent of links reported by Moz & Majestic that still exist”.
I’m going to be completely honest. My ratings, comments and scoring were initially based on guesstimates. Experienced ones, yes, but not satisfactory. My guesses were based on the sum total of experience gained through many different link analyses, SEO projects, consumed blog posts and marketing messages. None of these were based on a fair study comparing the tools, like for like, at the same time with the same data on the same domain. I therefore felt pretty bad about the way I scored on each point.
On that note, I went and spent (probably far too much) time comparing the link data on SEOgadget.co.uk as provided by Majestic SEO (Fresh and Historic Indices), Open Site Explorer, Google Webmaster Tools, AHrefs and Searchmetrics. Then, I emailed Eric with my revised answers. It all felt a lot fairer.
What’s contained in this blog post is (in my opinion) a more neutral comparison of the tools from the perspective of the data, its validity and total coverage. Now that I’ve thought about the data, I really feel the difference in how I understand these tools, their strengths and their weaknesses. It’s an extremely worthwhile thing to do, and I recommend you give it a go yourself sometime.
Something that I do feel is important to note is this – crawling the URL data for the 100,000’s of URLs for this study was difficult enough. I can’t imagine the scale of the operation required to pull off a usable index of 150bn+ URLs. Frankly – anyone attempting to create an index of the web, and maintain it, deserves your respect. Many of the providers offer services beyond link analysis – Searchmetrics, for example, may have the smallest link index size for SEOgadget in this study, but they’re newcomers to link data indexation and analysis compared to the others. Searchmetrics are (in my opinion) the best at search and social visibility data tracking. AHrefs have some unique features and SEOmoz have a powerful site crawling tool, rank tracking service and keyword research tools. Google Webmaster Tools are, well, they’re Google, aren’t they?
In my work, I downloaded a CSV export of all links to SEOgadget.co.uk from each of the tools. I used SEO tools for Excel on my desktop SEO workhorse to gather the latest server header responses for each URL and extracted the root domain from each to allow a count of the number of unique root domains in the data. This took considerable time, particularly in the larger datasets from Google Webmaster Tools and Majestic Historic. In brief, my key data points were:
– Number of rows in each export
– Duplication in data (by linking URL)
– Number of links reported
– Number of unique linking root domains
– Number of links live (200 or 301 / 302 redirect)
– Server Headers to analyse the composition of each index (expressed as 200, 301, 302, 4xx and 5xx)
– Coverage of all links (calculated by merging all root domains from every tool, de-duplicating and calculating a % of coverage each tool offers)
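The data points above can be sketched in a few lines of Python. This is only an illustration, not the SEO tools for Excel workflow used for the study: it assumes each export has already been reduced to (linking URL, status code) pairs, and the `TWO_PART_SUFFIXES` set and every function name here are hypothetical. The root domain logic is deliberately naive; a real analysis would want the full Public Suffix List.

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumption: a tiny hand-picked set of two-part public suffixes.
# A real analysis should use the full Public Suffix List instead.
TWO_PART_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.nz"}


def root_domain(url):
    """Naive root domain, e.g. http://blog.example.co.uk/x -> example.co.uk."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_PART_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def is_live(status):
    """The study's definition of live: a 200, or a 301 / 302 redirect."""
    return status in (200, 301, 302)


def status_bucket(status):
    """Group a status code the way the composition tables do."""
    if status in (200, 301, 302):
        return str(status)
    if 400 <= status < 500:
        return "4xx"
    if 500 <= status < 600:
        return "5xx"
    return "other"


def summarise(rows):
    """Summarise one export given (linking_url, status_code) pairs."""
    rows = list(rows)
    by_url = {}
    for url, status in rows:
        # De-duplicate by linking URL, keeping the first status seen.
        by_url.setdefault(url, status)
    return {
        "rows": len(rows),
        "duplicates": len(rows) - len(by_url),
        "unique_root_domains": len({root_domain(u) for u in by_url}),
        "live_links": sum(is_live(s) for s in by_url.values()),
        "composition": Counter(status_bucket(s) for s in by_url.values()),
    }
```

Calling `summarise` once per export gives one row of a totals table, so the same definitions of "duplicate" and "live" are applied to every tool.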
Totals Summary
*small amend in this screenshot – there was an error in the #live links for Majestic Fresh – fixed in the rest of the article!
Who Provides the Most Data?
Data makes the world of serious SEO go around. But who has the most? This is a count of the number of rows of data downloaded in each export, and the number of duplicates removed from each.
True to their word, Majestic have a seriously large index. We extracted 62,944 rows of data, 17,725 of which were duplicates (by source URL only – there could be, for example, a separate row of data for every link on the same page). Thankfully, Google Webmaster Tools are great about allowing us to download all of the link data, even though they limit us to the first 1,000 errors found via their web crawler.
How Many Links and Unique Linking Root Domains are Reported?
Domain diversity is the most important aspect of serious link data analysis. How wide a net does each of these tools cast?
Again, Majestic’s Historic index rules the roost with the number of links reported, but the winner here is Google Webmaster Tools. For sheer domain diversity, their data seems to cast the widest net. Open Site Explorer reported a slightly higher number of linking root domains (1,696 from OSE vs 1,687 from Majestic Fresh). Note the vast drop-off between the number of links reported by Majestic Fresh and the actual domain diversity in the data. Majestic tend to crawl deep on any one site, reducing the diversity of the data but increasing the sheer link count:
|PRODUCT||#Links Reported||#UNIQUE RDs|
How Many Links are Still Live?
We’ll come on to the composition of each of the tool’s indices later in the post. Suffice it to say that tracking new and lost links is an emerging and important requirement from the SEO community. While their index is comparatively small, AHrefs are excellent at reporting on new and lost links through their user interface. As their index grows this will become a powerful tool.
I was pleasantly surprised by just how many links remain live in the Majestic Historic Index (for this study, a live link is one that responds with a 200 or a 301 / 302 redirect). My perception had been that their index accumulates the data and therefore contains very large volumes of 4xx errors. It’s worth noting that around half of Majestic Fresh’s index responds with a 301 redirect. In a lot of cases this seems to be down to missing trailing slashes in the export data. I’ve observed that, while the root domain diversity of OSE and Majestic Fresh is very similar, Majestic Fresh tends to crawl more pages on each domain. This would explain the larger index count with similar levels of domain diversity in the data.
|PRODUCT||#UNIQUE RDs||#Links Live (200 + 3xx)|
Who Has the Best Coverage?
Which tool casts the widest net? Is there such a thing as perfect when it comes to a link tool? Clearly, Google Webmaster Tools reports on the most diverse range of domains, but does that mean it’s got the whole internet covered? No, none of them do. I took every single root domain reported by each tool and created a master list of 6,755 unique linking root domains pointing to SEOgadget. Of those, 5,682 are still live domains. Then I divided the number of reported linking root domains from each tool by the aggregate number to calculate a % of coverage. Obviously this assumes that the aggregated list represents the truly complete list of all domains that link to SEOgadget.
Google Webmaster Tools came out best, with Majestic Historic 2nd and Majestic Fresh and OSE tied for 3rd. AHrefs have done well for a small startup! No one reports anything like 5,682 linking root domains to SEOgadget.
For the tool providers with the largest index sizes (GWMT, Majestic, OSE), here is a breakdown of the server header responses found in the data:
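The coverage figure is simple set arithmetic, and a short Python sketch makes the calculation explicit. The `coverage` function and the tool names in the usage example are hypothetical; the only assumption is that each tool's linking root domains are available as a collection of strings.

```python
def coverage(domains_by_tool):
    """Percent of the merged, de-duplicated root domain list each tool reports.

    domains_by_tool maps a tool name to the root domains it reported.
    """
    # Master list: the union of every tool's root domains.
    master = set().union(*domains_by_tool.values())
    if not master:
        return {}
    return {
        tool: 100.0 * len(set(domains)) / len(master)
        for tool, domains in domains_by_tool.items()
    }
```

For example, `coverage({"tool_a": {"a.com", "b.com"}, "tool_b": {"b.com", "c.com", "d.com"}})` builds a master list of four domains and rates `tool_a` at 50% and `tool_b` at 75%. As in the study, the result is only as good as the assumption that the merged list is complete.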
Index Composition – Google Webmaster Tools
|Response||Server Header Count|
Index Composition – Majestic Fresh and Historic
|Response||Server Header Count|
|Response||Server Header Count|
Index Composition – Open Site Explorer
|Response||Server Header Count|
Missing links in Fresh Index Iterations
As an interesting side note, I discovered an irregularity between two fresh index updates in Majestic. The updates were downloaded between the 24th June 2012 and the 27th June 2012. I found there were links missing by comparing the newer (larger) update to the older update. It’s really obvious that there’s an age-related cut-off point, where links older than a certain number of days (around 90?) get removed from the fresh index. At this point they’re added to the historic index (though it’s altogether possible they’re added much sooner than this and duplicated for a time).
I checked, and there were chunks of links missing from inside the 90 day window:
The red highlighted rows in the data table belong to a date range that no longer forms part of the fresh index. What’s really interesting is that the 26th (and onwards) exist in both exports – a like for like comparison reveals chunks of missing data.
I didn’t expect to find that links were missing from periods inside this update cycle – 280 to be exact, though small numbers on a daily basis. Of those, 149 responded with a 200 server header response, 103 were 301 or 302 and 27 were “proper” error codes.
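A like for like comparison of two exports can be sketched as follows. This is an assumption-laden illustration rather than the exact method used here: it supposes each export has been loaded as a dict mapping linking URL to a per-link date, and it treats the earliest date in the newer export as the start of the shared window. The function name is hypothetical.

```python
from datetime import date


def missing_from_newer(older, newer):
    """Links in the older export that should still appear in the newer one.

    older / newer: dicts mapping linking URL -> the date recorded for that
    link in the export (assumed to be available as a column).
    """
    if not newer:
        return {}
    # Anything dated before this has aged out of the newer export,
    # so its absence is expected rather than suspicious.
    window_start = min(newer.values())
    return {
        url: seen
        for url, seen in older.items()
        if seen >= window_start and url not in newer
    }
```

Links dated before the window start are ignored because they are expected to drop out; only links dated inside the shared window and absent from the newer export are reported.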
To be fair, some data loss is hardly avoidable – it took me long enough to crawl the 95,000 or so sum total links reported by all 4 tools! Crawling billions of URLs is no joke. It’s also important to point out that the fresh index had grown by some 1,000 links in this period. As a wild theory, I considered whether Majestic choose not to include blocks of links if they don’t receive data in that particular iteration, rather than attempting additional recrawl cycles between updates. In any case, I’m sure these links would be re-included in another iteration (which are extremely regular!).
Which Tool is The Best?
This, after all, was the point of the article. Frankly, it’s down to Open Site Explorer, the Majestic Index and Google Webmaster Tools for pure play link data collation and analysis. If, like me, you most frequently opt to analyse the data outside of the UIs provided by the vendors, then your choice is simple. You need to work with data from all 3 as a bare minimum. The bottom line is this – I would consolidate and de-dupe the data from all 3 to get the most complete picture of my backlinks. While Majestic and Google Webmaster Tools have the “most” data, they’re not complete. Open Site Explorer is about on par with Majestic Fresh (it’s slightly better in this case) but the data is different.
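Consolidating and de-duping the exports might look something like this minimal Python sketch. The function names and the tool labels in the usage note are hypothetical, and the normalisation is an assumption: it lower-cases the scheme and host and drops trailing slashes and fragments, so that the same page reported with and without a trailing slash is counted once.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalise(url):
    """Lower-case scheme and host, drop trailing slashes and any fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def consolidate(exports):
    """Merge link exports into one de-duplicated view.

    exports maps a tool name to the linking URLs it reported.
    Returns a dict of normalised URL -> set of tools that reported it.
    """
    merged = defaultdict(set)
    for tool, urls in exports.items():
        for url in urls:
            merged[normalise(url)].add(tool)
    return dict(merged)
```

Keeping the set of reporting tools alongside each URL shows at a glance which links only one provider found, which is where the indices differ most.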
They all have differing crawl depths, index compositions, user interfaces and additional features. I can’t see how any serious link analysis would omit data from any of these sources. On a product development note, I really wish Google Webmaster Tools would make their link data available via the API.
On a final note, it’s important to point out that since I wrote this post, SEOmoz have rolled out another update (here’s the announcement). I’d also like to give a special shout to Branko Rihtman, who carried out a fascinating in-depth analysis across MANY domains to come to his answers. Branko, I salute you!
Image credit: The Art Nerd