Several years ago now, as some of my closer colleagues will remember, SEOgadget was subjected to a directory submission blast. Two, in fact. The domain had been submitted to around 2,000 adult website directories (sigh), and had also been put through a batch submission to Directory Maximiser.
I acted on the adult directory list – in fact I found the company that owned the directories (based in Hyderabad, India) and paid the owner a small sum to remove all 2,000 links, but never really bothered about what had happened with Directory Maximiser. We migrated domains a couple of years later, and I thought no more of SEOgadget.co.uk’s plight.
Until earlier this year. When I was writing about using change of address in Webmaster Tools, I found this:
What a terrific opportunity for a case study, though. I just happened to be writing a presentation for an SES Power Tools session, so I thought I’d talk about my experience getting this sorted.
There’s an obvious caveat here: you’d want to avoid a penalty entirely by getting a disavow file submitted to Webmaster Tools ahead of time, though it’s still good fun to roll my sleeves up and get a manual action revoked (even though I have several teams that could have done this for me).
OK, I want to create a disavow quickly and effectively. What tools should I be using, and what data points should I care about?
Firstly, the data tools: for me it’s MajesticSEO, Ahrefs and Webmaster Tools for the raw link data export:
Majestic’s Fresh index has a reliable re-crawl rate (of around 90 days). Trust and Citation Flow are really interesting metrics – any large difference between the two can be a positive indicator that something isn’t right. The Historic index is *massive* but quite difficult to work with and very noisy. I asked around and most of us agreed that unless we’re working with the API we rarely touch the Historic index these days.
Ahrefs is the rising star in our organisation. They seem to have a pretty solid UI and a fast discovery rate for new links. This makes initial assessments really easy, particularly when we’re looking at charts that may indicate a domain has been attacked or has ramped up its link acquisition efforts.
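Here’s a minimal sketch of how that Trust Flow / Citation Flow comparison might be applied to an export. The field names (`domain`, `trust_flow`, `citation_flow`) and the gap threshold of 20 are my own assumptions for illustration, not anything Majestic prescribes, so tune them to your own data.

```python
# Sketch only: field names and the threshold are illustrative assumptions.
def flag_flow_gaps(rows, max_gap=20):
    """Return domains whose Trust Flow and Citation Flow differ by more than max_gap."""
    flagged = []
    for row in rows:
        gap = abs(row["citation_flow"] - row["trust_flow"])
        if gap > max_gap:
            flagged.append(row["domain"])
    return flagged


# Made-up sample rows standing in for a Majestic export.
sample_rows = [
    {"domain": "example.com", "trust_flow": 35, "citation_flow": 38},
    {"domain": "example.net", "trust_flow": 2, "citation_flow": 41},
    {"domain": "example.org", "trust_flow": 20, "citation_flow": 40},
]

suspicious = flag_flow_gaps(sample_rows)
print(suspicious)
```

A flagged domain isn’t automatically bad; it’s just one that deserves a closer manual look.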
Finally, if I only had one tool, it would have to be Google’s Webmaster Tools. In 2012 I took a look at who had the deepest and most diverse crawl (and consequently, who had the best link data). It was WMT. It is still WMT.
We took a look at SEOprofiler (by being shoehorned via their free tool: openlinkprofiler.org) – I must say the constant attempts to force a sign up were not well received. That’s forgivable, but the depth of the data they provide is not. There are some clever features (such as filtering by linking page title and link context) but the lack of depth leaves the tool little more than a trivial distraction.
Opensiteexplorer.org gets an honourable mention but for link auditing, it’s a no from me. The type of “warts and all” data you need for this type of analysis isn’t provided, though for reporting on good quality links, and metrics like Domain Authority, OSE is very obviously still a part of our day to day.
Then we have consolidation tools, tools that don’t generate link index data per se, but aid in the collection, consolidation, analysis or action stages of an audit:
LinkRisk is a really remarkable new entrant to the space. It’s the greatest time saver (especially if you’re an agency SEO with multiple small to medium size site profiles to manage). You can segment links by anchor text, PageRank, risk, status, site wides, etc. It connects to all of the major services to extract link data: Webmaster Tools, Majestic and Ahrefs. Awesome. With a bit of perseverance you can build a decent disavow file and get the outreach process started for link removal. We’ve built our own stuff for larger projects but I must say – if you’re in-house and don’t have a huge amount of resource to reinvent the wheel, don’t!
Our own SEOgadget for Excel connects to Ahrefs, Majestic, Moz, Grepwords and SEMrush to speed up the process of fetching data. In this use case I recommend giving the =GetIndexItemInfo() command from the MajesticSEO API a whirl – that’s the API call to get all the Majestic metrics into a spreadsheet, including Citation Flow and Trust Flow.
Scrapebox can be used for some genuinely practical applications (it’s associated with things like comment spam, but it is in fact an incredibly powerful scraper!). We use it for checking for fake PageRank and whether inbound linking domains have been removed from Google’s index, but if you want a good tour of basically everything it can be used for, take a look at Jacob’s guide.
The fake PageRank checker in Scrapebox is super powerful (and adds a lot of other useful data points into the mix like IP address, cached status, etc).
What data points are we looking to collect?
Completeness is everything. We build as diverse a data set as possible, mostly because there is *always* some debate with a client on the greyer areas of their link-building – we need to take a hard and fast approach to this, so having a decent case as to why a link needs to go is a good thing:
- Ahrefs, Majestic, WMT
- Destination URL & HTTP status
- Anchor text
- PageRank (domain / URL / Fake check)
- IP address
- Follow/Nofollow
- No. referring domains
- Live / not live
- Page indexation
- Pages with malicious / explicit content
- Page title & meta description
What are you looking out for?
These are the sorts of things you’re looking out for; it’s not an exhaustive list but there are some pretty big telltale signs in here:
- Site wide links
- Links from penalised domains
- Odd PageRank distribution curve (eg: lots of PR0/1s)
- Links with exact match keyword anchors
- Links from pages with malicious / explicit content
- Links from directories / article directories
- Lots of domains on the same C-block IP
- Anchor text distribution – brand vs too much non-brand
- Hidden dofollow comment spam (for example under the Disqus plugin)
- Links from known software submitters like GSA
One of the data points I really find interesting on suspicious inbound links is when the link or the linking domain has been removed from Google’s index, but it still has a PageRank value assigned:
None of these data points alone is 100% reliable, but if I’m going to take a hard and fast (and reasonably aggressive) view of what’s got to be disavowed, a domain that is no longer indexed is a pretty big red flag to me.
Obviously site wides are easy to spot (and identify in Ahrefs, WMT and LinkRisk) and with LinkRisk, anchor text footprints from things like directory submission are really easy to grab hold of:
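To make the “hard and fast” idea concrete, here’s a rough sketch of a few of those checks as code. Everything about the record layout (`sitewide`, `indexed`, `pagerank`, `anchor`, `is_directory`) is a hypothetical structure I’ve made up for illustration; your consolidated export will look different, and these four rules are only a small slice of the list above.

```python
# Sketch only: the link record fields are hypothetical, not a real tool's schema.
def red_flags(link, money_anchors):
    """Return a list of reasons a link looks suspicious (empty if none apply)."""
    flags = []
    if link.get("sitewide"):
        flags.append("site wide link")
    # Removed from the index but still reporting a PageRank value.
    if not link.get("indexed", True) and link.get("pagerank", 0) > 0:
        flags.append("deindexed but still shows PageRank")
    if link.get("anchor", "").strip().lower() in money_anchors:
        flags.append("exact match keyword anchor")
    if link.get("is_directory"):
        flags.append("directory / article directory")
    return flags


# Made-up examples: one clearly dodgy link, one clean branded link.
dodgy = {"domain": "example.com", "anchor": "Cheap Widgets",
         "indexed": False, "pagerank": 3, "is_directory": True}
clean = {"domain": "example.org", "anchor": "SEOgadget",
         "indexed": True, "pagerank": 4}

anchors = {"cheap widgets"}  # lowercase exact match terms you care about
print(red_flags(dodgy, anchors))
print(red_flags(clean, anchors))
```

Keeping the reasons as text rather than a single score means you have a ready-made case to show a client when they ask why a link has to go.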
Side note: the de-dupe test
Which tool offers the most diverse link data? By diverse I mean the greatest number of pages crawled per domain and the most unique domains. Back in 2012, it was Webmaster Tools. It still is!
I download everything a tool’s got to give (individual URLs linking to my site). Then, I extract the unique domain names and de-duplicate the results. That gives me a pretty good feel for the diversity of the data contained in each service.
Consolidating all exported data gave us 5,672 unique domains linking to SEOgadget.co.uk. With that master list, it’s a trivial matter to work out how much of a contribution each tool would make to your process. The “coverage” score is the share of those domains found in each export, and most of them come from WMT.
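The de-dupe test can be sketched in a few lines. This is a simplified version under my own assumptions: it treats the hostname (minus a leading `www.`) as the “domain”, which is cruder than working out the registrable domain, and the tool names and URLs in the sample are invented.

```python
from urllib.parse import urlparse


def linking_domain(url):
    """Hostname of a linking URL, lowercased, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def coverage(exports):
    """exports maps a tool name to its list of linking URLs.

    Returns the consolidated set of unique domains and, per tool, the
    fraction of that set the tool's export contained.
    """
    per_tool = {
        tool: {linking_domain(u) for u in urls} - {""}
        for tool, urls in exports.items()
    }
    master = set().union(*per_tool.values())
    if not master:
        return master, {}
    scores = {tool: len(found) / len(master) for tool, found in per_tool.items()}
    return master, scores


# Invented sample exports, one list of linking URLs per tool.
sample_exports = {
    "wmt": ["http://www.example.com/a", "http://example.com/b",
            "http://example.org/x", "http://example.net/"],
    "majestic": ["http://example.com/c", "http://example.org/y"],
    "ahrefs": ["http://EXAMPLE.net/z", "http://sub.example.edu/"],
}

master, scores = coverage(sample_exports)
print(len(master), scores)
```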
From that point I created a manually reviewed disavow list, firstly as if I was only working with one tool, and secondly as a consolidated “master” list:
I still feel like it’s worth having Majestic and Ahrefs – they seem to be able to find links that Google does not report, although the volume and diversity of unique domains coming from Google vastly outperforms the others.
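Once the master list has been manually reviewed, turning it into a disavow file is the easy bit. Here’s a small sketch: the function name and the sample domains are made up, and it assumes you’re disavowing at domain level, using the `domain:` prefix and `#` comment lines that Google’s disavow file format accepts.

```python
# Sketch only: build_disavow is a hypothetical helper, not part of any tool.
def build_disavow(domains, note=""):
    """Return disavow file text with one 'domain:' line per unique domain."""
    lines = []
    if note:
        lines.append(f"# {note}")
    # Normalise, drop blanks and duplicates, and sort for a stable file.
    unique = sorted({d.strip().lower() for d in domains if d.strip()})
    lines.extend(f"domain:{d}" for d in unique)
    return "\n".join(lines) + "\n"


reviewed = ["Example.org", "example.com", "example.com ", ""]
disavow_text = build_disavow(reviewed, note="Directory submissions, manually reviewed")
print(disavow_text)
```

Write the result out as a plain text file and upload it; the script only formats the list, it doesn’t decide what belongs on it.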
A word on disavow (and what to remove)
Don’t take any stage in this process lightly.
When Cyrus submitted all of his links via disavow, he permanently lost all of his rankings. Review your links thoroughly – submit a *good* link and it’ll never give you PageRank again.
Should links be removed or disavowed, or both?
It depends on the severity of what you were doing. If it’s all directory links and simple, stupid stuff that isn’t terribly extreme, a disavow will do. If it’s paid links, if you’re on an outed network, if it’s something you could make a webmaster analyst look stupid for if you blogged about it after re-inclusion, it needs removing. And on that note, the worst thing you could ever, ever do is *say* you’ve stopped doing something when you very obviously have not. Google can see what you’re up to!
Anyway, the conclusion to my story
Here’s my re-inclusion request after uploading my new disavow file (with 2,378 domains!). Note, I haven’t written War and Peace, I haven’t outed anyone (honestly, I’ve seen some re-inclusion requests that are several pages long and out every agency the brand has ever worked with – not necessary guys), I’ve just been nice, honest and to the point:
And then this happened, just in time for my presentation at SES (*wipes brow*):
The moral of the story?
Link auditing needs hard and fast rules to work. There are flags that you should simply not ignore; if you do, you ignore them at your peril. If you have a penalty, you’ve got to be aggressive. Obviously the situation described was quite mild but it still took a very large disavow file, lots of data integrity and detailed manual review time to get it right. If the backlink profile of a site isn’t too dirty, then a complete disavow of anything vaguely awful will be fine. You’ve got to be very, very thorough, though.
As an aside: I’m not sure why SEOs wait for a penalty before starting the clean up process. If you’ve got lame links in your profile, disavow them now!
If you’ve been very aggressive in your link tactics, bought network links, offered product in exchange for exact match anchors, been on some of the very high profile paid networks, especially the really bad ones, then Google is going to insist on you cleaning up after yourself.
And finally, on what makes for a bad link – if you find yourself debating whether a link is good or bad, it’s definitely bad. Here’s the presentation for later!
Image credit: John