Guest Blogging for Links? Choose a Heavily Scraped Site!

Today, I’d like to share an observation I made after analysing new backlinks acquired from guest blogging on Search Engine Journal and getting promoted to the main blog at SEOmoz. It’s really interesting how often the more popular, high-authority domains get copied (scraped) by other sites that have PageRank or are sometimes even functioning companies in their own right.

Could these scraper sites pass any value through their outbound links and, as a consequence, can the process of guest blogging on heavily scraped sites be leveraged to work positively for your SEO?

Blogs get scraped

Ever since the introduction of WordPress plugins such as Wp-O-Matic, scrapers have become a fact of life. Blogs get scraped, particularly the larger, more successful and regularly updated sites. Take this popular post on Search Engine Journal for example – there are 78 instances of the <title> and first line of the opening paragraph according to Google’s index. If you take a look at any blog in the AdAge Power 150 or Technorati’s Most Popular, you’ll be sure to find their posts duplicated hundreds if not thousands of times elsewhere.

Scraping is with us and it’s here to stay, but can that fact be used to add short to medium term value to our SEO campaign? Back in April 2008, I wrote a Youmoz post about my good friend Dan Faircloth. Dan’s an engineer at the Rutherford Appleton Laboratory and, since he specialises in particle accelerators rather than SEO, he hadn’t attracted many links to his (rather new) domain at the time. After the post was published, the number of links back to his site increased quite significantly. They were all links from sites scraping the original Youmoz article. The best part was that all of the scraped links used our targeted anchor text. Soon after, Dan was ranking in 1st place for his own name, which was exactly what we had intended.

Do links in duplicated content pages still pass value?

In my opinion, yes they do. There doesn’t even seem to be a limit to the number of times you can duplicate a page across unique domains to pass link value. You’d expect (or hope) that pages triggering the duplicate content filter at Google would have the value of their outbound links nullified, but I don’t see this happening in many cases. It’s not up to me to out specific examples of this; we’ve all seen it happening. If you haven’t, I’d suggest finding a high competition market and analysing the backlinks to a few domains. If you start seeing links from sites like articleblast.com, goarticles.com and articlesbase.com, just do an exact match query in Google for some of the text you find and you’ll find your duplicate articles and inbound links.

Case study: Scraped post at SEOmoz

I decided to take a look at my post (titled “SEOmoz Tools – Top Pages on Domain Kick Ass”) published on SEOmoz a few weeks back. At the base of the article, there is a link back to my site using the anchor “SEO Consultant in London”. It’s not a particularly competitive phrase (nor is there much traffic) but, nonetheless, it’s a valid term and one for which SEOgadget ranked third until a week or so ago. The article was scraped by at least 21 other domains, which I found by using an “intitle” query on an exact match of the post title, plus a randomly chosen sentence from the content, also on exact match.

How do you find scraped content?

My favourite way is just to use a search engine. In this example, I have used an “intitle” operator and a section of text that could only have appeared in the article in question.

Image: Google results

You could use Copyscape to do the same thing, though I have found the results to be less useful and not as fresh as those from the main search engines. You’ll end up going to Google in the long run. Whether you’re familiar with online anti-plagiarism tools or not, it’s worth checking your own site. You might be (unpleasantly) surprised.
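The query described above can be put together by hand, but if you check a lot of posts it may be easier to generate it. Here is a minimal sketch in Python, assuming you paste the resulting URL into a browser yourself. The function names are mine, purely for illustration, and the sentence is a placeholder for whichever line of your article you pick.

```python
from urllib.parse import quote_plus


def build_scrape_query(title, sentence):
    """Combine an intitle: exact match on the post title with an
    exact match on a sentence taken from the post body."""
    return f'intitle:"{title}" "{sentence}"'


def google_search_url(query):
    """URL-encode the query so it can be opened in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# The title is the one from the case study; the sentence is a placeholder.
query = build_scrape_query(
    "SEOmoz Tools – Top Pages on Domain Kick Ass",
    "a sentence that only appears in your article",
)
print(query)
print(google_search_url(query))
```

The more distinctive the sentence, the fewer unrelated pages will match, so avoid common phrases or quotes from other sources.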

Data captured

To answer my question: “Could scraper sites pass any value?” I needed to collect some data. For each of the scraped articles, I collected the following information:

- URL and Domain PageRank
- SEOmoz Domain MozRank and Domain MozTrust
- Comments on the article (how the original has been scraped and played back to the user on the new page)
- The search engine used to find the article (Yahoo or Google)

You can download my raw data from this URL. (Office 2007 Excel).
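If you would rather keep this kind of data in code than in a spreadsheet, the fields listed above map onto a simple record. This is only a sketch of one possible layout: the class name, field names and the two sample rows are made up for illustration and are not taken from my raw data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapedCopy:
    url: str
    url_pagerank: Optional[int]      # None when the page has no PageRank yet
    domain_pagerank: Optional[int]   # None when the domain has no PageRank
    domain_mozrank: Optional[float]  # None when the domain has no Moz metrics
    domain_moztrust: Optional[float]
    comments: str                    # how the original was scraped and shown
    search_engine: str               # "Google" or "Yahoo"


def count_with_pagerank(rows):
    """Count rows whose domain has a PageRank of 1 or more."""
    return sum(1 for r in rows if r.domain_pagerank is not None and r.domain_pagerank >= 1)


def count_with_moz_metrics(rows):
    """Count rows whose domain has both MozRank and MozTrust."""
    return sum(
        1 for r in rows
        if r.domain_mozrank is not None and r.domain_moztrust is not None
    )


# Hypothetical rows on placeholder domains, only to show the shape of the data.
sample = [
    ScrapedCopy("http://scraper-one.example/post", None, 4, 5.1, 5.6,
                "Full copy, original links intact", "Google"),
    ScrapedCopy("http://scraper-two.example/post", None, None, None, None,
                "First paragraph only, nofollowed link to source", "Yahoo"),
]
print(count_with_pagerank(sample), count_with_moz_metrics(sample))
```

Keeping missing values as `None` rather than zero makes it harder to confuse "no metric available" with "a metric of zero" when you total things up.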

Common forms of scraping

The most typical form of scraping was to directly copy the original post HTML and present the content back to the audience of the scraper site. In many cases, the original links to SEOmoz.org had been removed and replaced with links to the host domain. One site had taken a copy of the page and nofollowed all of the external article links. Frequently, the scrapers were citing a Google feed proxy URL as the “original” source of the content. The remaining pages were displaying only the first paragraph of the page content and linking back to the original with either a followed or a nofollowed link.

Though all forms of scraping are quite annoying if you’re a site owner, the worst instances (IMO) are when the original links in the article are replaced with internal links elsewhere on the scraped site. No value whatsoever is passed back to the original author, nor to the sources the original author cited as valuable. I did find that specific domains were being removed rather than all external links – i.e. “seomoz.org” was replaced where “seogadget.com” was not.
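Checking by eye how each scraper has treated your links gets tedious after a handful of pages. The sketch below shows one way the three outcomes described above (link kept, link nofollowed, link removed) could be detected for a given domain, using only Python’s standard library. It assumes you already have the scraped page’s HTML as a string; the class and function names are my own, and real scraped pages with badly broken HTML may need a more forgiving parser.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, rel tokens) for every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href:
            rel = (attr_map.get("rel") or "").lower().split()
            self.links.append((href, rel))


def classify_link_handling(html, target_domain):
    """Return 'followed', 'nofollowed' or 'removed' for links to target_domain."""
    collector = LinkCollector()
    collector.feed(html)
    statuses = []
    for href, rel in collector.links:
        host = (urlparse(href).hostname or "").lower()
        # Match the bare domain and any subdomain such as www.
        if host == target_domain or host.endswith("." + target_domain):
            statuses.append("nofollowed" if "nofollow" in rel else "followed")
    if not statuses:
        return "removed"
    # One surviving followed link is enough to count the page as 'followed'.
    return "followed" if "followed" in statuses else "nofollowed"


# Made-up snippet of a scraped page, for illustration only.
page = (
    '<p><a href="/tag/seo">SEOmoz</a> tools post by '
    '<a href="http://www.seogadget.co.uk/">an SEO Consultant</a></p>'
)
print(classify_link_handling(page, "seogadget.co.uk"))
print(classify_link_handling(page, "seomoz.org"))
```

Running the check once per domain you care about (your own site and the host blog) is what reveals the selective replacement described above, where one domain’s links are stripped and another’s are left alone.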

Google Pagerank

Though none of the URLs had yet been awarded PageRank, 17 of the 21 scraping domains found had a Google PageRank between PR1 and PR6:

Graph: Pagerank of scraper sites

SEOmoz Domain MozRank and Domain MozTrust

16 of the 21 sites found had MozRank and MozTrust, with the highest values being quite high (6.03 DmR and 6.24 DmT). These values are higher than those of SEOgadget, which has a DmR of 4.39 and a DmT of 5.28. None of the scraped page URLs were in the Linkscape index, so they didn’t have their own metrics available.

Graph: Domain MozTrust and Domain MozRank of scraper sites

Conclusion

Most of the site domains included in the sample data have PageRank, MozRank and MozTrust. Some of them are in fact perfectly “authoritative” sites in the eyes of search engines such as Google and backlink value analysers such as Linkscape, which would imply they are capable of passing link value. I’m not saying scraping is good, but I am making a comment on the ability of scraper sites to pass value.

There are a number of different methods of scraping, and problems can be introduced during the scrape process such as bad HTML parsing, linking to RSS feeds and linking out to 404 error pages. That said, for the most part, links back to sources referenced in the posts tend to be left untouched, which (during this test) included the footer text left in the base of my articles.

Authoritative domains pass value as search engines index new pages on those domains. Taking that fact into account, it is fair to assume that the scraper sites identified in this test will pass value via the outbound links in the scraped content. I’m still watching a few pages which have links from recently published, scraped posts to test this conclusion further.

Recommendations

My recommendation to anyone thinking of posting on a third-party blog is, given the likelihood of the target site being heavily scraped, to think very carefully about your content’s outbound links, especially in the footer of the article. Use a sign-off that references your site and the most important pages on your own blog. In my case, I use a footer link like this:

Richard Baxter is an SEO Consultant in the UK and chief blogger at SEOgadget.co.uk. Come check out our latest SEO Jobs or, if you’re recruiting, post a job free.

Finally, if you’re thinking of targeting a blog with an offer of a guest post, be sure to read Josh Klien’s “How to Guest Post to Promote Your Blog” and Darren Rowse’s advice on “How to be a Good Guest Blogger” to get yourself positioned in the right way when you’re authoring your content.




26 thoughts on “Guest Blogging for Links? Choose a Heavily Scraped Site!”

  1. thank you for the last recommendation

    I write a lot on blogs, and never thought to put an ‘author reference’ at the bottom.

    It’s annoying when someone scrapes your info, and removes your links!

  2. Hi Robert – it sure is! Be sure to get those footer links in, they pass more value than you might think…

  3. Ikroh says:

    Very interesting! Thank you for taking the time to document your findings. I agree with Robert & Richard very annoying when someone scrapes your info and removes the links!

  4. Zahid says:

    I remember I started a niche scraper site and I started receiving emails from blog owners saying that I was taking away their readers, even though I was only taking the first 200 characters of the paragraph. Anyway, I dumped the website because I felt bad for them… my website was PR 4 and they all had PR 2 and instead of thanking me, they complained because they were getting free traffic…

    Just my 2 cents. Thanks for the comprehensive research.

  5. MLDina says:

    If nothing else, the scraper sites provide more readers the opportunity to find your post. While they should of course provide links to the original post, it’s probably not easy to get in touch with hundreds and thousands of site owners to police their links.

    Guest posting is a great idea for getting traffic, even if your post is scraped!

  6. This is a great post. I never thought of using that to your advantage. I really love the posting of other content on the bottom. Great idea.

  7. Jon Payne says:

    Fantastic post Richard. Thanks for the detail and data. Opinions are a dime a dozen, it’s nice to see someone share specific examples and data here!

  8. thanks for the article Richard. Personally I think scraper sites just pollute the web, so I wouldn’t mind seeing them all just go away. Since that’s not likely, your insights help alleviate the annoyance factor of scraper sites. The “sign off” is really a great idea and makes the most sense – it’s no different than having any article appear that has a mini bio block in it, but with the added bonus of those links.

  9. Jerry Okorie says:

    It’s always a joy to read your posts. I have been capitalising on the footer links of blog posts and it really does have a positive impact. Thanks Rich

  10. I noticed a similar thing while promoting a few of our blogs on social media sites such as Digg or Reddit. The posts that managed to hit the front page of Digg got scraped like crazy.

  11. Adrian Land says:

    Good post, enjoyed this one. I hope one day to be worth scraping!

  12. What is even worse is when someone scrapes content and changes the hyperlink. I have had this happen to me.

  13. @Nick – that’s the worst kind. Looks like it doesn’t matter so much when it’s an exact copy, but when the hyperlinks are changed it’s a whole different kettle of fish!

  14. Iris Rounds says:

    Great information! I found the link to this article from a tweet sent to Lynn Terry on Twitter!!! (That was tongue-twister.)

  15. Dolphin says:

    Good post, enjoyed this one. I hope one day to be worth scraping!

  16. Now this is an interesting approach! While I don’t agree on dupe content links being the same value I see that (devalued) stuff adds up a lot here

    Thanks for putting this together!

    Presell Page Man

  17. Chris Page says:

    I’m just starting out so this makes an interesting and worthwhile read. Thanks.

  18. Suzanne says:

    Thanks so much – I had no idea there was a term for what I saw happening with our website. And thanks for the different look at it!

  19. Adam says:

    interesting findings. we have had problems with scraping of our content as well. thanks for the article.

  20. Sutures says:

    it's probably not easy to get in touch with hundreds and thousands of site owners to police their links.

  21. Compvision says:

    I read something about guest blogging and was doing some more research about it and found this. Very interesting post. Especially the part about the duplicate content, because I had wondered what effect it could possibly have and this answered it perfectly. Thanks a lot for that Richard…

  22. I’m sorry that I found your blog only now. But I’ll catch up! :) On topic, we should encourage scrapers to pull our content, right? :)

  23. Mike says:

    That’s a very informative blog post, thanks for sharing that information with everyone! It’s interesting to see that links can retain their value through scraping.

  24. SEOP says:

    Great tips. Thanks for sharing. This can help me improve the traffic for my blogs.

  25. Now two years later there are only 6 posts left from the 78 original, but still same extra value. The Farmer may also have effect on activity of the scrapers. But Guest Blogging is still a good complementary way of link building

  26. Brian says:

    Guest blogging does seem to be the best, easy, yet time consuming thing to do to build links and to bring traffic to your site. I mostly focus on getting guest authors for my blog and spend more time writing posts on other people’s blogs because that’s a win win as long as the content is quality. Great Post!
