Since the very beginning of our craft, Search Engine Optimisers have obsessed over the subtle complexities of the numerous ranking factors determining their precious organic rankings. Since Google’s Panda update, and the subsequent “dozen or so” changes to the algorithm, perhaps the SEO world has changed forever.
Panda: Not an easy update to understand
When the Panda update came along, the typical torrent of analysis and conjecture followed predictably – but still weeks after the update, few SEOs could pinpoint exactly what had changed, and what strategy to deploy to remedy the symptoms of a demoted site. That struggle continues. Put simply, Panda is the product of machine learning applied to the problem of what characteristics do and do not associate with “quality” results for Google users.
On Friday, Google released more guidance on building high quality sites, acknowledging that their recent update(s) targeted a range of factors centred on user experience, and that webmasters should consider those factors to improve, repair or preserve their rankings.
Google justify their refusal to be entirely transparent about the specifics of their ranking algorithm: “because we don’t want folks to game our search results”. In his post, Information Retrieval specialist Amit Singhal (read his publications here) offers the webmaster a solution: “if you want to step into Google’s mindset, the questions below provide some guidance on how we’ve been looking at the issue”.
That “mindset” might be better considered insight into how Google’s machine learning has been trained.
Let’s take a look at the list and indulge in a little guesswork. How might an SEO translate Amit’s comments in that post?
Q: Would you trust the information presented in this article?
SEO translation: “Trust” could be measured by the links (citations) awarded to the article: the more authoritative the link, the more trustworthy the article. Of course, the article could also influence its own trustworthiness through how carefully it chooses the resources it cites as authoritative sources.
Linking to low quality sites, penalised sites or downright awful places on the internet is hardly likely to inspire trust, is it? If links alone answer this question, there’s nothing new in what I just said. We continue to attract natural, editorially awarded links to our websites by crafting great content.
If significant volumes of links aren’t available just yet, perhaps a shorter term solution could be to analyse the social buzz associated with the article. Was there a big discussion surrounding the URL on Twitter? How many people shared the URL via Facebook? How authoritative were the users sharing these articles? “Is this the sort of page you’d want to bookmark, share with a friend, or recommend?”, a question that appears in our list, feels quite relevant here.
I found the “For a health related query, would you trust information from this site?” question interesting. Advice from very poor quality health websites could be life threatening – perhaps a nod to the importance of citation from trustworthy medical journals and publications?
Further reading: Domain trust and site authority are explained well in Rand’s “Whiteboard Friday – Domain Trust and Authority“. Matt Cutts, in his PageRank Sculpting post from June 2009 states “In the same way that Google trusts sites less when they link to spammy sites or bad neighborhoods, parts of our system encourage links to good sites.” SEOoptimise later encourage linking out to quality sites using deep links in their March 2011 article, “How to link out for SEO benefit”. Finally, the arguments made in “Facebook + Twitter’s Influence on Google’s Search Rankings” (SEOmoz, April 2011) are highly worthy of consideration.
Q: Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
SEO translation: “Shallow content” might refer to a document that touches on a subject, but misses key ideas associated with the topic. Perhaps the work is very short, “thin” or fails to cite expert, authority sources elsewhere on the web.
Google also asks: “Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?”. Here’s an example of exactly that – the sad irony being that the article attracted so much attention from our industry, as an example of exceptionally thin content generated at scale, that it now ranks well for “how to install Photoshop”.
Let’s not forget the key people behind this update are Information Retrieval specialists. There’s no doubt in my mind that the complexity of Google’s topic modelling has been evolving to understand, specifically, the nature of “quality” for the benefit of people using search engines. I think that’s a good thing, too. It’s not just about detecting pages with no unique content, or “thin” content – it’s about algorithmically detecting how deep an article goes in covering a subject. Whether the algorithm is truly successful at that objective or not, you have to admire the sentiment, and respect the level of engineering search engine mechanics are using to achieve the outcome. “Does this article provide a complete or comprehensive description of the topic?” and “Does this article contain insightful analysis or interesting information that is beyond obvious?”, are questions listed in the post that also feel relevant here.
The question: “Is the article written by an expert?” is a very interesting concept. You can search by author in Google Groups, Scholar and Google News, but could author be a (meta) data point in Google’s Caffeine infrastructure? Can the author name be associated with a social network that regularly discusses this topic, and shares URLs that relate to the idea?
A later question in the Google article; “Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?” might be related – in that article authors may not be considered to have expert status in a field. Large volumes of content that receive little “attention or care” would be unlikely to earn any authority links and would likely become “redundant” (see next section).
Further reading: Read everything Dave Harry has researched and written on the topic of semantic search / topic modelling and you’re easily immersed in a fascinating subject. Read everything found on SEO by the Sea. When you’re done, try; Understanding Semantic Search and SEO, Search Engine Journal, May 2010 and Google Rankings and LDA, Search News Central, July 2010. For a simple, but practical demonstration of LDA (Latent Dirichlet Allocation) in action, check out SEOmoz’s LDA tool, or Virante’s LDA Content Optimiser.
Q: Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
SEO translation: Article sites that host spun content, beware. Look out for excessive internal duplication and get your SEO strategy just right for heavily duplicated, thin index pages. “Overlapping” might imply large quantities of articles that don’t really develop an idea (essentially saying the same thing over and over). This sounds like an attempt to identify sites that host spun content.
Take these genuine spun article examples – all taken from the same website:
“Learn Primary Spanish Quickly With These four Steps”
“Teaching Tools for Learning Conversational Spanish”
“Learn Spanish – The Quick and Enjoyable Manner – It’s Easy With These four Ideas”
“Learning Spanish? Here’s The right way to Make It Easier and More Enjoyable!”
The thing is, you can tell a low quality, spun article a mile off. Obviously, articles like this add no value, and speaking on behalf of users, they’d never be missed. For search engines, there are usually multiple indexed versions of the same article, but they won’t be perfectly varied. Search engines should easily be able to spot the “markers” of a spun article, perhaps detecting sentences that are the same over many different iterations of the document across multiple domains. Easy to detect (you’d think), and surely easy to penalise. “400% unique” doesn’t mean unique.
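To make the “markers” idea a little more concrete, here is a minimal Python sketch of one way sentence-level overlap between documents could be measured. It is guesswork on my part, not Google's method: the function names, the four-word minimum sentence length and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re


def sentences(text):
    """Split text into a set of normalised sentences.

    Sentences are lowercased with whitespace collapsed. Fragments shorter
    than four words are ignored (an arbitrary, assumed cut-off).
    """
    parts = re.split(r"[.!?]+", text.lower())
    return {" ".join(p.split()) for p in parts if len(p.split()) >= 4}


def shared_sentence_ratio(doc_a, doc_b):
    """Jaccard overlap of the two documents' sentence sets (0.0 to 1.0)."""
    a, b = sentences(doc_a), sentences(doc_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_spun_pairs(docs, threshold=0.5):
    """Return (id_a, id_b, ratio) for each pair at or above the threshold.

    `docs` maps a document id (for example a URL) to its text. The
    threshold is a hypothetical value, not a known search engine setting.
    """
    ids = sorted(docs)
    pairs = []
    for i, id_a in enumerate(ids):
        for id_b in ids[i + 1:]:
            ratio = shared_sentence_ratio(docs[id_a], docs[id_b])
            if ratio >= threshold:
                pairs.append((id_a, id_b, ratio))
    return pairs
```

Comparing every pair is fine for a handful of articles on one site. Anything operating across multiple domains would need a far more scalable approach, which this sketch does not attempt.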
“Redundant” could mean large quantities of articles that are buried very deeply in your site, but have acquired no external links whatsoever. No one wants to link to rubbish, and now that article is buried 100 or so paging links deep on a site, no one will ever, ever find it. If your domain had a large amount of this type of content (a far higher ratio of old pages with no links compared to pages with links), you were definitely at risk from Panda (whether you were actually penalised is a different matter).
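If you want to check your own site for this kind of risk, a rough Python sketch might look like the following. It assumes you already have a crawl export with, for each page, a count of external links and a click depth; the field names `external_links` and `depth` are hypothetical, and the depth cut-off is an arbitrary choice of mine.

```python
def unlinked_ratio(pages):
    """Share of pages with no external links pointing at them (0.0 to 1.0)."""
    if not pages:
        return 0.0
    unlinked = sum(1 for p in pages if p["external_links"] == 0)
    return unlinked / len(pages)


def buried_unlinked(pages, min_depth=10):
    """URLs of pages that have no external links and sit deep in the site.

    `min_depth` is an assumed threshold for "buried"; tune it to your
    own site architecture.
    """
    return [
        p["url"]
        for p in pages
        if p["external_links"] == 0 and p["depth"] >= min_depth
    ]
```

A high ratio on its own proves nothing, but the list of buried, unlinked URLs is a reasonable place to start reviewing content.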
Further reading: For a primer on what exactly content spinning is, check out Rishi’s article on the topic. For a good breakdown of the types of duplicate content (and strategies to make internal pages more unique) check out Dr Pete’s “Fat Pandas and thin Content”. The warning came in early, though – check out “Googlers to Webmasters: Remove Your Thin Content“. Re-read advice on the May Day update, and work to the mantra (like we do) that if a page generates no traffic from organic search, it may be a candidate for removal.
Q: Would you be comfortable giving your credit card information to this site?
SEO translation: Is this site secure, has it been verified by reputable credit card transaction providers, and is it PCI compliant?
Guessing how this factor could be detected (if at all) is a bit of a leap of faith for any SEO – the tangible signals from a website that it’s safe could be a link back to the verification page offered by companies like Verisign or Truste, or a link from those sites back to the merchant. Retail sites display badges for all of their security accreditation and are linking to the appropriate location. It’s a little like a Google AdWords certified agency – you’re issued with a page at Google that verifies you are an authorised consultant. Perhaps there are noted, high trust signals online that will almost guarantee a site is secure?
Further reading: There’s no direct resource to link actual SEO and retailer reputation, but reputation and credibility are incredibly important for conversions. “Retailer Reputation: Showing Off Your Street Cred” over at Getelastic outlines some of the general strategies retailers could employ, but I think it would be important to test the conversion rate impact on your own site by including these logos. Trust is a key element in conversion, and seemingly in Google’s opinion on quality, too.
Q: Does this article have spelling, stylistic, or factual errors?
SEO translation: Bad grammar, spelling mistakes and formatting errors just don’t look great to users and are probably very easy for search engines to detect. The occasional typo is unavoidable, but an excess of poor spelling and grammar is plain embarrassing.
The question, “Was the article edited well, or does it appear sloppy or hastily produced?” might also refer to spelling, stylistic and factual errors, as might “How much quality control is done on content?” covered later in this post.
Further reading: Spell checking on websites has been on the “quality issues” list since forever. Matt Cutts was pleading with webmasters to spell check their sites in 2006. Much more recently, Google can determine the reading level of a document, and even translate poetry with their AI capabilities.
If you’re running on ASP.NET, get your development team to install this extension for Visual Studio 2010. Otherwise you could hack together a quick tool from Google’s spell check API, or ask my friend over at SEOmoz QA for a helping hand.
Q: Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
SEO translation: Think about it. On a real blog, you’re never going to see highly competitive keywords in every post title. SEOgadget talks about a bunch of things my team and I are interested in – we’re not writing for competitive keywords in the articles themselves. Sometimes a post flops, sometimes it lives for an unexpectedly long period.
Think back to the learn Spanish example given above. If each post in succession happens to include high value / high volume search terms (or terms with high bids / impressions in AdWords), then there’s surely something wrong. It just doesn’t look natural. I don’t want to read endless variations of “how to” + learn + French based articles, who would?
The other part of the question posed by Amit was the “generate content by attempting to guess what might rank well”. I’m only taking a wild guess here, but data sources such as the suggest API and wonder wheel make an easily scraped source of new article ideas. In fact I’m aware of organisations that generate content by scraping the suggest API, fetching search volumes from the AdWords API and using this data to prioritise their content strategy.
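As a thought experiment, the “every post title targets a money keyword” pattern is easy to measure. The Python sketch below is only an illustration of the idea: the function name is mine, the list of high value terms is something you would have to supply yourself, and plain substring matching is a deliberately crude assumption.

```python
def commercial_title_share(titles, high_value_terms):
    """Fraction of post titles containing at least one high value term.

    Matching is a case-insensitive substring test, which is crude but
    enough to show the idea. `high_value_terms` is a hypothetical list
    you would supply from your own keyword research.
    """
    if not titles:
        return 0.0
    terms = [t.lower() for t in high_value_terms]
    hits = sum(
        1 for title in titles
        if any(term in title.lower() for term in terms)
    )
    return hits / len(titles)
```

Run against the four spun titles quoted earlier with “spanish” as the only term, every title matches. A blog written for its readers would normally score far lower.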
Which came first, insight and quality or traffic and rankings?
Further reading: Rand’s “A Recommendation for Google’s Webspam Team” touched on the concept of starting with high value AdWords terms to detect likely candidates for search engine link spam. I agree with Rand that it’s possible this data already influences how the web spam team might work, and it might not take much of a leap of the imagination to consider how search data could be fed back into an algorithm to assess the likely quality of an article.
Q: Does the article provide original content or information, original reporting, original research, or original analysis?
SEO translation: What’s the point in writing an article if it doesn’t add something new to the discussion? This is very true for news based content – if you’re regurgitating what 100+ other publishers have just said, good luck getting it to rank. The concept translates easily into organic search – it’s good to develop a topic, add a thoughtful, unique or different angle to an idea, but to simply replicate it is poor. “Original reporting” could be translated into “first discovered” – have you broken a topic or developed it significantly at a very early stage of development?
The footprint left by “Original research” should be very easy to predict. If you’ve published an item of research or some original data, it’s bound to attract a series of citations from authors who go on to develop, comment, agree, disagree, review and so on. Look how a UK tech news site reports on a piece of research on the Panda update.
Further reading: Creating great content is something you do all the time, or it isn’t. If it’s not – you need to learn. Understand the needs of your target audience and develop topics around their interests. If you’re making your website an indispensable resource optimised for tonnes of great brand recognition and repeat visits then you’ll (hopefully) be able to get over the crack cocaine that is commercially skewed nonsense and start to add value. There are heaps of resources dedicated to great writing, copyblogger is a fantastic example.
Q: Does the page provide substantial value when compared to other pages in search results?
SEO translation: Does your landing page meet the likely expectations of search engine users? Google is listening to user feedback, and incorporating that data into their ranking results. Even with no toolbar feedback, Google could assess nominal user behaviour patterns for a query by looking at bounce rate and return velocity for all of the results – if a high ranking result stands out with poorer than average metrics, it might be a candidate for dismissal.
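Here is a small Python sketch of that comparison, purely as a guess at the shape of the idea. It looks only at bounce rate (not return velocity), compares each result with the median for the query, and uses a margin of 0.15 that I have picked out of thin air. The field names are hypothetical.

```python
from statistics import median


def underperformers(results, margin=0.15):
    """URLs whose bounce rate exceeds the query's median by more than `margin`.

    `results` is a list of dicts with hypothetical keys "url" and
    "bounce_rate" (a fraction between 0 and 1). The margin is an
    assumed value, not a known threshold.
    """
    if not results:
        return []
    typical = median(r["bounce_rate"] for r in results)
    return [r["url"] for r in results if r["bounce_rate"] - typical > margin]
```

Using the median rather than the mean keeps a single terrible result from dragging the baseline up and hiding itself.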
Using heavily optimised anchor text can influence web pages to rank for queries that might not be a perfect best fit for that query. Cannibalised keyword strategy in inbound anchor text could be a problem for the future, if a ranking encourages a bounce or a user complaint (block): “Would users complain when they see pages from this site?”.
Further reading: “Hide sites to find more of what you want“, Google March 2011 and “High-quality sites algorithm goes global, incorporates user feedback“, Google April 2011. “Microsoft’s Approach to Identifying Quality Search Results Based on User Feedback“, SEO By The Sea, May 2011.
Q: How much quality control is done on content?
SEO translation: Beyond the initial, obvious quality factors such as spelling and grammar, there’s also the question of detail and substance; “Are the pages produced with great care and attention to detail vs. less attention to detail?”.
Then I considered if keeping articles accurate and up to date might be an interesting factor in quality control. There might be temporal factors influencing “quality” over time. Certain types of article, over time, might become less accurate. A really good example of this is my guide on installing Ubuntu. Gradually, the instructions have become closer to obsolete. Despite a few major rewrites over the years, it’s definitely time to update it again.
It may not be enough simply to have an archive of “great” content. Ideas grow old, best practices expire. Perhaps great SEO strategy should incorporate rewriting and updating article and guide content.
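If you keep a last updated date for each article, finding candidates for a rewrite is straightforward. A minimal Python sketch follows, assuming a hypothetical `last_updated` field and an arbitrary one year cut-off; how quickly content really goes stale depends entirely on the topic.

```python
from datetime import date


def stale_articles(articles, today, max_age_days=365):
    """URLs of articles not updated within `max_age_days` of `today`.

    `articles` is a list of dicts with hypothetical keys "url" and
    "last_updated" (a datetime.date). The one year default is an
    assumption for illustration only.
    """
    return [
        a["url"]
        for a in articles
        if (today - a["last_updated"]).days > max_age_days
    ]
```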
Further reading: “Headsmacking Tip #11 – Refresh Legacy Content for a Rankings Boost” – SEOmoz, January 2009.
Q: Does the article describe both sides of a story?
SEO translation: Is this page overly commercially skewed, or does it provide a balanced outline of a subject? For example, a website promoting products may be producing overly skewed articles encouraging users simply to buy, rather than consider alternatives. The content may be entirely positive – where reviews and sentiments gathered elsewhere on the web may disagree.
How a search engine might recognise balance in an article, its sentiment, could be in some way related to the technology used to understand sentiment in product reviews, tweets and blog content. Back in 2009, Matt Cutts confirmed that “If you sort by reviews, Google will perform sentiment analysis and highlight interesting comments”. The technology has obviously been around for some time – RankSpeed, for example, attempts to recommend the “best” products and websites based on the sentiment in which they’re described in tweets and blogs.
How far this approach is applied to the analysis of “both sides” of a story is difficult to predict, but perhaps Google understand when an article seems overly skewed in favour of a particular idea, when in fact the topic should be balanced with viewpoints from differing perspectives?
Further reading: “Google’s New Review Search Option and Sentiment Analysis“, SEO By The Sea, June 2009. Google’s “Gold Standard” Search Results Take Big Hit In New York Times Story, Search Engine Land, November 2010. “Sentiment Analysis for SEO“, Science For SEO, January 2009.
Q: Is the site a recognized authority on its topic?
SEO translation: Is a site well linked to, by authority sites in a related niche? Sites that cover too broad a range of topics may be difficult to identify as an authority source when compared to sites that focus on a very narrow range of topics, particularly in communities that require a higher degree of peer recognition (SEO being an excellent example of this).
This may not be a problem in niches that are saturated with websites covering a broad range of topics, but for subjects that have many recognised authorities, broadly focused sites may not perform well.
Valuable inbound links, the relevance to topic of the sites they originate from and keywords in inbound anchors may be a factor. Frequently appearing, relevant terms on the site might also provide a key insight into the level of focus on the topic identified.
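For a rough feel of how focused a site's vocabulary is, you could count the share of pages on which each term appears. The Python sketch below is a toy under stated assumptions: a tiny hand-written stopword list, a 0.6 coverage threshold I have chosen arbitrarily, and no stemming or phrase handling. It is nowhere near real topic modelling.

```python
import re
from collections import Counter

# A deliberately tiny, assumed stopword list for illustration.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "with", "is", "it"}


def page_terms(text):
    """Set of lowercase alphabetic words on a page, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def term_coverage(pages):
    """Map each term to the fraction of pages it appears on."""
    counts = Counter()
    for text in pages:
        counts.update(page_terms(text))
    return {term: n / len(pages) for term, n in counts.items()}


def focus_terms(pages, min_coverage=0.6):
    """Terms appearing on at least `min_coverage` of pages, sorted A to Z."""
    coverage = term_coverage(pages)
    return sorted(t for t, c in coverage.items() if c >= min_coverage)
```

A tightly focused site should surface a short list of terms you would expect. A site covering everything tends to surface nothing at all.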
Q: Would you recognize this site as an authoritative source when mentioned by name?
SEO translation: “Brand” search volume was suspected as an influential factor related to the Vince update, or at least a factor used to contribute to a brand’s “authority”. Central to recognition is pure search volume – if there are many searches for a company name and domain name, it’s fair to assume that the company in question have a strong brand. An established brand name is a signal of trust, and has been (I suspect) for a long time.
Further reading: “Google’s Vince Update – Brand or No Brand?“, SEOgadget, July 2009.
Q: Does this article have an excessive amount of ads that distract from or interfere with the main content?
SEO Translation: Google have a mechanism to evaluate the balance of ads versus original content on web pages. Tom was on this almost straight away. His custom search engine will show you results for any website “hit” by the Panda update. Choose a web page at random (like this one) and test it in browser size:
Oh dear, it’s all ads. Check out this help article from the Google Adsense Team: “Best practices for laying out your site and your ads” (Via SEO Theory). Their advice?
“Show off your content: While placing ads above the fold is a good way to improve ad performance, also make sure that users can easily find the content they are looking for. For example, if your site offers downloads, make sure the download links are above the fold and easy to find.”
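To put a number on “it’s all ads”, you could estimate how much of the area above the fold is taken up by ad blocks. The Python sketch below assumes you have already measured each block's position and size in pixels (for example by hand, or with a browser tool). The field names, the 600 pixel fold and the assumption that blocks don't overlap are all mine, and this is not a description of how Google evaluates pages.

```python
def visible_area(block, fold_height):
    """Area in square pixels of the part of a block above the fold."""
    top = max(block["top"], 0)
    bottom = min(block["top"] + block["height"], fold_height)
    return max(bottom - top, 0) * block["width"]


def ad_share_above_fold(blocks, fold_height=600):
    """Fraction of measured above-the-fold area occupied by ad blocks.

    `blocks` is a list of dicts with hypothetical keys "kind" ("ad" or
    "content"), "top", "height" and "width". Blocks are assumed not to
    overlap, and the 600 pixel fold is an arbitrary default.
    """
    ad = sum(visible_area(b, fold_height) for b in blocks if b["kind"] == "ad")
    total = sum(visible_area(b, fold_height) for b in blocks)
    return ad / total if total else 0.0
```

With two 250 pixel tall ad units stacked above the main content, only 100 pixels of content make it above a 600 pixel fold, so ads account for five sixths of what the user first sees.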
Further reading: “Best practices for laying out your site and your ads“, Google. “Deconstructing Google“, Gianluca Fiorelli, March 2011, “What the Google Panda Update has Taught the SEO Industry“, SEO Theory, May 2011, “TED 2011: The ‘Panda’ That Hates Farms: A Q&A With Google’s Top Search Engineers“, Wired, February 2011 and last but by no means least, “Google’s Panda Update – What to do About It“, Distilled, February 2011.
Ranking factors increase in complexity
We can analyse until the cows come home – in fact, I spent much longer than I probably should have writing this post. This update was the most impressive in Google’s update history – it kept the SEO industry guessing for a long, long time. It turns out the answer is that it’s important to understand the many variables of quality. If you’re prepared to come to terms with the fact that it’s not just a technical thing any more, it becomes a lot easier to understand the direction we, as SEOs, are all now headed in. In the future, it seems SEO has the opportunity to evolve into something a lot more like product consulting.
In the end, building great sites that people love is a most rewarding experience. In a way I’m glad Google are taking us in this direction, but they, like us, still have a long way to go with this update.