Today we’ve been spending some time thinking about and implementing the X-Robots-Tag, a lesser-known part of the Robots Exclusion Protocol that carries directives such as “noarchive”, “noindex”, “nofollow” and “nosnippet”, and is supported by Google, Yahoo and Bing. Why lesser known? The X-Robots-Tag hides in your server’s HTTP response headers rather than in the <head> element of a web page. That is rather handy in some cases, and I’m about to tell you why.
It seemed pretty relevant to write a blog post about REP, since Danny at Search Engine Land took a look at ACAP vs Robots.txt very recently. To my mind, that article provides an excellent framework for understanding exactly what’s included in the Robots Exclusion Protocol, while giving an insight into how some news content providers feel about the way search engines handle their content.
I’ve been interested in REP for a different reason from our pals in the news industry: an indexed text file with a cache link displayed in Google’s SERPs for SEOgadget.co.uk. Small fry compared to the mighty Murdoch Google block party story, but definitely more interesting. It’s particularly timely, as I’ve been looking at how removing your Google cache affects your search results.
Only web pages have a head
It sounds obvious, but to remove that cached link you need to be able to add <meta name="robots" content="noarchive"> to the <head> of your webpage. The problem with our indexed txt file is that there is no <head>; it’s just a text file. This is where the X-Robots-Tag comes in.
Everything has a server header response
You can chain different directives in the header, separated by commas, just like you would in a meta robots tag, so I thought I’d throw noodp in for the demonstration, too.
Remove the noarchive from your head and try it in .htaccess
To implement an X-Robots-Tag in your server headers in WordPress, you just need to FTP to wherever your WordPress installation lives and look for the .htaccess file inside your /public_html/ directory.
If you’re not terribly used to working with files like .htaccess, I recommend you read this guide to editing htaccess from Windows. To get the noodp, noarchive response you see in my live http headers screenshot, you’ll need to add code like this:
# Set the X-Robots-Tag HTTP response header to "noodp, noarchive" for all requests
Header set X-Robots-Tag "noodp, noarchive"
Upload the new version of your .htaccess to your public_html folder and recheck your server header response. I removed the noarchive meta tag from the header.php template in WordPress first, because that’s all part of my test.
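The rule above sends the header with every response. Since the original problem was a single indexed text file, a narrower rule may be all you need. Here is a minimal sketch, assuming Apache with mod_headers enabled. The file pattern is only an illustration and would need adjusting to your own site:

```apache
# Sketch only: limit the header to plain-text files instead of every response.
# The IfModule wrapper skips the rule quietly if mod_headers is not loaded.
<IfModule mod_headers.c>
  <FilesMatch "\.txt$">
    # Ask search engines not to show a cached copy of matching files
    Header set X-Robots-Tag "noarchive"
  </FilesMatch>
</IfModule>
```

Note that this pattern also matches robots.txt itself, which should be harmless for noarchive but is worth keeping in mind if you swap in a stricter directive.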
Other uses may include issuing a noindex on affiliate / tracked exit click URLs that redirect via a 302. I’m testing that one currently. The bottom line is that the usual solution of excluding those URLs via robots.txt and nofollowing all links to them doesn’t necessarily keep them out of the index. Would an X-Robots-Tag set to noindex do the job? Handy stuff to know, in my opinion.
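For anyone who wants to run the same experiment, this is roughly what such a rule might look like. Treat it as an untested sketch under a few assumptions: the /go/ path prefix and the NOINDEX_EXIT variable name are made up for illustration, and mod_setenvif and mod_headers are assumed to be available. It also assumes the redirect URLs are not blocked in robots.txt, because a crawler that never fetches a URL never sees its headers.

```apache
# Sketch only: /go/ is a hypothetical path prefix for tracked exit links.
<IfModule mod_setenvif.c>
  # Flag any request whose URL path starts with /go/
  SetEnvIf Request_URI "^/go/" NOINDEX_EXIT
</IfModule>
<IfModule mod_headers.c>
  # "always" is used so the header is also sent on non-2xx responses such as a 302
  Header always set X-Robots-Tag "noindex" env=NOINDEX_EXIT
</IfModule>
```

If the exit URLs are routed through internal rewrite rules (as WordPress permalinks are), the variable may arrive with a REDIRECT_ prefix instead, so check the actual response headers before relying on it.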
Other than the links I’ve already put in the post, I found numerous resources which all make for interesting reading around the subject of REP with the X-Robots-Tag and .htaccess.
Useful examples of X-Robots exclusion in action – handy code snippet for filetype and directory specific exclusion rules
Classic post from Joost, written before Google bought FeedBurner, with instructions on how to add noarchive to an XML feed. Also check out this post for a fuller exploration of X-Robots REP with more code examples.