Using the X-Robots-Tag in server headers on WordPress

Posted on 2nd December 2009

Today we’ve been spending some time thinking about and implementing the X-Robots-Tag, a lesser-known Robots Exclusion Protocol (REP) mechanism for delivering “noarchive”, “noindex”, “nofollow” and “nosnippet” directives, supported by Google, Yahoo and Bing. Why lesser known? The X-Robots-Tag lives in your server’s HTTP response headers rather than in the <head> element of a web page. That’s rather handy in some cases, and I’m about to tell you why.

It seemed pretty relevant to write a blog post about REP, since Danny at Search Engine Land took a look at ACAP vs Robots.txt very recently. In my mind, that article provides an excellent framework for understanding exactly what’s included in the Robots Exclusion Protocol, while giving an insight into how some news content providers feel about the way search engines handle their content.

The X-Robots-Tag

I’ve been interested in REP for a different reason to our pals in the news industry: an indexed text file, complete with a cache link, displayed in Google’s SERPs for SEOgadget.co.uk. It’s small fry compared to the mighty Murdoch Google block party story, but definitely more interesting. It’s particularly timely, too, as I’ve been looking at how removing your Google cache affects your search results.

[Screenshot: SERPs with my cached link]

Only web pages have a head

It sounds obvious, but to remove that cached link you need to be able to add <meta name="robots" content="noarchive"> to the <head> of your web page. The problem with our indexed .txt file is that it has no <head>; it’s just a text file. This is where the X-Robots-Tag comes in.

Everything has a server header response

Here’s a little snapshot from Live HTTP Headers, one of my favourite Firefox SEO tools, with the X-Robots-Tag response in action:

[Screenshot: Live HTTP Headers in Firefox]

You can chain different directives, separated by commas, just like you would in your meta robots tag, so I thought I’d throw noodp in for the demonstration, too.

Remove the noarchive from your head and try it in .htaccess

To implement an X-Robots-Tag in your server headers in WordPress, you just need to FTP to wherever your WordPress installation lives and look for the .htaccess file inside your /public_html/ directory:

[Screenshot: the .htaccess file in a WordPress installation]

If you’re not terribly used to working with files like .htaccess, I recommend you read this guide to editing .htaccess from Windows. To get the noodp, noarchive response you see in my Live HTTP Headers screenshot, you’ll need to add code like this:
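A minimal sketch of such a rule, assuming an Apache server with mod_headers enabled. The IfModule wrapper is a precaution added here, so the site doesn’t throw a 500 error if the module isn’t loaded:

```apache
# Assumption: Apache with mod_headers available.
# Sends "X-Robots-Tag: noodp, noarchive" with every successful response
# served from this directory and below.
<IfModule mod_headers.c>
  Header set X-Robots-Tag "noodp, noarchive"
</IfModule>
```

On a WordPress site it’s probably safest to place this outside the `# BEGIN WordPress` / `# END WordPress` markers, since WordPress can rewrite anything between them.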


Upload the new version of your .htaccess to your public_html folder and recheck your server header response. I also removed the noarchive meta tag from the header.php template in WordPress, because that’s all part of my test.
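If the header should only apply to certain file types, such as the indexed text file mentioned earlier, the rule can be scoped with a FilesMatch block. This is a rough sketch, again assuming Apache with mod_headers enabled; the `\.txt$` pattern is only an illustration and should be adjusted to suit your own files:

```apache
# Assumption: Apache with mod_headers available.
# Only responses for files ending in .txt get the header;
# HTML pages are left to their own meta robots tags.
<IfModule mod_headers.c>
  <FilesMatch "\.txt$">
    Header set X-Robots-Tag "noarchive"
  </FilesMatch>
</IfModule>
```

Note that this pattern also matches robots.txt, which should be harmless for a noarchive directive but is worth bearing in mind before using noindex in the same way.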

Other uses may include issuing a noindex on affiliate or tracked exit-click URLs that redirect via a 302. I’m testing that one currently. The bottom line is that the usual solution, excluding those URLs via robots.txt and nofollowing all links to them, doesn’t necessarily keep them out of the index. Would an X-Robots-Tag set to noindex do the job? Handy stuff to know, in my opinion.

Some resources

Other than the links I’ve already put in the post, I found numerous resources which all make for interesting reading around the subject of REP with the X-Robots-Tag and .htaccess.

X-Robots Tag NoArchive Examples

Useful examples of X-Robots-Tag exclusion in action, including a handy code snippet for filetype- and directory-specific exclusion rules.

How to remove feeds in search results

Classic post from Joost, written before Google bought FeedBurner, with instructions on how to add noarchive to an XML feed. Also check out this post for a fuller exploration of X-Robots-Tag REP with more code examples.
