How to use wildcards in Robots.txt

30th September 2008

The best way to handle duplicate content issues on a site is to manage the causes of duplication at the source: avoid indexable query parameters, keep page bloat under control, and run regular technical SEO health checks to catch any surprise behaviours.

But what if you don’t have the time, resources or inclination to rework a core component of your site?

Here’s another viable solution: use the Robots Exclusion Protocol (REP).

How to Use Wildcards in robots.txt

With wildcard directives, you can clear out all your unwanted query strings, or at least prevent them from being indexed in the first place.

  1. Block access to every URL that contains a question mark “?”

  2. The $ character is used for “end of URL” matches. This example blocks Googlebot from crawling URLs that end with “.php”

  3. Stop any crawler from crawling search parameter pages
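The three rules above can be sketched as follows. These are three independent snippets, not one file (in a single robots.txt, rules for the same user-agent belong in one group), and the /search path in the third rule is illustrative; adapt it to wherever your site serves search results:

```
# 1. Block every URL that contains a question mark "?"
User-agent: *
Disallow: /*?

# 2. Block Googlebot from any URL ending in ".php"
#    (the trailing $ anchors the match to the end of the URL)
User-agent: Googlebot
Disallow: /*.php$

# 3. Block all crawlers from internal search result pages
#    (assumes search results live under /search)
User-agent: *
Disallow: /search
```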

Use with Caution and Test with Search Console

Some SEOs don’t agree with the use of wildcards, or at least they see it as slightly risky and far from best practice. I agree: they’re not ideal. You have to be very careful and give a huge amount of attention to testing in Search Console.
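If you want to sanity-check wildcard patterns outside Search Console, the matching logic is small enough to sketch locally. Below is a minimal Python approximation of the rules Google applies (`*` matches any run of characters; a trailing `$` anchors to the end of the URL). The function name is my own, and note that Python’s built-in `urllib.robotparser` does plain prefix matching and does not understand these wildcard extensions:

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Check whether a robots.txt Disallow pattern matches a URL path.

    Supports the two wildcard extensions Google honours:
      *  matches any sequence of characters
      $  (trailing) anchors the match to the end of the URL
    Plain patterns remain prefix matches, as in the original spec.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn the robots "*" back into ".*"
    regex = "^" + re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# "/*?" blocks any path containing a question mark
print(robots_pattern_matches("/*?", "/page?id=1"))          # True
# "/*.php$" only blocks paths that *end* in .php
print(robots_pattern_matches("/*.php$", "/index.php"))      # True
print(robots_pattern_matches("/*.php$", "/index.php?x=1"))  # False
```

Running candidate URLs through a check like this before deploying a rule is a cheap way to catch a wildcard that blocks more than you intended.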


  1. One thing you need to keep in mind is that a disallow in robots.txt only disallows crawling; it has less of an impact on actual indexing. If we have reason to believe that a URL which is disallowed from crawling is relevant, we may include it in our search results with whatever information we have (if we’ve never crawled it, we may just include the URL; if we’ve crawled it in the past, we may include that information).

    To prevent URLs like these from being indexed, I would recommend that you have the server 301 redirect to the appropriate canonical (and of course not link to the incorrect one).


  2. Agree there… 301 redirect, scoop any relevant params into a cookie, new sitemap. Lovely. If only I could actually get that implemented on the site I’m working on.

  3. Awesome – thanks for the info. This would be really useful for me: having the crawler blocked for certain WordPress posts with a particular keyword in the URL… (just amend the slug accordingly). But this only applies to Google, yeah?

  4. Thanks for the tip. I was looking for something like this to block item IDs.
    I am working on an e-commerce website and the client needs to block such dynamic URLs. So glad to find your blog.


  5. I tried to use Google’s robots testing tool, but could not test it; it’s showing some error and I don’t know how to fix it. Further, wildcards in robots.txt are supported by Google’s robots, but not necessarily by other search engines.

  6. I tried to disallow feed pages with Disallow: /feed, but it’s not working.
    What is the correct way? Can I use the $ sign for the ending, like
    Disallow: /feed$ or Disallow: /feed/$? Which one is correct?

  7. We’re trying to prevent search results pages from being indexed by Google. All search result pages start with /?s=. We’ve added the following lines to our robots.txt file, but the pages keep getting indexed in Google:

    User-agent: *
    Disallow: /*?*
    Disallow: /*?

    Any ideas?

  8. How to prevent indexing of subsequent pages? Like, etc. I have many categories.

  9. Hi Guys,

    What would you guys do about 100,000 URLs such as,,/

    My client has them all 302’d which is disastrous in my opinion – and the robots.txt is not being honoured. I added a rule for the /product_compare/ directory.

    Would it not be better to use a wildcard (/*product_compare) to stop Google from crawling hundreds of thousands of duplicate and thin pages, rather than force the crawlers to follow a redirect?

    I am looking to restrict these pages, as more will be generated every time a user makes a comparison on this Magento site. I’m aware that noindexing would be better here but getting the developers to do this will be easier said than done.

    Would be interested to hear your thoughts on this one. I was under the impression that streamlining crawl budget was hugely beneficial – and preferable to endless redirects?

  10. Will

    User-agent: *
    Disallow: /*&

    block access to every URL that contains an ampersand (“&”)?

    Here is a standard product page on our site:

    which is ALSO accessible from several query strings, all of which are preceded by an & symbol, i.e.

    Looking to disallow any URL with an “&” but to ALLOW URLs with only a “?”.

    Thanks in advance
