The best way to handle duplicate content issues on a site is to manage the causes of duplication at the source: avoid indexable query parameters, keep page bloat under control, and run regular technical SEO health checks to catch any surprise behaviour early.
But what if you don’t have the time, resource or inclination to rework a core component of your site?
Here’s another viable solution: use REP (the Robots Exclusion Protocol).
How to Use Wildcards in Robots.txt
This post explains how to use wildcards in robots.txt. With wildcard patterns in your Disallow directives, you can clear out all of your unwanted query-string URLs, or at least stop crawlers from fetching them in the first place. Note that robots.txt controls crawling rather than indexing: a blocked URL can still end up indexed if other pages link to it, but crawlers won’t fetch its content. A short sketch of how these patterns are matched follows the examples below.
- Block access to every URL that contains a question mark “?”:

  ```
  User-agent: *
  Disallow: /*?
  ```

- The $ character is used for “end of URL” matches. This example blocks Googlebot from crawling URLs that end with “.php”:

  ```
  User-agent: Googlebot
  Disallow: /*.php$
  ```

- Stop any crawler from crawling search parameter pages:

  ```
  User-agent: *
  Disallow: /search?s=*
  ```
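To sanity-check rules like these before they go live, here’s a minimal Python sketch of how REP wildcard matching works, assuming the semantics described above: “*” matches any run of characters and a trailing “$” anchors the pattern to the end of the URL. The `rule_matches` helper and the example URLs are illustrative, not part of any robots.txt library.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt Disallow pattern matches a URL path.

    Minimal sketch of REP wildcard semantics: '*' matches any run of
    characters, a trailing '$' anchors the match to the end of the URL,
    and everything else is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as a wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.match(regex + ("$" if anchored else ""), path) is not None

# The three rules from the examples above, against illustrative URLs:
assert rule_matches("/*?", "/shop/page?sort=price")       # contains '?'
assert rule_matches("/*.php$", "/old/index.php")          # ends with .php
assert not rule_matches("/*.php$", "/old/index.php?x=1")  # '$' = end of URL
assert rule_matches("/search?s=*", "/search?s=widgets")   # search pages
```

Real crawlers like Googlebot also apply longest-match precedence between Allow and Disallow rules, which this sketch deliberately ignores.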
Use with Caution and Test with Search Console
Some SEOs don’t agree with the use of wildcards, or at least see them as risky and far from best practice. I agree that they’re not ideal: you have to be careful, and you should test thoroughly in Search Console before relying on them.
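One practical way to apply that caution: before deploying new rules, batch-check the URLs you need to keep crawlable against the proposed Disallow patterns. The sketch below reuses the same pattern-to-regex idea as the earlier helper; the rule list and the paths are hypothetical.

```python
import re

def to_regex(pattern: str) -> re.Pattern:
    # Translate a Disallow pattern ('*' wildcard, optional trailing
    # '$' anchor) into a compiled regex, as in the sketch above.
    anchored = pattern.endswith("$")
    body = re.escape(pattern[:-1] if anchored else pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))

# Hypothetical rules and a hypothetical list of URLs that must stay crawlable.
disallow_rules = ["/*?", "/*.php$"]
important_paths = ["/category/shoes", "/blog/post?utm_source=mail", "/about"]

for path in important_paths:
    blocked_by = [rule for rule in disallow_rules if to_regex(rule).match(path)]
    if blocked_by:
        print(f"WARNING: {path} would be blocked by {blocked_by}")
```

In this hypothetical run, `/*?` would also block the UTM-tagged blog URL, which is exactly the kind of collateral damage you want to catch before the rules go live.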