Today, I want to share some tips for quickly weeding out errors in sitemap XML files. This post will teach you a few new tricks with the search industry’s favourite crawler, Screamingfrog, a tool that makes for a fantastic XML sitemap checker. I hope this tutorial saves you some time in your own SEO audits.
By the end of this tutorial, your XML sitemap will the audited for any 404 errors (or 5xx errors, etc).
Why does a website need an error-free sitemap.xml file?
The advice we always hear from Google is: keep your sitemaps validated and as error-free, as you can. The sitemap file is used to declare the preferred canonical URL.
As a signal, an error in a sitemap file is a pretty big deal and best to be avoided.
I’ve always felt that a sitemap file with a very low load time is also advisable if you can speed up the dynamic elements of the file generation. I’ve come across very large sitemaps that are clearly regenerated on every request without any caching. Check this carefully.
The previous version of Google’s Search Console still offers the better functionality, compared to the new version which is currently extremely limited:
I’ll update this article should the new functionality develop, which I’m quite sure it will.
Configure Screamingfrog to only crawl the URLs found in the XML sitemap
Screaming Frog is a very curious little site crawler. The default configuration, even in list mode will crawl a lot more resource than perhaps you might want. Here’s how to configure the crawler:
Head to Configure > Spider and deselect everything from the “Basic” tab. This will prevent Frog from discovering new URLs or unnecessarily crawling resources that aren’t useful for this project.
One the configuration is done, we’re ready. Here’s a summary of the process I’m going to teach you today:
How to check your XML Sitemap for errors with Screaming Frog
- Open Screamingfrog and select “List Mode”
- Grab the URL of your sitemap.xml file
- Head to Upload > Download Sitemap
- Frog will confirm the URLs found in the sitemap file
- Click Start to start crawling
- Export the data to CSV or sort by Status Code to highlight any potential issues
Open Screamingfrog and select “List Mode”
Open Screamingfrog and select “List” via the “Mode” drop down menu:
Now go and fetch the sitemap.xml URL.
Grab the URL of your sitemap.xml file
First, start by finding your XML Sitemap URL. If it’s your website, obviously you’ll just know this. It’s either found at “/sitemap.xml” or “/sitemap_index.xml”.
Copy the URL head to Screamingfrog.
Head to Upload > Download Sitemap
Head to Upload > Download Sitemap and paste the URL in the dialogue window:
Frog will confirm the URLs found in the sitemap file via the next dialogue window:
Click “OK” and the crawler will automatically run.
Once the crawl is complete, you can export the data to CSV or sort by Status Code to highlight any potential issues. You don’t want 301 redirects or 404 errors – go ahead and resolve those problems!