Check Your XML Sitemap For Errors with this Excel Tip

Today, I want to share a tip for quickly weeding out errors in sitemap XML files. This post will teach your a few new Excel tricks and (hopefully) save you some time in your own technical SEO audit projects.

Why a nice clean sitemap?

Google have invested a lot of time and effort into improving the sitemaps functionality in Webmaster Tools and the advice I’ve always heard from Google people is advice like; keep your sitemaps as error free as you can, use the correct canonical URL. I’ve always felt that a sitemap file with a very low load time is also advisable if you can speed up the dynamic elements of the file generation.

Google's sitemap Tool in WMT

 

Bing’s Duane Forrester was on a Whiteboard Friday talking about exactly this topic back in March last year. His comments about sitemap cleanliness were of particular interest:

We have a very tight threshold on how clean your sitemap needs to be. When people are learning about how to build sitemaps, it’s really critical that they understand that this isn’t something that you do once and forget about. This is an ongoing maintenance item, and it has a big impact on how Bing views your website. What we want is end state URLs and we want hyper-clean. We want only a couple of percentage points of error.

Duane Forrester

If you’ve not seen the video, I recommend taking a few moments to review what’s said:

Wistia

Checking for problems during a site audit

When I’m working on a site, I sometimes need to work out what state the XML sitemap is in. If I know there’s been a recent update to the file(s) it’s not always a good idea to totally rely on the data coming from Webmaster tools. If you’re ever in that situation, here’s how to get a fresher impression of the state of your sitemap.xml file.

Import your XML sitemap file into Excel

Firstly, head to Data > From Other Sources, and select the “From XML Data Import”:

select data in Excel

In the “File name” dialogue, type or paste in your sitemap XML file URL:

Click “Yes” when this dialogue appears:

And eventually, your entire XML sitemap appears, all nicely formatted in Excel:

Fetch the HTTP status code for each URL

Back in the old days, the quickest way to check the server header response of a list of URLs was to crawl the list with XENU on 0 crawl depth, and then VLOOKUP against the original list. Thanks to Niels Bosma and his SEO Tools for Excel, this process is much, much quicker.

Just create a new column and insert the following query into the first cell:

=HtmlStatus([@[ns1:loc]])

data with http status

Check that the canonical in the page header matches the declared URL in the sitemap

Aside from a rarely updated sitemap, the other common mistake i encounter is non-canonical URL submission in the sitemap file. The URL in the sitemap and the URL described in rel=”canonical” don’t match!

=HtmlCanonical([@[ns1:loc]])

exact

Check out the “Match?” column – we’re checking to see if the canonical declared in the web page’s header matches the actual URL in the sitemap. Oh my, it’s a simple formula:

=EXACT([@[ns1:loc]],[@Canonical])

Learn this stuff

Learn this stuff! If you’re not familiar with some of the tips in the post – that’s ok, I more or less have to relearn the more complicated stuff every time I start a new task with Excel. It’s enormously satisfying to make the time to teach yourself new tricks and Excel is brilliantly simple once you have the gist of it. Here are some resources to get you started:

Image: Splorp


Stay Updated: Sign Up for Webinar & New Blog Alerts

13 thoughts on “Check Your XML Sitemap For Errors with this Excel Tip

  1. shivun says:

    Richard

    Thanks for the useful post on the sitemaps and using excel to get the HTTP status codes for URLS. I have installed the seo extension and have tried to add a new colunm followed by adding the http status. I’m having no luck with this. I have simply pointed excel to my sitemap, loaded all the data, added a new colunm and pasted the code. I also replaced the typo html with http.

    Am i missing something? I am running office 2007.

    cheers

    shivun

  2. Hi Shivun

    Paste this query into a cell: =seotoolsversion()

    You *should* get a version number. If you don’t, the tools haven’t installed correctly.

  3. shivun says:

    Richard, when I click in a cell to type it shows the function called =seotoolsversion() but does’nt run?

    shivun

  4. shivun says:

    Something bizarre going on but it now returns the version number

    3.3.4

  5. shivun says:

    Richard

    I am now click in a blank cell and pastings

    =HTTPStatus([@[ns1:loc]])

    This returns an error code complaining about the name? Excel is not one of my strong points, am I missing something?

    cheers

    shivun

  6. Darroch Reid says:

    Hi Richard, You mention Xenu, but did you know that you check a sitemap.xml in Screaming Frog v1.80? You will then also get the HTTPStatus, and no-indexed pages (yes I have seen no-index pages listed in a sitemap.xml). However, to check for duplicate entries you need to return to Excel as Screaming Frog removes the duplicates when it starts the file.

    Darroch

    p.s. Version 3.3.4 of Seo Tools is not working at present try v3.2 for now

  7. Hi Richard,

    A quickest way to check an XML sitemap for errors is using Screaming Frog.

    1. Download and save locally the xml sitemap 2. Choose Mode->List 3. Select URL list file 4. Change Files of Type to SiteMap (*.xml) 5. Choose the XML file and click Open 6. Click Start to start crawling

  8. Hi Modesto

    Indeed! we use Screaming Frog here too (Jon blogged about it last week). I just wanted to demo Excel’s XML import capability – a little known feature.

    Richard

  9. Adrian says:

    Hi Richard,

    As with Shivun, when I paste the code into the cell, I get an error stating that “The name that you entered is not valid.

    Reasons for this can include: - The name does not begin with a letter or an underscore - The name contains a space or other invalid characters - The name conflicts with an Excel built-in name or the name of another object in the workbook”

  10. Gerry White says:

    Oddly enough I found this post just AFTER I did the same – doing this seems to take a while and my computer is running like a dog while doing it – granted these are pretty big files, but not too massive … I think Xenu or ScreamingFrog after cutting and pasting into a TXT file is so much faster!

  11. Gerry White says:

    If you have a paid version of ScreamingFrog it is far faster using the list mode it will check an XML Sitemap!!! Awesome – you need to save it locally first though…

  12. Ankur says:

    You do not have the tool installed. Download and install it from here – http://nielsbosma.se/projects/seotools/download/ And it should work fine!

  13. Hi all! We’ve just launched a sitemap validator as the first step towards building a comprehensive sitemap management web app and we’d like some feedback.

    Here’ the app: http://senkailabs.com/sitemap-checker/

    1) What would you like to see in a sitemap management application? Automatic sitemap validation and error checks? 2) How many sitemaps/sites do you manage on average?

    Thanks!

Comments are closed.