While preparing myself for an SEO conference presentation, I like to practice to make sure I can fit in what I have to say, or in this case; show. Speaking on the SES London 2012 “SEO Tools of the Trade” panel, I gave up a few interesting Excel tips using SEO Tools for Excel and the Adwords API extension. Here are the guides:
– SEO Tools for Excel
– Check your Sitemap for Errors with Excel
– Fun With The SEOmoz API – Get Links Data Straight into Excel
– Adwords API Extension
Video: Excel Tips and Tricks
Excusing the occasional hesitation, I hope you find this video useful. I’ll update this post with my slide deck after the session. Enjoy!
Image credit: Lynn Friedman
Here’s the presentation
I’m going to just basically show you some things that I found really powerful and really useful. Hopefully, you can take away some tips and tricks from this video and go off and have a play with them yourself.
I’m going to switch to Excel first of all. So we’re going to do a couple of things.
We’re going to do a little bit of search engine accessibility work. First of all by checking your sitemap, and specifically whether or not the URLs that you submitted in that sitemap are responding with the right http status, and whether or not those URLs actually match the canonical in the meta header of your web page. Then we’re going to have a look at how you can use Excel to identify bad links by fetching the page rank of the domain it’s linking to, and work out whether or not that’s a good idea or not, and whether or not you need to weed out any bad links to actually improve your rankings.
Then I’m going to do some work with checking whether or not the links in your data are actually still live. Tools like Open Site Explorer are excellent, but the data can be a little bit old, up to about 30 days old or thereabouts. So you might need something that helps you work out whether or not the link is actually still there.
Then we’re going to play with fetching data from the SEOmoz API into Excel, and then finally we’re going to work out how to get a location for a link into Excel as well. So we’ll work out whether or not all of the links that you’ve got inbound to your domain are from the UK or elsewhere. So let’s start with the first thing.
So when you create a sitemap, most search engine engineers, I think there’s a video on SEOmoz Whiteboard Friday, it basically says keep the noise levels low in your sitemap. And when you’re working a development team, you might make updates to the sitemap, and you wait for Google to download that and report back on errors. Actually, sometimes you might want to speed up a little bit. So for that Excel actually gives you the ability to import data from an XML feed. If you go to the data tab up here and check out from other sources if you follow the link From XML Data Import, you can paste in your sitemap URL here. When you click Open, it will download it.
Now here’s one I actually made earlier. I’ve created a column already called =HTTPStatus. Now let me just fill this out and leave it running, and I’ll explain actually what that’s doing. So HTTP status is part of an extension created by a chap called Niels Bosma, who is an Excel DNA developer, which is an extension for Excel that allows .NET developers to produce plug-ins and extensions for Excel. SEO Tools for Excel is suite of different functions, from things like checking the HTTP status of a URL or the IP address, all the way to fetching Google PageRank. There’s even a Google Analytics extension there too.
So I’m going to show you a few of those features. As you can see now, we’ve collected HTTP status in each of those URLs. So I’ve already got a 404 Error on one of those URLs, which I’ll go and check out later on. So we’ve got an error in our sitemap file. This is why we’d want to follow this kind of thing up. What I’ve found really interesting is just checking while we’re in here if the actual URLs themselves match with what we’re declaring in the meta header.
So if we use HTMLCanonical and just select that URL again, we’ll leave it running. So what it’s actually doing is going off and checking each of the web pages and then fetching back whatever matches or whatever is declared in the canonical header. Let me just check that. Niels’ tool is excellent, but sometimes you have to tell it not to tell you when there’s a 404 Error on a page, which is why we got that dialogue box. So there we go. Now we’ve got each of the URLs that appear in the canonical element in the web page. I’m just going to take that and paste the values.
One thing that this tool does, if Excel is set on auto recalculate, every time you make a change to a table, it will try to recalculate and re-fetch the data, which is a problem for functions like fetch page rank, which we’re going to look at in a moment. So now we know what the canonical is, we can just compare the submitted URL with canonical URL by using exact. Exact just looks at one piece of text, which is there, and compares that to the next, which is there, and tells you whether or not there’s a match. So if there’s a match, if the two data points are exactly the same, it will say true. If not, it’s false. So I’m going to send somebody off here to go and work out what’s going on with that URL and why that’s in our sitemap.
So the next thing that we’re going look at is how to identify bad links. I specifically have to look at how to find links that might have exact match anchor text in your inbound link profile, but also seem to come from very, very low value, very low or no page rank URLs and domains. I know that Patrick at SES today was saying that one of the things that you might want to do in your SEO campaign is clear up bad links. I tend to agree that it’s worthwhile at least knowing what’s going on in your back link profile.
So the first thing that we need to do is actually fetch the domain from this URL, which is relatively simple. We’re going to use MID, which selects text from a specific start point and plays back a number of characters after that start point. Now with any URLs you need to clear up the HTTP, which is exactly eight characters. And then the number of characters we want it to play back to us is actually dependent on the first trailing slash, if you want to extract the domain which you’d find with Find. I want it to find me the first trailing slash within that same text just there. Obviously, it’s going to start in position eight. We’re going to close that off, and then take eight away from the result and press Return. There we go, we’ve got the domains.
Now what I did when I was preparing this file was actually fetch the PageRank for all of these domains in this list here, because that does take some time. But if you’ve got a powerful enough computer, it does it quite well. So here’s one I made earlier. Now just to show you how we fetch PageRank in Excel, I’ll just show you the query. So it’s =Googlepagerank, like that, and just select the URL, press Return, and it should go off and fetch that. There you go.
That works pretty well. You can see we’ve updated the whole table here with all of those inbound links. So what we have to do now is just look up those values. So I’ve already got a Google PageRank column here over on the right. So if I go “=vlookup”, like so, I’m looking for that domain and I’m looking for that in the range Google PageRank, because that was my table name. The column index is two. That’s the next one along. I want to match that exactly, zero, so press Return. There we go. So for the most part, we’re doing a pretty good job of pulling through the PageRank here.
Now the really cool thing about Excel is that we can create a pivot table to have a look at how many PageRank low or PageRank no links we’ve got, which is pretty powerful if you want to filter down quickly. So if we just insert a pivot table, we’re going to insert that in a new page, and we’re just going to put that Google PageRank count there.
Now we’re going to count the number of domains that are linking to us with zero PageRank. Let’s just change that to Sum instead of Count. So now we’ve got the data, and we can see that there’s a large number of domains with very low levels or no PageRank at all. So if we just drill down on those links, let’s just have a quick look. So I’m pretty sure that, with Majestic ACRank, that’s incredibly low on the root domain. We’ve got lots of .infos, lots of directories. I wouldn’t be surprised if there’s some inbound links here that you probably don’t want the exact match anchor text linking to if you wanted any links at all. So not ideal.
So moving on, let’s have a look at the next trick, which is working out whether or not your links are still live. So when you’re working with link data from tools like Open Site Explorer, it’s really, really best practice just to be sure that actually that link is still live, it’s still pointing to you. This query here, there’s a blog post about this on SEO Gadget. This query here basically uses the SEO Tools for Excel function XPathOnURL. So basically, it has the ability to go to any URL that you specify and then execute an XPath filter. So in this case, we’re saying to go to A2, which is an SEOmoz blog post about keyword research, and yes, select all A elements that contain href SEOgadget. What that will do is give me the value if it finds it, but it will just return a blank value.