Tools of the Trade: Excel Tips from Richard at SES London 2012

While preparing for an SEO conference presentation, I like to practice to make sure I can fit in everything I have to say, or in this case, show. Speaking on the SES London 2012 “SEO Tools of the Trade” panel, I gave away a few interesting Excel tips using SEO Tools for Excel and the AdWords API Extension. Here are the guides:

SEO Tools for Excel

Check your Sitemap for Errors with Excel

Fun With The SEOmoz API – Get Links Data Straight into Excel

AdWords API Extension

Video: Excel Tips and Tricks

If you’ll excuse the occasional hesitation, I hope you find this video useful. I’ll update this post with my slide deck after the session. Enjoy!

Image credit: Lynn Friedman

Here’s the presentation

SEO Tools of The Trade


Video Transcription

Hi, so this is a video to cover the tips I’m going to be giving away at SES London 2012. The session is called “SEO Tools of the Trade,” in which I’m going to cover a lot of the Excel stuff that we do in-house at SEOgadget. I know the other guys are going to be talking about web applications and subscription-based tools, which is all well and good. So I took the Excel angle, because I actually use it in most of the SEO consulting I do. In fact, I use it in all of it, from link analysis to search engine accessibility reviews, and so on.

I’m going to just basically show you some things that I found really powerful and really useful. Hopefully, you can take away some tips and tricks from this video and go off and have a play with them yourself.

 

I’m going to switch to Excel first of all. So we’re going to do a couple of things.

 

We’re going to do a little bit of search engine accessibility work, first of all by checking your sitemap: specifically, whether the URLs you submitted in that sitemap respond with the right HTTP status, and whether those URLs actually match the canonical declared in the meta header of your web pages. Then we’re going to have a look at how you can use Excel to identify bad links by fetching the PageRank of each linking domain, and work out whether you need to weed out any bad links to actually improve your rankings.

 

Then I’m going to do some work on checking whether the links in your data are actually still live. Tools like Open Site Explorer are excellent, but the data can be a little old, up to about 30 days or thereabouts. So you might need something that helps you work out whether a link is actually still there.

 

Then we’re going to play with fetching data from the SEOmoz API into Excel, and then finally we’re going to work out how to get a location for a link into Excel as well. So we’ll work out whether or not all of the links that you’ve got inbound to your domain are from the UK or elsewhere. So let’s start with the first thing.

 

So when you create a sitemap, most search engine engineers, and I think there’s a video on SEOmoz Whiteboard Friday that basically says this, will tell you to keep the noise levels low in your sitemap. And when you’re working with a development team, you might make updates to the sitemap and then wait for Google to download it and report back on errors. Sometimes you might want to speed that up a little bit, and for that Excel gives you the ability to import data from an XML feed. If you go to the Data tab up here and check out From Other Sources, following the link From XML Data Import, you can paste in your sitemap URL here. When you click Open, it will download it.

 

Now here’s one I actually made earlier. I’ve already created a column using =HttpStatus. Let me just fill this out and leave it running, and I’ll explain what it’s doing. HttpStatus is part of an extension created by a chap called Niels Bosma, built on Excel-DNA, a library that allows .NET developers to produce plug-ins and extensions for Excel. SEO Tools for Excel is a suite of different functions, from things like checking the HTTP status of a URL or its IP address, all the way to fetching Google PageRank. There’s even a Google Analytics extension in there too.
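For reference, the formula itself is a one-liner. Assuming the submitted URLs sit in column A from row 2 down (the cell reference is just for illustration), it looks like this:

=HttpStatus(A2)

Fill it down the column and it returns the response code (200, 301, 404 and so on) for each URL.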

 

So I’m going to show you a few of those features. As you can see, we’ve now collected the HTTP status for each of those URLs, and I’ve already got a 404 error on one of them, which I’ll go and check out later on. So we’ve got an error in our sitemap file, and this is exactly why you’d want to follow this kind of thing up. What I’ve found really interesting is checking, while we’re in here, whether the actual URLs themselves match what we’re declaring in the meta header.

 

So we use HtmlCanonical, just select that URL again, and leave it running. What it’s actually doing is going off and checking each of the web pages, then fetching back whatever is declared in the canonical header. Let me just check that. Niels’ tool is excellent, but sometimes you have to tell it not to alert you when there’s a 404 error on a page, which is why we got that dialogue box. So there we go. Now we’ve got each of the URLs that appear in the canonical element of each web page. I’m just going to take that and paste the values.
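Again assuming the submitted URL is in A2 (an illustrative reference), the call looks something like:

=HtmlCanonical(A2)

which returns whatever URL is declared in the rel="canonical" element of that page.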

 

One thing to watch with this tool: if Excel is set to auto-recalculate, every time you make a change to a table it will try to recalculate and re-fetch the data, which is a problem for slow functions like fetching PageRank, which we’re going to look at in a moment. So now we know what the canonical is, we can compare the submitted URL with the canonical URL using EXACT. EXACT takes one piece of text, which is there, compares it to the next, which is there, and tells you whether there’s a match. If the two data points are exactly the same, it will say TRUE; if not, FALSE. So I’m going to send somebody off to work out what’s going on with that URL and why it’s in our sitemap.
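Put together, with the submitted URL in A2 and the pasted canonical in C2 (again, cell references assumed for illustration):

=EXACT(A2,C2)

EXACT is case sensitive, which is just what you want when comparing URLs.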

 

So the next thing we’re going to look at is how to identify bad links. Specifically, I want to find links that have exact match anchor text in your inbound link profile but come from very low value, low or no PageRank URLs and domains. I know Patrick at SES today was saying that one of the things you might want to do in your SEO campaign is clear up bad links, and I tend to agree that it’s worthwhile at least knowing what’s going on in your backlink profile.

 

So the first thing we need to do is fetch the domain from this URL, which is relatively simple. We’re going to use MID, which selects text from a specific start point and returns a number of characters from that start point. Now with any of these URLs you need to clear off the “http://”, which is exactly seven characters, so the domain starts at position eight. The number of characters we want returned depends on the first trailing slash, which you find with FIND. I want it to find the first trailing slash within that same text just there, starting at position eight. We close that off, take eight away from the result, and press Return. There we go, we’ve got the domains.
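Written out in full, with the link URL in A2 (an assumed reference), the formula is:

=MID(A2,8,FIND("/",A2,8)-8)

Note this assumes an http:// URL with a trailing slash after the domain; see Andrea’s comments below for a more robust version.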

 

Now what I did when I was preparing this file was fetch the PageRank for all of the domains in this list here, because that does take some time, but if you’ve got a powerful enough computer it handles it quite well. So here’s one I made earlier. Just to show you how we fetch PageRank in Excel, I’ll show you the query. It’s =GooglePageRank, like that, then just select the URL, press Return, and it should go off and fetch it. There you go.
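So the whole formula, assuming the domain sits in B2, is simply:

=GooglePageRank(B2)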

 

That works pretty well. You can see we’ve updated the whole table here with all of those inbound links. What we have to do now is look up those values. I’ve already got a Google PageRank column here over on the right. So if I go =VLOOKUP, like so, I’m looking for that domain, and I’m looking for it in the range GooglePageRank, because that was my table name. The column index is two, that’s the next one along, and I want to match it exactly, zero, so press Return. There we go. For the most part, we’re doing a pretty good job of pulling through the PageRank here.
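Spelled out, with the linking domain in B2 and GooglePageRank as the name of the two-column lookup range (both references assumed for illustration):

=VLOOKUP(B2,GooglePageRank,2,0)

The 2 pulls the value from the second column of the range, and the 0 forces an exact match on the domain.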

 

Now the really cool thing about Excel is that we can create a pivot table to have a look at how many low PageRank or no PageRank links we’ve got, which is pretty powerful if you want to filter down quickly. So we just insert a pivot table, put it in a new sheet, and drop that Google PageRank count in there.

 

Now we’re going to count the number of domains that are linking to us with zero PageRank. Let’s just change that to Sum instead of Count. So now we’ve got the data, and we can see that a large number of domains have very low levels of PageRank or no PageRank at all. So if we just drill down on those links, let’s have a quick look. I’m pretty sure that, with Majestic ACRank, that’s incredibly low on the root domain. We’ve got lots of .infos, lots of directories. I wouldn’t be surprised if there are some inbound links here where you really don’t want that exact match anchor text, if you’d want those links at all. So not ideal.
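If you just want the headline number without a pivot table, a quick alternative (assuming the looked-up PageRank values sit in column D, my illustration rather than the workbook in the video) is:

=COUNTIF(D:D,0)

which counts the rows with zero PageRank.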

 

So moving on, let’s have a look at the next trick, which is working out whether your links are still live. When you’re working with link data from tools like Open Site Explorer, it’s really best practice to make sure that a link is actually still live and still pointing at you. This query here, and there’s a blog post about this on SEOgadget, uses the SEO Tools for Excel function XPathOnUrl, which has the ability to go to any URL that you specify and then execute an XPath query. So in this case, we’re saying go to A2, which is an SEOmoz blog post about keyword research, and select all A elements whose href contains “seogadget”. What that will do is give me the value if it finds it, but return a blank value if the link is no longer there.
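The formula looks something like this, with the page to check in A2 (the exact XPath here is illustrative):

=XPathOnUrl(A2,"//a[contains(@href,'seogadget')]")

A non-blank result means the link is still on the page.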




13 thoughts on “Tools of the Trade: Excel Tips from Richard at SES London 2012”

  1. This is excellent stuff Richard and thank you for sharing!

    Quick question though after watching the video… Was it supposed to abruptly end while you were in the middle of explaining how to check if your links were still live? Seems like it ended while you were explaining the values returned.

  2. Hi Michael, no – it wasn’t! That’s weird – maybe the youtube upload failed halfway. Let me check, and republish. Thank you for letting me know.

  3. No problem at all. The way the video ended felt almost like an infant feels when you take their bottle away before they are finished. :-)

    Another question… When identifying bad links, where did you import your backlinks from? SEOmoz OSE? MajesticSEO? Other?

  4. Andrea Moro says:

    Nice presentation today, Richard.

    However, I want to point out a possible problem you may have with the way you suggested extracting the domain name.

    By using MID and setting the start number to 8 as you do, in the eventuality that a domain name uses the https protocol your domain name extraction will fail, and the way you coded the function assumes there is always a trailing slash at the end.

    So the following cases:
    http://www.test.co.uk
    https://www.test.co.uk

    will both fail with the following function
    =MID(A1,8,FIND("/",A1,8)-8)

    Therefore, in the first case you will get a #VALUE! and in the second case an empty cell.

    If you replace your function with the following one
    =MID(A1,FIND("://",A1)+3,LEN(A1))

    you will be able to just copy and paste the function into any of your spreadsheets, regardless of the protocol type and without worrying about the trailing slash.

    Hope this sounds useful to you; just a couple of hints prompted by your slides.

    Let’s catch up tomorrow for a coffee at some point.

  5. Hi Richard!

    Thanks for sharing these tips. I like following your blog to keep up to date with the latest Excel tools and tricks, as I’m quite an enthusiast of using Excel for SEO myself.

    To check if a link is still alive, I prefer to use the CheckBackLink function from SeoTools, as it can also check whether the link is followed. Maybe it’s a better alternative to XPathOnUrl?

  6. Hey Leonardo

    Agree on CheckBackLink – the reason I use XPath is that the skill is not restricted to SEO Tools. Knowing XPath is hyper-cool, and it works everywhere with everything :-)

  7. Andrea,

    That, my friend, is an awesome tip. Thank you! I really appreciate the input (*heads to Excel templates to make a few updates*).

    You should write a blog post for us! :-) Nice to meet you at SES.

  8. I try to download as much data as possible (OSE, Majestic, Ahrefs, GWT if I can get it), de-dupe and start from there.

    Hope that helps!

  9. Andrea Moro says:

    Hey Richard,

    glad to hear you like my tip. Tell me what subject you want me to write on and I will do it in my spare time.

    I like to contribute and share knowledge :)

    Perhaps we can sit down over a coffee and talk about this.

    best
    Andrea


  10. Chris McGiffen says:

    I do like a good equation, but when it comes to getting domains from URLs I find a search and replace does the trick… first replace *:// with nothing, then replace /* with nothing, and that captures the vast majority of legit URLs.

  11. Certainly does, Chris – the only head-scratcher is when it’s a URL list; find and replace for page names, subfolders and file types would be very, very tough.

  12. Andrea Moro says:

    Totally agree with Richard.

    And actually the function can be made even better this way

    =SUBSTITUTE(SUBSTITUTE(MID(A1, 1, IFERROR(FIND("/", A1, 10)-1, LEN(A1))), "https://", ""), "http://", "")

    making really sure to remove even page names, which I noticed are not stripped by the SEO Tools function.

  13. Very, very cool, Andrea!

Comments are closed.