I’ve had two separate occasions in as many weeks where I’ve needed to answer the question: “What Are My Most Linked to Subfolders?” for a client, or for myself. I thought it would be quite fun to do a quick write up on how I answer this question using a simple Excel query, some Webmaster Tools data, analytics data or Open Site Explorer.
In this post we’re going to use raw external link data from Webmaster Tools to quickly identify the most linked to subfolders on a site. You can repeat the same process using Open Site Explorer data, too.
Download your external linked to URL data
First, download your external links data from Google Webmaster Tools – here’s some anonymous data showing a site I’m working on and how many links their top pages have (blurring idea courtesy of DaveN)
Fun with tables, pivots and Excel queries
Do the usual “CTRL+L” to create an Excel table, I’ve called mine “GoogleLinks”.
In a new column, paste the following query:
=MID(GoogleLinks[[#This Row],[Page]],24,FIND(“/”,GoogleLinks[[#This Row],[Page]],25)-23)
This query starts at a specified point and plays back the characters in a cell until the next “/”. I’m using MID to start at a fixed position (24 = “http://www.domain.co.uk/”) and FIND to count the number of characters to the next “/”. Subtracting the count of characters in the domain from the count of characters to the next “/” gives me a column listing the folder name only next to the count of links in the data.
The end result is a little something like this chart, lovingly created using a pivot table:
You’ll notice the #VALUE! result in the 4th column. That’s because my query gets upset if there isn’t another “/” from my MID starting point. Not to worry, try this slightly “improved” query:
=IFERROR(MID(GoogleLinks[[#This Row],[Page]],24,FIND(“/”,GoogleLinks[[#This Row],[Page]],25)-23),MID(GoogleLinks[[#This Row],[Page]],24,LEN(GoogleLinks[[#This Row],[Page]])-23))
If there’s an error, a slightly modified query kicks in, using LEN to give a count of the total characters after our starting position. The count, along with the MID query makes sure we can just play back all of the characters found in the cell. The end result looks a little better:
What’s the point of this?
In my humble opinion, there’s always been huge value in analysing your competitors top pages, and your own. When we’re dealing with a larger site architecture or we’re in need of the 10,000ft view, a count of all links by subfolder can add additional insight into the mix. I also believe that having an idea of your best linked to subfolders can help a lot with understanding strong and weak points with inbound link authority to certain content types, especially after the May Day update. That thought is for another post though.
Fun looking at the competition, or just big websites
In the meantime I’ll leave you with this fascinating view on the most linked to subfolders and pages at Google.com – data courtesy of Open Site Explorer and SEOmoz’s Top 500 list (my source of inspiration…)
This approach worked well for me – but you may have to tweak the queries slightly for it to work for you.
If you have an alternative suggestion, I’d love to hear it. Let me know your approach and I’ll gladly update this post with your queries!