An SEO’s guide to site architecture

Richard Baxter

Visualising the principles of basic site architecture issues for SEO, identifying problems and devising actionable methods to solve them.

Background to this article

In 2011, I gave a presentation called “Solving Site Architecture Issues” at a conference called SES London. It summarised the experience I had gained working as an in-house SEO at a travel company.

I worked on sites that generated millions of pages and tens of millions in online revenue and, in the course of my work, regularly encountered fundamental indexation issues. These were a result of the deep architectures associated with websites in the flight comparison space.

I dare say that presentation contributed significantly at the time to the topic of site architecture for SEO. It also contributed significantly to the demand for consulting that my (then small) company received, and I cite that work as one of the most influential factors that helped our agency find its way through its first years.

A lot of writers have since developed that original work (found here: 1,2) from 2011, redrawn the diagrams and published the ideas on their own websites, as is to be expected.

But only a few writers have moved the subject forward since that date.

And develop the subject they have. I would be remiss not to mention writers like Paul Shapiro, Will Critchlow, Patrick Stox, Jan-Willem Bobbink and Kevin Indig. They have taken this subject and developed useful technical theories which we’ll come on to later on in the article.

My aim, as always, is to provide a sound, actionable basis for the beginner to intermediate SEO, working agency side or in-house, to assist them in articulating this subject to their clients or peers. I write this article drawing on decades of experience, past and current, working on small, medium and large scale websites in competitive industries.

Quick Navigation: An SEO’s guide to site architecture

  1. The fundamental issues
  2. Flattening your architecture
  3. Methodologies to identify strengths and weaknesses
  4. Building your own link scoring
  5. Real world solutions

Site architecture for SEO is a fascinating subject, and one that I have always loved. It’s a subject that can be greatly enhanced when working in tandem with experts in UX or Information Architecture.

Addressing the fundamental issues of site architecture for SEO

One of the most impactful ways to improve the traffic driving potential of your website, especially in its long tail, is to improve your site architecture.

To do that, you need to understand the sort of classic issues with PageRank distribution we encounter in complex websites, and how those issues are addressed with real world solutions.

Let’s get started by addressing the classic issue of PageRank distribution in content silos. This is a typical situation encountered in a large scale website where many items are listed in a paginated series of category pages.

Naturally then, Ecommerce websites are highly prone to these issues, as are jobs board websites (read our guide to recruitment SEO here).

PageRank distribution along a simple site architecture

In our scenario, our home page links to a category page. That category page lists numerous products. The category page uses pagination, and therefore features paginated links that go on to pass PageRank to more product pages.

Pages at higher click depth (further “down” the architecture) receive less PageRank, which might mean they’re crawled infrequently (if ever). This reduces the likelihood of a new page being discovered, which hurts the long tail traffic performance of the site overall. It also reduces the likelihood of, say, a product page being recrawled, which is an issue should stock status change in the product’s schema markup.
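
To make the effect concrete, here is a minimal sketch of a simplified, textbook-style PageRank calculation run over a toy version of the scenario above. It is not Google's actual computation: the page names, the tiny link graph and the damping factor of 0.85 are all assumptions made purely for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration over a dict of page -> outlinks."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outlink passes on an equal share of the page's score.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical silo: home -> category page 1 -> page 2 -> page 3,
# with two products listed on each paginated category page.
site = {
    "home": ["cat-p1"],
    "cat-p1": ["home", "prod-1", "prod-2", "cat-p2"],
    "cat-p2": ["home", "prod-3", "prod-4", "cat-p3"],
    "cat-p3": ["home", "prod-5", "prod-6"],
}
for i in range(1, 7):
    site[f"prod-{i}"] = ["home"]  # assume every product links back to home

scores = pagerank(site)
for page in sorted(scores, key=scores.get, reverse=True):
    print(f"{page:8s} {scores[page]:.4f}")
```

In this toy graph the products listed on the first category page end up with a higher score than those on page two, which in turn score higher than those on page three, mirroring the drop-off described above.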

Excessive click depth is a problem. The deeper a page sits, the less likely you are to find it ranking visibly in Google’s results.

In other words, the pages may rank, but far below the top 10 to 20 positions likely to drive any meaningful traffic at all. A large part of the problem is remarkably easy to address, though.

Flattening the site architecture

Great site architecture is all about helping users and search engines find their way around your site.

It’s about getting the best and most relevant content in front of users and reducing the number of times they have to click to find it.

By flattening your site architecture you can make gains in indexation proxy metrics such as the number of pages generating traffic from search engines and the number of keywords driving traffic to your site.

Ask: How many clicks does it take to get from your homepage to your deepest layer of content?
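
If you have a list of internal links from a crawl, that question can be answered with a breadth-first search from the homepage. The sketch below is one way to do it, assuming a small hypothetical link graph; the page names are made up, and a real crawler export would need loading into the same page -> outlinks shape first.

```python
from collections import deque


def click_depths(links, start="home"):
    """Return the minimum number of clicks from `start` to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical silo: the homepage links to one paginated category.
site = {
    "home": ["category"],
    "category": ["product-a", "product-b", "category-page-2"],
    "category-page-2": ["product-c", "product-d"],
}

depths = click_depths(site)
deepest = max(depths.values())
print(f"Deepest content is {deepest} clicks from the homepage")
```

Pages that never appear in the result are unreachable through internal links at all, which is worth checking separately.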

Let’s look at a website that uses a category and product page architecture. The silos are clear, with a link from the homepage being the only discovery mechanism for users and search engines to the deepest levels of content.

a typical site architecture for a retail website

To help solve some of the discoverability problem, let’s add some feature based navigation to the homepage.

For example, featured products, latest jobs or trending searches boxes. The change to the architecture could look something like this:

adding featured products as a navigational aid

Hopefully, you’ve developed your category pages as link worthy assets. If so, there’s a potential quick win in linking between pages at the category page level.

In the small architecture example below, these links would simply be handled by standard navigational links, though if you’re a larger website with hundreds or thousands of category pages and extensive faceted navigation, you’d need to find a more creative solution.

And finally, let’s add some related links between our lower tier pages. You could also have “recently added” features on the homepage and category pages or consider most popular boxes and so on.

That’s the basics over and done with; we’re going to revisit real world solutions to common architectural problems later in the article. First, though, I’d like to look at the options available for us to locate pages that are weak (and need links) and very strong (pages that can give links).

Methodologies to identify strengths and weaknesses in your internal links architecture

Consider the scenario. You’ve inherited a medium size, commercial retail website. On the surface, it looks OK (it has a lot of the internal link features we’ve discussed already) but, being an optimisation specialist, you want to find some new opportunities.

Before we proceed, I feel a bit like we have to do this:

About PageRank and why domain level authority metrics are a bad idea for this kind of thing

The PageRank algorithm, in Google’s words “works by counting the number and quality of links to a page to determine a rough estimate of how important the website is.”

Reading about PageRank is interesting to an SEO like me, but a discussion of algorithms is out of scope for this article, and slightly above my pay grade. It’s of academic interest, of course, but don’t feel pressured to go too deep into this stuff. I don’t worry about it much at all.

Over to Dixon Jones, then, who can explain this stuff better than I can. This explainer from Dixon Jones shows you how the maths of PageRank works and then goes on to demonstrate how the maths can deliver some pretty unexpected outcomes.

Of particular note, his explanation of why you should not rely on domain level metrics to decide anything is to the point.

From now on, then, I’ll be using page level metrics.

There’s an obvious and well known problem with Google PageRank. We don’t know Google’s PageRank scores for our pages.

Calculating PageRank derived metrics is hard. Most of us haven’t crawled the web, nor have we been running the massive computational resources required to come up with our own data. Tools providers do, though. Majestic’s Trust Flow metrics are a decent measure of the probable importance of a page in my opinion, so I’m happy to play with this as a replacement.

I’m going to take you, step by step through the process of building a dataset that can help you improve your internal links by identifying the stronger and weaker pages in your site architecture.

All gathered with everyday tools. Advanced understanding of search engine algorithms and statistics packages not required.

Diagnosing site architecture issues

Building a dataset for site architecture purposes will always be a variable process. Your work will depend very much on how advanced a client’s website already is. If, for example, you’re just improving overall internal links by implementing classic real world solutions like “recently added jobs”, then building a big dataset probably isn’t strictly necessary on day 1.

But for more advanced applications there are some interesting ideas to apply here. Like Kevin’s internal PageRank methodology, for example. By combining data together from a variety of different sources, I think there’s scope to catch and kill site architecture issues in their tracks. There’s also scope to spot opportunities.

Take this methodology as it is: something for you to develop and modify to taste.

Tools

To get started, we’re going to get our favourite tools fired up. Among the notable contributors to this process, I must say that Sitebulb is a really interesting addition to my toolset.

That’s thanks to their internal PageRank calculation called Link Equity Score, which I’ll be using later on.

Sitebulb’s internal links report. You could use the indexable internal links report to reduce large data sets. The export comes with a handy internal link equity score, based on PageRank.

We’ll also be using the following:

To build a data table consisting of:

Build the data set

Start by collecting your raw data. I fetched some log file data from my Woocommerce website and imported the file into Excel. This import wasn’t perfect as I’m really only looking for a count of Googlebot requests per HTML page. For more advanced server log file analysis, read Dan’s amazing guide here.
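
If Excel struggles with the raw file, the same count can be produced with a short script. This is a minimal sketch, assuming the server writes the common "combined" access log format; the sample lines, paths and the list of static file extensions are made up for illustration. It matches on the user agent string only, which can be spoofed, so treat the counts as approximate.

```python
import re
from collections import Counter

# Assumed "combined" log format: request, status, size, referrer, user agent.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
# Hypothetical list of asset extensions to ignore, leaving HTML pages only.
STATIC = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".woff", ".woff2")


def googlebot_hits(lines):
    """Count requests per URL where the user agent claims to be Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        path = match.group("path")
        if path.split("?", 1)[0].lower().endswith(STATIC):
            continue  # skip stylesheets, scripts, images and fonts
        hits[path] += 1
    return hits


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
sample_log = [
    f'66.249.66.1 - - [01/Mar/2020:10:00:01 +0000] "GET /product/blue-widget/ HTTP/1.1" 200 5123 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [01/Mar/2020:10:05:09 +0000] "GET /product/blue-widget/ HTTP/1.1" 200 5123 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [01/Mar/2020:10:05:10 +0000] "GET /wp-content/themes/shop/style.css HTTP/1.1" 200 900 "-" "{GOOGLEBOT}"',
    f'203.0.113.7 - - [01/Mar/2020:10:06:00 +0000] "GET /product/blue-widget/ HTTP/1.1" 200 5123 "-" "{BROWSER}"',
    f'66.249.66.1 - - [01/Mar/2020:10:07:30 +0000] "GET /shop/ HTTP/1.1" 200 7411 "-" "{GOOGLEBOT}"',
]

hits = googlebot_hits(sample_log)
for path, count in hits.most_common():
    print(count, path)
```

To run it against a real file, pass an open file handle instead of the sample list, for example `googlebot_hits(open("access.log"))`, where the filename is whatever your server uses.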

import log data into excel

I’m going to use Screamingfrog to import data via Majestic, the Google Analytics API and the Search Console API.

majestic seo settings screamingfrog

Run your Screamingfrog crawl in the usual way, exporting the data when the crawl has finished.

Screamingfrog getting all the cool metrics. Yes I know, they updated the version the same day I updated this article…

Once you’ve exported your Screamingfrog and Sitebulb data, consolidate into a master table with the usual VLOOKUPs:

My combined data in Excel
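
For larger sites, the VLOOKUP step can also be done in code. The sketch below is one possible approach, not the method used for the table above: it joins lookup tables onto the crawl rows by URL and fills a default where a URL is missing from a source. All column names and values are hypothetical, not the real export headers from any tool.

```python
def build_master_table(crawl_rows, lookups, defaults):
    """Left-join each lookup table onto the crawl rows by URL, like one VLOOKUP per source."""
    master = []
    for row in crawl_rows:
        merged = {**defaults, **row}
        for table in lookups:
            # Missing URLs keep the default value instead of an #N/A style gap.
            merged.update(table.get(row["url"], {}))
        master.append(merged)
    return master


# Made-up rows standing in for the crawler export.
crawl = [
    {"url": "/", "crawl_depth": 0, "inlinks": 40, "outlinks": 25},
    {"url": "/shop/", "crawl_depth": 1, "inlinks": 38, "outlinks": 30},
    {"url": "/product/blue-widget/", "crawl_depth": 3, "inlinks": 2, "outlinks": 12},
]
# Made-up lookups keyed by URL: a link equity export and log file counts.
link_equity = {"/": {"link_equity": 100}, "/shop/": {"link_equity": 71}}
googlebot = {"/": {"googlebot_hits": 120}, "/product/blue-widget/": {"googlebot_hits": 3}}
defaults = {"link_equity": 0, "googlebot_hits": 0}

master = build_master_table(crawl, [link_equity, googlebot], defaults)
for row in master:
    print(row)
```

Real exports arrive as CSV files, so in practice each table would be loaded with something like Python's `csv.DictReader` and keyed by its URL column before joining.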

Analysis

Here’s my completed data. I’ve created a pivot table and added filters for Crawl Depth, Search Console Impressions and GA Sessions, although it’s of course entirely up to you how you build yours.

Interesting takeaways

How you analyse data like this will always come down to your own preference, and the process is likely to be unique to the site in question. There are, of course, some universal takeaways that we can all gather. For example:

Sort for pages that have low numbers of internal links but high impressions in the Search Console data. You could infer that these pages have traffic potential and might benefit from more internal links pointing to them.

Sort for pages that don’t link out very much (low outlinks) that have high citation and link equity. You might consider using the authority of these pages to improve the rankings of others.

Sort by highest search console impressions and fewest internal links to identify pages that could benefit from better internal linking.

Sort by very high click depth and then by search impressions to identify page types that might benefit from better internal links.

And so on.
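
The sorts above can be expressed as simple filters over the master table. This is a rough sketch under stated assumptions: the rows, column names and thresholds are all hypothetical, and sensible cut-offs will differ from site to site.

```python
def link_targets(rows, max_inlinks=5, min_impressions=1000):
    """Pages with few internal links but high search impressions: candidates to receive links."""
    picks = [r for r in rows if r["inlinks"] <= max_inlinks and r["impressions"] >= min_impressions]
    return sorted(picks, key=lambda r: r["impressions"], reverse=True)


def link_donors(rows, max_outlinks=20, min_link_equity=50):
    """Pages with high link equity that link out very little: candidates to give links."""
    picks = [r for r in rows if r["outlinks"] <= max_outlinks and r["link_equity"] >= min_link_equity]
    return sorted(picks, key=lambda r: r["link_equity"], reverse=True)


def deep_pages(rows, min_depth=3):
    """Deep pages, sorted by click depth and then by impressions (both descending)."""
    picks = [r for r in rows if r["crawl_depth"] >= min_depth]
    return sorted(picks, key=lambda r: (r["crawl_depth"], r["impressions"]), reverse=True)


# Made-up master table rows for illustration only.
rows = [
    {"url": "/", "crawl_depth": 0, "inlinks": 40, "outlinks": 25, "impressions": 9000, "link_equity": 100},
    {"url": "/guides/widget-care/", "crawl_depth": 2, "inlinks": 12, "outlinks": 8, "impressions": 300, "link_equity": 64},
    {"url": "/product/blue-widget/", "crawl_depth": 3, "inlinks": 2, "outlinks": 12, "impressions": 4200, "link_equity": 5},
    {"url": "/product/red-widget/", "crawl_depth": 4, "inlinks": 3, "outlinks": 12, "impressions": 1500, "link_equity": 4},
    {"url": "/product/old-widget/", "crawl_depth": 5, "inlinks": 1, "outlinks": 12, "impressions": 20, "link_equity": 1},
]

targets = [r["url"] for r in link_targets(rows)]
donors = [r["url"] for r in link_donors(rows)]
deep = [r["url"] for r in deep_pages(rows)]
print("Needs links:", targets)
print("Can give links:", donors)
print("Deepest first:", deep)
```

In this made-up data, the guide page is the obvious donor and the two widget product pages are the obvious targets, which is the kind of pairing the pivot table is meant to surface.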

Something that really stood out to me while I was prototyping this data: the smart move would be to automate this entire thing and integrate the recommendations into a dashboard for SEO use. That, or develop functionality to dynamically update links across an entire website. That’s an interesting thought and I’m sure there are in-house teams at some very large companies that have these tools.

Solutions in the real world

There are lots of ways to improve your internal links in the real world. Here are some examples of ways to enhance your cross links in a site architecture:

Travel

Kayak use this navigational feature to link to their destination and flight route pages.

Retail

Misguided use related categories, and popular styles links to improve their category level architecture.

Jobs

TotalJobs use this feature on their homepage to flatten their architecture.

In summary

Site architecture is an exciting, always evolving subject. Thanks to the tools we have at our disposal, there’s so much scope to continually develop the process of tracking down, and improving upon, issues with your site architecture.

Please leave your thoughts in the comments below…
