Information Architecture for SEO

The SEO community really doesn’t take much persuading when it comes to the importance of links. Sometimes it feels like they’re all we want to talk about.

And yet, we spend hardly any time talking about the most important and easiest links of all – the ones on our own sites. The links in our templates, our footers, our faceted navs, and our hideous drop down nested mega menus.

Those links can affect your site’s performance at least as much as the links you earn from other sites. If you’ve ever done log analysis for large sites, you’ll have seen how Google’s understanding of a site is shaped by internal linking, and how unimportant pages can end up hogging the limelight. And this kind of issue isn’t limited to large sites – in fact, smaller sites, which have less strength to play with, need to be even more careful about how they use it.

When did you last review your information architecture?

Information architecture is a broad topic, which arguably includes almost everything that we traditionally call technical SEO, and a lot of UX. In this post, I’m going to focus more narrowly on quickly identifying simple changes to a site’s internal navigation that can boost the performance of your key landing pages. At its most basic, this is a process you could execute in half an hour.

Here’s what you can do.

Step 1: You will need

The main ingredients of today’s dish are all metrics at a URL level. Specifically, for any given URL, we need to know:

  1. Level/Depth: The number of clicks needed to reach a given URL from the site’s homepage.
  2. Inlinks: The number of links from elsewhere on the site that a page has.
  3. Linking metrics: Measures of how strongly the page is linked to in general, including external links.
  4. Organic landings: The number of visitors who have landed on this page via organic search in a representative sample time period.
  5. Pageviews minus Entrances: The number of times this page has been clicked through to from some other page on your site.


  6. Outlinks: The number of other pages on the site that a page links to.
  7. Organic conversions/revenue.
  8. An estimate of internal PageRank flow (pinch your salts).


For items 1, 2, and 6 in the above list, I recommend Deepcrawl (paid) or Screaming Frog (you’ll need a paid license and a good chunk of RAM for sites of >500 pages).

For item 3, I recommend Majestic Historic Index Citation Flow, not least because it can easily be sourced at a URL level in bulk via their “Bulk Backlinks” feature. The equivalent metrics from Moz and Ahrefs are perfectly viable too. All of these tools have free versions, but a paid license makes things much easier at scale.

For items 4, 5, and 7, I recommend Google Analytics. To improve your data quality, you could try using the Model Comparison Tool (under Conversions > Attribution) or Google Search Console data to avoid having Google Analytics’ default “Last Non Direct” model pollute your supposedly organic data.

For item 8, you could play around with something like igraph (requires R) or NetworkX (requires Python). Either can take a Screaming Frog “all inlinks” report, converted into a graph object, to build a map of the site and estimate the weight of internal pages. This gives a more sophisticated internal strength metric than just a raw count of internal links, albeit with a significant extra investment of effort.
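To make the idea behind item 8 concrete, here is a rough sketch of a PageRank-style estimate in plain Python. It assumes you have already reduced the “all inlinks” export to (source, target) URL pairs; the function name, the damping factor of 0.85 and the example URLs are my own illustrative choices, not anything prescribed by the tools above. If you do have NetworkX installed, building a DiGraph from the same pairs and calling networkx.pagerank should give you an equivalent figure with less code.

```python
def internal_pagerank(edges, damping=0.85, iterations=50):
    """Estimate PageRank from (source, target) internal link pairs."""
    outlinks = {}
    pages = set()
    for source, target in edges:
        pages.update((source, target))
        if source != target:  # ignore self-links
            outlinks.setdefault(source, set()).add(target)

    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Pages with no outlinks spread their rank evenly over the whole site.
        dangling = sum(rank[p] for p in pages if not outlinks.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for source, targets in outlinks.items():
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page site: the homepage links to /a and /b,
# both link back, and /b also links to /a.
edges = [("/", "/a"), ("/", "/b"), ("/a", "/"), ("/b", "/"), ("/b", "/a")]
ranks = internal_pagerank(edges)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Treat the output as a relative ranking of pages within one site rather than as a number that means anything on its own.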

Step 2: Get it together

For sites of up to around 100,000 URLs, it ought to be practical to pull this data together in Excel using Vlookups. If you’re not sure how, check out this handy guide.
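If you’d rather script the lookup than do it in a spreadsheet, the same join is a few lines of Python. This is only a sketch: the field names (url, pageviews, entrances, organic_landings) are placeholders I’ve made up, and it assumes both exports already use the same URL format, which in practice usually means normalising full URLs and paths first.

```python
def join_on_url(crawl_rows, analytics_rows):
    """Attach analytics metrics to each crawl row by URL, VLOOKUP-style."""
    analytics_by_url = {row["url"]: row for row in analytics_rows}
    combined = []
    for row in crawl_rows:
        # URLs missing from analytics get zeros rather than being dropped.
        extra = analytics_by_url.get(row["url"], {})
        combined.append({
            **row,
            "organic_landings": extra.get("organic_landings", 0),
            # Pageviews minus entrances: views that came from another page.
            "internal_clicks": extra.get("pageviews", 0) - extra.get("entrances", 0),
        })
    return combined


# Hypothetical rows standing in for a crawl export and an analytics export.
crawl = [
    {"url": "/", "depth": 0, "inlinks": 120, "outlinks": 45},
    {"url": "/old-event", "depth": 1, "inlinks": 118, "outlinks": 3},
]
analytics = [
    {"url": "/", "pageviews": 900, "entrances": 600, "organic_landings": 400},
]
for page in join_on_url(crawl, analytics):
    print(page)
```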

[Image: gathering data together]

For larger, enterprise class sites, you have two options.

Option 1: Quick and easy

If you’re trying to turn this around in a 30-minute window, or even a few hours, your best bet will be trying to put together a sample of pages that includes most templates on the site, and is around 100,000 URLs in size. This shouldn’t be too difficult, as most large sites generate the bulk of their size from a small number of “listings” templates, for locations, facets, or combinations of the above.
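One way to build that kind of sample is to group URLs by a rough proxy for template and cap how many you take from each group. The sketch below uses the first path segment as the proxy, which is purely an assumption on my part; on your site the template might be better identified by a URL pattern or a value scraped from the page.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def sample_by_template(urls, per_template, seed=0):
    """Take up to `per_template` URLs from each first-path-segment group."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        groups[segments[0] if segments else "(home)"].append(url)

    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    sample = []
    for key in sorted(groups):
        members = groups[key]
        sample.extend(rng.sample(members, min(per_template, len(members))))
    return sample


# Hypothetical URL list: one big "listings" section and a small blog.
urls = ["https://example.com/"]
urls += [f"https://example.com/hotels/city-{i}" for i in range(5)]
urls += [f"https://example.com/blog/post-{i}" for i in range(2)]
print(sample_by_template(urls, per_template=2))
```

Raise per_template until the total lands near the size you can comfortably handle.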

Option 2: Slow, advanced

If you want to be thorough on a site with hundreds of thousands to hundreds of millions of URLs, you could scale up the process:

  • Instead of running Screaming Frog on your laptop, run it in the cloud or on a machine with tens of gigabytes of RAM.
  • Use an analytics platform capable of unsampled reports such as Google Analytics Premium, or use the GA API to pull out data one day at a time.
  • Instead of using Excel, use BigQuery, Google’s cloud-based SQL database platform.
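The “one day at a time” approach boils down to looping over single-day date ranges and stitching the results together. Here is a minimal sketch of that loop; fetch_report is a hypothetical function you would write around whichever analytics API client you use, taking a start and end date as ISO strings and returning a list of rows.

```python
from datetime import date, timedelta


def daily_ranges(start, end):
    """Yield (day, day) pairs covering start..end inclusive."""
    day = start
    while day <= end:
        yield day, day
        day += timedelta(days=1)


def pull_by_day(fetch_report, start, end):
    """Call fetch_report once per day and concatenate the rows."""
    rows = []
    for day_start, day_end in daily_ranges(start, end):
        rows.extend(fetch_report(day_start.isoformat(), day_end.isoformat()))
    return rows


# Stand-in for a real API call, just to show the shape of the loop.
def fake_fetch(start_date, end_date):
    return [{"date": start_date, "url": "/", "organic_landings": 1}]


print(pull_by_day(fake_fetch, date(2017, 1, 30), date(2017, 2, 1)))
```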

Step 3: Spot the outliers

This is where the analysis comes in. We’re looking for pages that are strong in some columns, but weak in others. For example, here I’ve pulled out pages with low numbers of outlinks, but strong performance in some other metrics:

[Image: spotting the outliers]

This is as simple as sorting by internal links descending, and then by outlinks ascending.

The top row in particular, with 1,046 internal links and a Citation Flow of 35 but zero outlinks, is a major missed opportunity. Adding internal navigation to this page would help to recycle some of that equity around the site.
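If your data lives in a script rather than a spreadsheet, the same sort is a one-liner. This sketch assumes outlinks is meant to be the primary key (fewest first) with internal links breaking ties (most first), which is what you get in a spreadsheet when you apply the two sorts one after the other. The first row mirrors the example above; the other rows and the field names are made up.

```python
def missed_opportunities(pages):
    """Order pages so that few-outlink, many-inlink pages come first."""
    return sorted(pages, key=lambda p: (p["outlinks"], -p["inlinks"]))


pages = [
    {"url": "/a", "inlinks": 300, "outlinks": 52, "citation_flow": 12},
    {"url": "/b", "inlinks": 1046, "outlinks": 0, "citation_flow": 35},
    {"url": "/c", "inlinks": 40, "outlinks": 0, "citation_flow": 3},
]
for page in missed_opportunities(pages):
    print(page["url"], page["inlinks"], page["outlinks"])
```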

Where it’s a PDF receiving this equity, it might make sense to create a new landing page that embeds the PDF, and to send a rel="canonical" Link header in the PDF’s HTTP response, pointing to the new page.
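Since a PDF has no HTML head to put a tag in, the canonical has to travel as an HTTP Link header. The sketch below only builds the header name and value; the URL is a made-up example, and how you actually attach the header to the PDF response depends on your server or framework, so I’ve left that part out.

```python
def canonical_link_header(landing_page_url):
    """Return the (name, value) pair for a rel="canonical" Link header."""
    return ("Link", f'<{landing_page_url}>; rel="canonical"')


# Hypothetical landing page that embeds the PDF.
name, value = canonical_link_header("https://www.example.com/guides/widget-report")
print(f"{name}: {value}")
```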

Here are some other example outliers and their associated actions:

  • Low outlinks, combined with high internal links or CF, or 1 click from homepage:
      • Add internal navigation to allow equity driven into this page to be recycled to the rest of the site.
      • Canonicalise/redirect this page to an equivalent URL with a fuller navigation.
  • High clicks from homepage, or low internal links, combined with high organic traffic %:
      • Include this page in the main navigation or link to it from some stronger pages, to make the most of its performance against the odds.
      • Use a page higher in the architecture to target the same queries, and redirect or canonicalise the old page.
  • 1 click from homepage, or high internal links, combined with low organic traffic % and low pageviews minus entrances (i.e. non-entrance pageviews):
      • Consider whether this page really needs to be part of the sitewide navigation. Perhaps it could be shown only to logged in users, or linked to only from relevant pages.
  • Strong linking metrics, combined with low organic traffic %:
      • Could you be making better use of the strength of this page? Perhaps it’s an old blog post that needs a refresh, or an event you don’t run anymore. At the very least, make sure it has a fully featured main nav.
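If you want to apply these checks across a whole crawl rather than by eye, they can be written as simple rules. Everything numeric in this sketch is an assumption: the thresholds, the field names and the flag labels are placeholders, and what counts as “high” or “low” will depend entirely on your site. I’ve also assumed that the third pattern needs both low organic traffic and low non-entrance pageviews.

```python
def flag_outliers(page, t):
    """Return the outlier patterns a page matches, given thresholds `t`."""
    well_linked = page["inlinks"] >= t["high_inlinks"] or page["depth"] <= 1
    buried = page["inlinks"] <= t["low_inlinks"] or page["depth"] >= t["deep"]
    strong = page["citation_flow"] >= t["strong_cf"]
    low_organic = page["organic_share"] <= t["low_organic_share"]

    flags = []
    if page["outlinks"] <= t["low_outlinks"] and (well_linked or strong):
        flags.append("add navigation or canonicalise")
    if buried and page["organic_share"] >= t["high_organic_share"]:
        flags.append("promote in navigation")
    if well_linked and low_organic and page["internal_clicks"] <= t["low_internal_clicks"]:
        flags.append("reconsider sitewide link")
    if strong and low_organic:
        flags.append("refresh or repurpose")
    return flags


# Hypothetical thresholds; tune these against your own data.
thresholds = {
    "high_inlinks": 500, "low_inlinks": 5, "deep": 5, "low_outlinks": 0,
    "strong_cf": 30, "high_organic_share": 0.05, "low_organic_share": 0.001,
    "low_internal_clicks": 10,
}
buried_page = {
    "depth": 6, "inlinks": 2, "outlinks": 40, "citation_flow": 5,
    "organic_share": 0.1, "internal_clicks": 300,
}
print(flag_outliers(buried_page, thresholds))
```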

Honourable mention: mobile first crawling

Back in November 2016 Google put out an announcement about Mobile First Crawling that induced panic in many an SEO team. Many sites weren’t even entirely mobile friendly, or didn’t have consistent metadata and SEO directives on mobile. But almost no sites have fully functional internal linking on mobile, so what was going to happen to key landing pages whose mobile versions were poorly linked to, or even completely orphaned?

Personally, I’m not overly worried about this. I think the future of mobile indexation and crawling is not about the mobile link graph so much as it is about moving away from the link graph entirely. We’ll have to cross those bridges when we come to them, although there’s lots we can do to prepare in the meantime. If you want to read more about this (dystopian?) future of SEO, I recommend this excellent series of blog posts by Cindy Krum at Mobile Moxie.

Crucially, however, this is not an excuse to ignore your internal linking in the meantime – Google can only afford to abandon desktop crawling when the big sites and the ecosystem are ready, and while some verticals may be ahead of the curve, for the most part, this doesn’t look like it’s going to click into place overnight.


To recap, we have:

  1. Pulled together data sources representing the internal strength, external strength, and organic value of each page.
  2. Combined these data sources at a page level.
  3. Picked out cases where one of those areas doesn’t match with the others.
  4. Lazily dismissed mobile crawling for the time being.