Not interested in the story or example scenarios? Jump to the heading starting with ‘Screaming Frog Custom Extraction’.
As an SEO, chances are you’ve worked with a WordPress CMS and installed the Yoast plugin. One of its best features is the ability to generate an SEO ‘score’ for each of your pages based on occurrences of their target keyword. It also allows you to observe instances of that keyword in the copy of the target page.
For sites which aren’t on WordPress, doing this manually can be laborious. This is why I’ve put together the process below using Screaming Frog and Excel.
What could you use it for? Well, if we put ‘keyword density’ to one side, knowing the count of target keywords per page is useful for:
- A benchmark of current targeting, useful in many scenarios such as future reporting or tracking implementation
- To mitigate the risk to organic performance from significant copy changes, for example during a re-branding and/or domain migration
- Data for internal anchor text-led fix for cannibalisation against one target keyword
- Auditing targeting on discrete versions of a mobile site and desktop site, with different sets of copy, ahead of the switch to the mobile first index?
Note: Tracking keyword frequency is one part of the due diligence an SEO should complete if tackling the scenarios above – this technique can be a useful canary in the coal mine of copy changes, but there are many other important factors to consider. If you’re working on a site migration, read more on mapping redirects & mitigating risk in site migrations
Screaming Frog Custom Extraction & LEN + SUBSTITUTE + UPPER in Excel
This process was built with more than a little help from my colleagues (thanks Tom, Philip & Tony), utilising:
- Screaming Frog’s custom extraction tool (available to paid users) to extract the relevant page content
- A combination of LEN + SUBSTITUTE functions in Excel (with a side of UPPER) to count the keyword(s). Courtesy of Dave at Exceljet
Depending on the templates present on the site you are analysing, this process has quite a simple set-up but can save you loads of time counting keywords within the copy relevant to the user and search engine robots.
With this process, you can aim to exclude page elements like common navigation anchor text links and potentially duplicate sections of copy that sit across all pages.
Before we start, you’ll need:
- Keyword mapping (keywords mapped per URL on your site, preferably in the same row as the relevant URL)
- A Screaming Frog paid license to use the custom extraction tool*
*Note: There are alternatives for scraping on-page copy. I’ll leave you some links below.
Extracting copy using Screaming Frog
The steps below are a cut down version of the excellent, in-depth guide by Screaming Frog themselves on how to use their custom extraction tool.
It’s worth a read: Screaming Frog’s guide to web scraping.
Step one: Screaming Frog extractors
Open up the configuration tab in Screaming Frog> Custom> Extraction.
You’ll see a screen of the following extractors, up to 10. In an ideal world, you will only need to use a few to extract the relevant page copy.
Tip: if you want the clean copy, select that from the drop-down on the far right to strip out any HTML from the copy you’re scraping.
The next step is to fill out the XPath or CSS selector field that contains your copy.
Step two: Extracting the XPath or CSS Selector from your target site
Essentially, we’re looking to use the browser’s built-in developer tools to copy a reference to element which contains the relevant page copy. In most modern browsers, you can open the dev tools simply by hitting F12.
For WordPress sites (particularly blog posts), extracting the reference is often easy as the templates are consistent across multiple pages and you don’t always need multiple extractors. Simply right click on the DOM element in question, go Copy > Copy XPath.
We’ll use Builtvisible as an example:
Then simply copy the XPath or CSS selector into the custom extraction fields back in Screaming Frog. Select the appropriate drop-down for XPath or CSS Selector:
Note: I have usually had more success extracting the relevant copy with the CSS selector.
Step three: Select your spider mode
Have your URLs to hand from your keyword mapping? Select the ‘List’ mode in Screaming Frog and paste your URLs.
Need to crawl the site looking for your copy? Say if you’re looking for un-linked keywords on other pages to drive more internal value to their target ranking page? Use the spider mode.
Step four: Export your extraction
Using either the internal-all or custom exports, and change the filter to extraction.
Check for errors or empty extractors missing text. If they are missing text the chances are that the element the copy is held on those pages is different, so inspect the page and do a further test on the extractors.
Remember: Screaming Frog gives you up to 10 different extractors to use per crawl.
Step five: LEN + SUBSTITUTE + UPPER
I lifted this formula from Dave over at Exceljet:
Replace text with the excel cell reference where all your extracted text is.
Replace keyword with the cell reference to your target keyword. If you’ve set up your keyword mapping this can be matched to different keywords per URL.
Place this formula in a column in-line with your exported URLs and text from Screaming Frog.
Bear in mind it will only count exact matches of your keyword, but luckily not case sensitive thanks to UPPER.
Everything being well, you should get a spreadsheet that looks like this (without the blur of course):
There you go.
Note: For this site I used 5 extractors to extract copy from a few landing pages that used a different template. You can use either the ‘&’ function or concatenation to condense the copy from extractors into one ‘text’ cell (as you can see in the column ‘All current body text and headings’).
If you can think of or have used this process for other uses, please leave a comment below.