Creating efficient data collection systems for SEO and social

by on 24th October 2013

As marketers, we only have so many hours in the day. Nothing is worse than having your team spend time on menial tasks (like data entry) when there’s real work to do. That’s why it’s crucial for managers to create efficient systems – so teams can get more accomplished and make a bigger impact.

Most data entry work is delegated far down the totem pole. Managers are removed from data entry, so making the process as efficient as possible isn’t always a priority. If you aren’t “technical,” it’s easy to accept data collection as a necessary evil and justifiable expense.

Web scraping takes the “entry” part out of “data entry.” It saves a ton of time. However, web scraping is complicated. For a lot of new SEOs, it can be extremely intimidating. It was for me.

It’s our responsibility as managers to give our team members the keys they need to be successful – no matter how complicated or intimidating the task.

Lean Process Creation

Lean Manufacturing is a production process that considers the use of resources for anything other than the creation of value for the end customer to be wasteful. When it comes to data collection, I try to eliminate four types of waste that cause inefficiency:

  1. Motion (ex. opening up and logging into web apps to pull data)
  2. Waiting (ex. waiting on web apps to compile and  export data)
  3. Overproduction (ex. pulling more data than needed)
  4. Over Processing (ex. poor process design that results in wasted time)

In agency life, value for the end customer is actionable data and insights. Anything we do for a client that doesn’t result in actionable data and insights should be minimized as much as possible or cut from the process entirely.

We’ve all worked on teams where we’ve needed to collect data on a daily basis. These tasks may seem like a small time commitment at first, but they can really add up. By automating repetitive, low-intelligence tasks for your team, they can get entire weeks back on their calendar. Here’s a chart from XKCD visualizing how much time will be saved by automating a recurring task:

Is It Worth The Time

If you can automate a daily five-minute task for someone on your team, they get 1.2 whole days back every year. Your team can use that time to do the things that computers can’t: critical thinking, emotional reasoning, qualitative analysis — stuff you actually want your team spending their time on!

Creating a Dynamic Data Collection Template

Let’s say a few big conferences are coming up, and we want to track Twitter follower acquisition as a KPI for everyone that’s presenting. Before handing off data collection to someone on your team, it’s your role as a manager to compartmentalize and automate as much of the data collection process as possible.

Personally, I can figure out the XPath for a Twitter user’s follower count pretty quickly –  “//li[3]/a/strong” –  so that part of the task I’ll execute because it requires some background technical knowledge. I also scraped a hypothetical list of speakers who are going to be at the same conferences as us, along with their twitter handles. Using that information, I set up this scraper template:


Which translates to:


This approach to data entry/collection is exponentially more efficient.

Now someone on my team can open this document every day, refresh all the cells by hitting CTRL + ALT + SHIFT + F9 and then copy and paste the newly scraped data into another sheet. Over time, we will be able to chart some pretty insightful data, like this:


If you use these templates, there’s one important thing to remember: Excel caches formula values. Whenever the file is opened, Excel displays the values that were stored in the cell the last time the file was opened. In order to get fresh data, it’s necessary to rerun all formulas by pressing CTRL + ALT + SHIFT + F9.

It takes about 1 second per scraper formula, depending on the complexity level. While Excel is rerunning all these formulas, don’t touch anything.

When recording the data, it’s critical to Copy and Paste as Values in a separate sheet, that way you are saving the numerical data and not the scraper formulas – which could cause some headaches.


It’s important to not overload excel. If you are planning on scraping hundreds of data points with Xpath every day, do it in sections. To refresh formulas in section, use find and replace (ctrl + H) and replace = with =, this hack will recalculate all cells that you select to replace. It’s not pretty, but if your file is so huge that it crashes Excel, it may be a necessary step.

Download an example template here.

Anyone with a basic handle on excel can execute the remainder of this process. Systems like this are great, but XPath can be a little slow and buggy, that’s why I’ve taken to scraping with JSON whenever possible.

Scraping Made Simple: An Intro to Scraping JSON

JSON is a human-readable language, it’s extremely simple to pick up and requires no coding background. However, when you look at a JSON output from any API, it can be a little tough to decipher exactly what’s going on:


But, if you use a Chrome extension like JSONview, it’s really easy to pick up the parent-child structure of JSON.

Example: scraping the Rotten Tomatoes API

JSON is absurdly easy to scrape, and JSONview (for Chrome and FireFox) even lets you copy the JSON path with a right click.

First let’s pick what we’re going to scrape, for this example, the infamous Rotten Tomatoes Total Rating is what we are going after:

rotten tomatoes json api

Using SEO Tools for Excel to scrape JSON, just enter this formula structure:

=JsonPathOnURL([cell with URL or URL],"[object]")

To get the Total count from Rotten Tomatoes, the formula is:


Which Returns:

If the object you want to scrape has a parent object, just enter “$..” before the object you want to scrape (I have Richard Baxter for that protip):

=JsonPathOnURL([cell with URL or URL],"[$..child]")


Which returns:


Collecting Facebook Data with the Open Graph API

Many APIs require authorization tokens/API keys, but a few don’t – Facebook’s Open Graph API is one of them. To access it, just replace the “www” in the url with “graph”:


Use Case: Record Facebook Stats for Holiday Content

Let’s say we are working with a big box retailer this holiday – and we want to track Facebook stats every day, starting two weeks before Black Friday and going to Jan 1.

Using the Facebook Open Graph API, it’s going to be easy to scrape and automate this task.

We want to pull:

Data We NeedExcel Formula
Talking About Count:=JsonPathOnUrl(URL,”talking_about_count”)
Like Count:=JsonPathOnUrl(URL,”likes”)

Setting up this dynamic template is pretty straightforward, by concatenating usernames on to the end of the Facebook Open Graph API call, we call pull the stats we need:


Which pulls this data:


By collecting these KPIs over time, we can determine which big box retailer had the most successful holiday Facebook campaign and gain insights for next year.

Download the example template here.

Scrape Any Social Share Count

To get share metrics for all social media channels, you can use the SharedCount API. Here’s an example API call for a URL to a post I wrote recently. To get the JSON output, go to:[URL]

and you will see a screen like this:

shared count api

All of this information is extremely easy to scrape. If you’re tracking daily shares via social media for a piece of content, you can throw all of this information into a template to help scale data collection:


Download the Shared Count Scraper Template

In conclusion

Creating these types of efficient systems instead of manually collecting data saves our teams time and allows us to make a bigger impact for our clients.

As managers, it’s our job to help our team scale their efforts even if they don’t have the technical prowess to do it themselves. Creating similar templates is a pathway to freeing up entire weeks of your team members time. Excel can be tricky, and working with new APIs can be frustrating, but once you get the hang of it, you will be improving efficiency for your team at an exponential rate.

Get insights straight to your inbox

Stay one step ahead of the competition with our monthly Inner Circle email full of resources, industry developments and opinions from around the web.