Crawling LARGE Websites? Use DeepCrawl

We were approached by Matt to review DeepCrawl, a relatively young but capable cloud-based site crawl platform. When we first received the request, I was a little unsure how useful the tool would be compared to well-known, comprehensive tools like Screaming Frog and the IIS SEO Toolkit, both of which I’m a massive fan of.

To quickly assess what DeepCrawl was up against, I pulled together some high-level pros and cons of SF and the IIS Toolkit to set the scene:

Screaming Frog



IIS SEO Toolkit



As much as I love both of these tools, they share the same critical drawback: scale. For larger site crawls, Screaming Frog’s memory allocation can burn out fast, while the IIS Toolkit becomes unresponsive beyond a certain point. Even if you do manage to export to .csv, the files are so cumbersome that trying to manipulate the data in any form leads to heartache.

I’m ready for a divorce at this point, so let’s take a closer look at setting up a campaign in…

Getting started with DeepCrawl


When setting up a new crawl, if you’ve used something like IIS or SF before you’ll quickly find the environment familiar, with noticeable similarities between each of the crawlers. All of the typical settings, like crawl depth, max URLs and crawl rate, can be found here, along with some interesting unique features.
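Those three settings map onto the bounds of a classic breadth-first crawl. As a rough illustration only (this is not DeepCrawl’s implementation, and the function and parameter names are my own), a crawler honouring crawl depth, max URLs and crawl rate might look like:

```python
import time
from collections import deque
from urllib.parse import urljoin, urlparse

def crawl(start_url, fetch_links, max_depth=3, max_urls=100, delay=0.0):
    """Breadth-first crawl bounded by depth, URL count and crawl rate.

    `fetch_links(url)` returns the links found on that page; in a real
    crawler it would download and parse the HTML.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    host = urlparse(start_url).netloc
    while queue and len(order) < max_urls:   # max-URLs limit
        url, depth = queue.popleft()
        order.append(url)
        if delay:
            time.sleep(delay)                # crawl-rate throttle
        if depth >= max_depth:
            continue                         # crawl-depth limit
        for link in fetch_links(url):
            link = urljoin(url, link)
            # stay on the same host and skip anything already queued
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order
```

In practice `fetch_links` would fetch and parse each page; cloud crawlers parallelise this step, which is part of how they sidestep the desktop memory limits described above.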

One particularly powerful feature, also found within the crawl settings, is the ability to compare past reports. Imagine crawling a test environment and comparing it to the production site after go-live to catch outstanding or new issues – super useful for site migrations!
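Conceptually, that report comparison is a set difference between the issue URLs of two crawls. DeepCrawl does this for you in the UI; as a minimal sketch of the idea (the function name is hypothetical):

```python
def compare_crawls(previous_issues, current_issues):
    """Diff two crawl reports: which issue URLs are new, fixed, or persist."""
    previous, current = set(previous_issues), set(current_issues)
    return {
        "new": sorted(current - previous),          # issues introduced since last crawl
        "fixed": sorted(previous - current),        # issues no longer present
        "outstanding": sorted(previous & current),  # issues still unresolved
    }
```

For a migration you would feed in the issue lists from the test-environment crawl and the post-go-live production crawl.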

Reviewing site errors

Running a crawl for a site with over half a million URLs took ~48 hours to complete, after which we were notified and presented with the following dashboard:


Every issue identified can be investigated at a deeper level within four main tabs located at the top of the page:

  1. Indexation – An outline of all of the accessibility errors encountered whilst crawling, with the option to segment and export reports by error type.
  2. Content – This segment analyses on-page content errors such as missing page titles, descriptions, duplicate body content, content size, missing H1 tags etc.
  3. Validation – This section homes in on internal ‘link’ or ‘URL’ activity, i.e. links resulting in 4XX, 5XX or redirection errors, as well as redirect types, meta directives and canonicalisation.
  4. Site Explorer – Very similar to Bing WMT’s index explorer, but lets you break down each directory by architecture, site speed, crawl efficiency and linking for further prioritisation.
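If you’d rather slice an exported crawl report yourself, segmenting URLs by error type boils down to bucketing status codes. A small illustrative sketch (the helper name is my own, not part of DeepCrawl’s export):

```python
from collections import defaultdict

def segment_by_error_type(rows):
    """Group crawled URLs by response class, mimicking an error-type export.

    `rows` is an iterable of (url, status_code) pairs, as you might read
    from a crawl CSV export.
    """
    buckets = defaultdict(list)
    for url, status in rows:
        if 300 <= status < 400:
            buckets["redirect (3XX)"].append(url)
        elif 400 <= status < 500:
            buckets["client error (4XX)"].append(url)
        elif status >= 500:
            buckets["server error (5XX)"].append(url)
        else:
            buckets["ok (2XX)"].append(url)
    return dict(buckets)
```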

Helping you communicate & resolve errors faster…

This is where DeepCrawl really comes into its own.

Once you select an error type from any of the tabs, you’ll see an ‘add issue’ tab at the right-hand side of the screen which, when clicked, opens the following dialogue box:


Add an issue description, priority rating, actions and assigned team members, and each task will then appear within an ‘all issues’ overview dashboard, like so:


This is such a useful, collaborative way to monitor and prioritise errors. Once issues are marked as ‘fixed’, the site can be re-crawled and compared to the previous report to ensure they have been resolved.

In summary

I’m still very much getting used to some of the functionality within the platform, but first impressions are good.

The biggest advantage that DeepCrawl has over similar tools like Screaming Frog & IIS Toolkit is the sheer number of URLs that can be crawled and manipulated within the platform itself. As the tool runs in the cloud, there are no memory or timeout errors, whilst the tool also ensures you only download what you need to evaluate and resolve specific issues encountered at any one time.

The fact that DeepCrawl goes some way in helping you prioritise & communicate these errors to your development team is a valuable asset that the other tools can’t compete with.

Learn More

Builtvisible are a team of specialists who love search, SEO and creating content marketing that communicates ideas and builds brands.

To learn more about how we can help you, take a look at the services we offer.


Categories: Research, Technical