How Do You Get New Pages Indexed or Your Site Re-Crawled?

A few days ago I saw this post on Search Engine Roundtable about a recommendation from a Googler that using PubSubHubbub is still a good way to go if you want to get content indexed quickly. It’s something Google have actively encouraged publishers to implement: “we encourage publishers to submit their feeds to a public PuSH hub, if they don’t want to implement their own.”

I wrote about PubSubHubbub years ago (sorry – a few links are out of date and a widget is missing from the post, as that particular service has been shut down). I also gave a Whiteboard Friday on the topic of faster indexation – but since then I’ve more or less forgotten about the whole subject.

So PubSubHubbub is Still a Thing?

I was pretty surprised to hear PubSubHubbub is still a thing, but apparently it is. So, if you want to have a play with it, use the WordPress plugin that John Mueller mentioned in his tweet, and set up your own hub on Superfeedr (here’s ours, if you’re interested!). I suspect the default / demo hubs on App Engine and Superfeedr’s Open Hub are the best ones to ping if you’re interested in indexation experiments.
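If you fancy poking at it outside of WordPress, the publish side of the protocol is nothing more than an HTTP POST to the hub saying “this feed has updated”. Here’s a rough Python sketch of that ping – the hub shown is the App Engine demo hub, and the feed URL is a placeholder, so swap in your own:

```python
# Minimal PubSubHubbub "publish" ping: tell the hub that a feed has new content.
# The hub and feed URLs below are placeholders - swap in your own.
import urllib.parse
import urllib.request

HUB_URL = "https://pubsubhubbub.appspot.com/"   # Google's demo hub on App Engine
FEED_URL = "https://www.example.com/feed/"      # the feed (topic) that just updated

data = urllib.parse.urlencode({
    "hub.mode": "publish",
    "hub.url": FEED_URL,
}).encode("utf-8")

req = urllib.request.Request(HUB_URL, data=data)  # supplying data makes this a POST
with urllib.request.urlopen(req) as resp:
    # A well-behaved hub answers 204 No Content when it accepts the ping.
    print(resp.status, resp.reason)
```

The hub then fetches the feed and pushes the new entries out to its subscribers – which, in theory, is where the faster discovery comes from.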

If you take a look at our RSS feed, you’ll see the hub declared in the feed itself – the plugin adds link elements with rel="hub" pointing at the hubs the feed publishes to.

Does having PubSubHubbub implemented get your site indexed faster? I don’t know. You’d have to set up an experiment to know for sure. In my opinion, though, it’s just a thing that you should probably do if it’s as easy to implement as installing a WordPress plugin. Especially given John’s comments.

But don’t sweat it if you’re not able to implement it quickly and cheaply. It’s a marginal gain, at best.

What else? Methods to Get a Page or Entire Site Re-Crawled

So, if you can’t just implement PubSubHubbub, what else is there? Let’s spend a moment revisiting methods to get a new page indexed by Google. Some of these are quite obvious but I use a few of them on a very regular basis:

Google Search Console

There are a few handy ways to encourage indexing with Google’s Search Console, provided that the site belongs to you and it’s verified.

Fetch as GoogleBot

[Screenshot: Fetch as Google in Search Console]

Submitting a page to Fetch as Googlebot in Google’s Search Console leads to two options. You can submit the single page URL to Google’s index, or use the “Crawl this URL and its direct links” option.

[Screenshot: the two submit options]

There’s a limit of 10 requests a month to crawl the page and any links found on the page, so use sparingly. I’m pretty sure that it’s a per user, per domain thing, so if you’ve used up your requests, send someone who shares access to your site a friendly email. In my experience, you’ll have a page re-crawled and updated in the index within a few hours. Usually after a good night’s sleep, last night’s last minute “Fetch as GoogleBot” panic submission has paid off. Usually.

(Re) Submit Your Sitemaps

A few people I’ve spoken to at Builtvisible are of the opinion that sitemap (re)submission can trigger a flurry of crawl activity. I’d agree – I’ve found myself habitually doing the same on a major site change, but I also think there are a few alternatives:

– Submit your RSS feed
– Break up your sitemap into smaller files
– If you use a sitemap index file, submit the individual files anyway


– You could even hit the sitemap ping endpoint directly. Not sure anyone’s listening, though!
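For the curious, that ping is just a GET request against a documented URL. Here’s a quick sketch using Google’s and Bing’s sitemap ping endpoints – the sitemap URL is a placeholder for your own:

```python
# Ping Google's and Bing's sitemap endpoints with an updated sitemap URL.
# SITEMAP_URL is a placeholder - point it at your own sitemap (or sitemap index).
import urllib.parse
import urllib.request

SITEMAP_URL = "https://www.example.com/sitemap_index.xml"

PING_ENDPOINTS = [
    "http://www.google.com/webmasters/sitemaps/ping?sitemap={}",
    "http://www.bing.com/webmaster/ping.aspx?siteMap={}",
]

for endpoint in PING_ENDPOINTS:
    url = endpoint.format(urllib.parse.quote(SITEMAP_URL, safe=""))
    with urllib.request.urlopen(url) as resp:
        # A 200 only means the request was understood - it says nothing about
        # whether (or when) the sitemap actually gets re-crawled.
        print(resp.status, url)
```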

Crank up Your Crawl Rate

[Screenshot: crawl rate setting in Search Console]

Google advises you not to do this unless you have problems – so I’m not totally convinced this feature will ever speed up your crawl rate. That said, it might be worth testing – just watch your logs to see if the number of pages requested per day increases.
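Checking the logs doesn’t need to be fancy. Here’s a rough sketch that counts requests per day from anything identifying itself as Googlebot in a combined-format access log – the log path is a placeholder, and for anything serious you’d also want to verify the hits really are Googlebot (reverse DNS) rather than trusting the user agent string:

```python
# Count requests per day from user agents claiming to be Googlebot,
# read from a combined-format access log. The path is a placeholder.
import re
from collections import Counter
from datetime import datetime

LOG_PATH = "/var/log/nginx/access.log"

# Combined log format wraps the timestamp in [..], e.g. [12/Mar/2016:06:25:24 +0000]
timestamp_re = re.compile(r"\[(\d{2}/\w{3}/\d{4})")

hits_per_day = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        match = timestamp_re.search(line)
        if match:
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            hits_per_day[day] += 1

for day, hits in sorted(hits_per_day.items()):
    print(day, hits)
```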

Mess With Your Hosting – Change IP Address or Switch to SSL

When we swapped IP addresses by accident (which became a fascinating case study into the effects of host location on rankings), I noticed Google Search Console reporting a large increase in the number of pages crawled per day. I didn’t screenshot the chart at that time, but I did when we switched to SSL:

[Screenshot: pages crawled per day after the switch to SSL]

Switching to secure made us fly! As it happens, it took about 3 – 4 days to completely replace all of the ranking URLs with an https version (we’re mid authority, about 1,000 pages or so). Still, pretty interesting stuff – I’m quite certain switching to SSL when you’ve fixed something like a page quality issue is a good way to get your performance fixed, fast.

I spoke to Dan and he mentioned an interesting point about checking that you’re not responding with stupid headers when you’d like to get something re-crawled. An incorrectly served 304 Not Modified response, for example, would deeply bugger up any chance of the index updating on a page! On a related note, we talked about disabling If-Modified-Since handling so the site always responds with a 200 OK – this can have the side effect of speeding up a site-wide re-crawl. Fascinating.
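If you want to check this yourself, replay the conditional request: send an If-Modified-Since date that you know is older than your last edit and see what comes back. A correctly configured server should answer 200 OK with fresh content; a 304 is exactly the sort of incorrectly served response Dan was talking about. A minimal sketch – the URL and date are placeholders:

```python
# Send a conditional GET and report whether the server answers 200 or 304.
# The URL is a placeholder; pick an If-Modified-Since date from *before* your last edit.
import urllib.error
import urllib.request

URL = "https://www.example.com/some-page/"
IF_MODIFIED_SINCE = "Mon, 01 Jun 2015 00:00:00 GMT"

req = urllib.request.Request(URL, headers={"If-Modified-Since": IF_MODIFIED_SINCE})
try:
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.reason)   # 200 OK: the page is served fresh, as it should be
except urllib.error.HTTPError as err:
    print(err.code, err.reason)           # 304 Not Modified lands here - the problem case
```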

Other Stuff

Re-launching a piece of content on a well-linked-to URL (and archiving the old content on a new URL) is sometimes a smart move. You might assume it’ll get revisited, although this tactic is usually put in place to build more link equity into a URL.

Ramping up some early social activity, particularly posting to Google+ or getting tweets from authority accounts, might help. Finally, if you’re just keen to get a bulk list of URLs re-crawled (think link audit / recent disavow / penalty removal) you could try Linklicious. I wouldn’t submit anything more than junk URLs to a service like that, though.


7 thoughts on “How Do You Get New Pages Indexed or Your Site Re-Crawled?”

  1. Stephen says:

    Hi Richard,

     Did you create a separate Google Webmaster Tools listing for the https site after the migration?

     And the snapshot above (the crawl report) belongs to the http site’s listing. Am I right?

     As the change of address tool is not valid for an http to https migration.

     Can you please confirm? I am also planning to move a site from http to https, with a URL structure change.

    Thanks
    Stephen

  2. Hey Stephen – you can create a separate GWMT account for your https site, but you’re right: change of address is irrelevant in this case.

  3. Neo Ni says:

     Exactly – I do the same to get new or existing pages crawled by resubmitting sitemaps in Webmaster Tools, and in urgent cases I prefer “Fetch as Google”, which helps.

  4. Here are a few more things we can use to speed up the re-crawl process:
     1) Ping the sitemap
     2) Ping individual pages
     3) Submit pages to G+
     4) Update the page in WMT (tricky)

     So let’s discuss them one by one:
     1) All you need to do is visit
     http://www.google.com/webmasters/sitemaps/ping?sitemap=URLOFSITEMAP
     http://www.bing.com/webmaster/ping.aspx?siteMap=URLOFSITEMAP
     This works similarly to PuSH.

     2) Pinging individual pages is a little bit trickier. You must use the XML-RPC protocol to inform search engines about a new page. There are a few online services such as Pingler, GooglePing and Ping-O-Matic. There are also a few desktop and mobile tools, such as SEOPingler (disclaimer – I’m the author of this tool).
     This is one of the best methods for getting updated pages crawled (even third-party ones) because you can ping deep pages.

     3) Publishing to G+ with a link attached to the post is also effective; Googlebot re-crawls the page to fetch the snippet. The downside is that it leaves a lot of traces of activity, and publishing too many links is pure spam.

     4) This one is tricky! WMT has limitations – 10 or 500 submissions per account monthly across all assets. But you can create a new account and assign it to a site or a site subdirectory, i.e. http://site.com/blog/. I’m not sure, but they may add this feature to the WMT API.

  5. Great article Richard!
     For me, the best way is Fetch as Google.
     But I won’t say no to trying the WordPress plugin.

     Tnx!

  6. Vicky Choksi says:

     I think “Fetch as Google” is a good way to get a page indexed and cached instantly. Nice post.

  7. Hi Richard,
     Thanks for sharing. I personally use “Fetch as Google” for every link to get it crawled faster; still, it doesn’t crawl the very next moment we submit a new URL – it takes 1–2 days in my case. On the other hand, I’d never considered getting my website re-crawled; after reading your post I will implement this asap and see the results. :)
     Peace, respect,
     Regards,
     ~Piyush Dhiman
