A few days ago I saw this post on Search Engine Roundtable about a Googler’s recommendation that PubSubHubbub is still a good way to get content indexed quickly. It’s something Google actively encourages publishers to implement: “we encourage publishers to submit their feeds to a public PuSH hub, if they don’t want to implement their own.”
I wrote about PubSubHubbub years ago (sorry – a few links are out of date and there’s a missing widget post as that particular service has been shut down). I also gave a Whiteboard Friday on the topic of faster indexation – but since then I’ve more or less forgotten about the whole subject.
So PubSubHubbub is Still a Thing?
I was pretty surprised to hear PubSubHubbub is still a thing, but apparently it is. So if you want to have a play with it, use the WordPress plugin that John Mueller mentioned in his tweet, and set up your own hub on Superfeedr (here’s ours, if you’re interested!). I suspect the default / demo hubs on App Engine and Superfeedr’s Open Hub are the best ones to ping if you’re interested in indexation experiments.
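For context, the publishing side of the protocol is very simple: whenever your feed updates, you (or the plugin doing it for you) send a POST to the hub telling it which feed changed. A minimal sketch in Python — the hub is Google’s open demo hub, and the feed URL is illustrative:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

HUB = "https://pubsubhubbub.appspot.com/"  # Google's open demo hub
FEED = "https://example.com/feed/"         # illustrative; use your own feed URL

# The "publish" ping: a form-encoded POST naming the feed that just changed
data = urlencode({"hub.mode": "publish", "hub.url": FEED}).encode()
# urlopen(Request(HUB, data=data))  # uncomment to send; 204 No Content = accepted
print(data.decode())
```

A WordPress plugin does exactly this on every publish; the sketch is just to show there’s no magic involved.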
If you take a look at our RSS feed, you’ll see the following:
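The relevant part is a hub declaration in the feed’s channel element — something along these lines (the hub and feed URLs below are illustrative, not our actual ones):

```xml
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <!-- tells subscribers (crawlers included) which hub to watch for updates -->
    <atom:link rel="hub" href="https://example.superfeedr.com/" />
    <atom:link rel="self" href="https://example.com/feed/" type="application/rss+xml" />
    ...
  </channel>
</rss>
```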
Does having PubSubHubbub implemented get your site indexed faster? I don’t know. You’d have to set up an experiment to know for sure. In my opinion, though, it’s just a thing that you should probably do if it’s as easy to implement as installing a WordPress plugin. Especially given John’s comments.
But don’t sweat it if you’re not able to implement it quickly and cheaply. It’s a marginal gain, at best.
What else? Methods to Get a Page or Entire Site Re-Crawled
So, if you can’t just implement PubSubHubbub, what else is there? Let’s spend a moment revisiting methods to get a new page indexed by Google. Some of these are quite obvious but I use a few of them on a very regular basis:
Google Search Console
There are a few handy ways to encourage indexing with Google’s Search Console, provided that the site belongs to you and it’s verified.
Fetch as GoogleBot
Submitting a page to Fetch as Googlebot in Google’s Search Console leads to two options. You can submit the single page URL to Google’s index, or use the “Crawl this URL and its direct links” option.
There’s a limit of 10 requests a month to crawl the page and any links found on the page, so use sparingly. I’m pretty sure that it’s a per user, per domain thing, so if you’ve used up your requests, send someone who shares access to your site a friendly email. In my experience, you’ll have a page re-crawled and updated in the index within a few hours. Usually after a good night’s sleep, last night’s last minute “Fetch as GoogleBot” panic submission has paid off. Usually.
(Re) Submit Your Sitemaps
A few people I’ve spoken to at Builtvisible are of the opinion that sitemap (re)submission can trigger a flurry of crawl activity. I’d agree – I’ve found myself habitually doing the same on a major site change, but I also think there are a few alternatives:
– Submit your RSS feed
– Break up your sitemap into smaller files
– If you use a sitemap index file, submit the individual files anyway
– You could even ping the sitemaps endpoint directly. Not sure anyone’s listening, though!
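That last ping is nothing more than a GET request with your sitemap location as a query parameter. A quick sketch, with an illustrative sitemap URL:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SITEMAP = "https://example.com/sitemap.xml"  # illustrative; use your own

# Google's sitemap ping endpoint takes the sitemap location as a query parameter
ping_url = "https://www.google.com/ping?" + urlencode({"sitemap": SITEMAP})
# urlopen(ping_url)  # uncomment to send; a 200 only means the ping was received
print(ping_url)
```

Bing exposes an equivalent endpoint at bing.com/ping, so the same one-liner covers both.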
Crank up Your Crawl Rate
Google advises you not to do this unless you have problems – so I’m not totally convinced this feature will ever speed up your crawl rate. That said, it might be worth testing – just watch your logs to see if the number of pages requested per day increases.
Mess With Your Hosting – Change IP Address or Switch to SSL
When we swapped IP addresses by accident (which became a fascinating case study into the effects of host location on rankings), I noticed Google Search Console reporting a large increase in the number of pages crawled per day. I didn’t screenshot the chart at that time, but I did when we switched to SSL:
Switching to secure made us fly! As it happens, it took about 3 – 4 days to completely replace all of the ranking URLs with an https version (we’re mid authority, about 1,000 pages or so). Still, pretty interesting stuff – I’m quite certain that switching to SSL after you’ve fixed something like a page quality issue is a good way to get those fixes picked up, fast.
I spoke to Dan and he mentioned an interesting point about checking that you’re not responding with stupid headers when you’d like to get something re-crawled. An incorrectly served 304 Not Modified response, for example, would deeply bugger up any chance of the index updating on a page! On a related note, we talked about disabling If-Modified-Since detection altogether so that the site always responds with a 200 OK – this can have the side effect of speeding up a site-wide re-crawl. Fascinating.
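To make that distinction concrete, here’s a hypothetical sketch of the server-side decision (`should_send_304` is an illustrative helper, not a real API): honouring If-Modified-Since means returning 304 when the page hasn’t changed since the date the crawler sent, while switching the check off means every request gets a full 200 OK and a fresh copy of the page.

```python
from email.utils import parsedate_to_datetime

def should_send_304(if_modified_since, last_modified, honour_conditional=True):
    """Return True if a 304 Not Modified is the right response.

    if_modified_since: the header the crawler sent (HTTP date string, or None)
    last_modified:     when the page content actually last changed
    honour_conditional=False mimics disabling If-Modified-Since detection,
    so every request is answered with a full 200 OK.
    """
    if not honour_conditional or not if_modified_since:
        return False
    # Unchanged since the crawler's reference date -> 304 is appropriate
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)

# Page unchanged since the crawler's last visit -> a 304 would be correct
print(should_send_304("Thu, 01 Jan 2015 00:00:00 GMT", "Wed, 31 Dec 2014 00:00:00 GMT"))
```

The bug Dan describes is the first case firing when the page *has* changed; the “always 200 OK” trick simply forces the second path for everything.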
Re-launching a piece of content on a well linked-to URL (and archiving the old content on a new URL) is sometimes a smart move. You might assume the old URL will get revisited quickly, although this tactic is usually put in place to build more link equity to a URL.
Ramping up some early social activity, particularly posting to Google+ or getting Tweets from authority accounts might help. Finally, if you’re just keen to try to get a bulk list of URLs re-crawled (think link audit / recent disavow / penalty removal) you could try Linklicious. I wouldn’t submit anything more than junk URLs to a service like that, though.