Fun With PubSubHubbub, WordPress & Faster Indexation [Real Time Search & Data Distribution]

Once upon a time, it took weeks, if not months, to get a new page crawled and indexed by a search engine. Later, the whole indexing process got faster: I found this post from 2007 in which a very excited Patrick at Blogstorm discusses a web page getting indexed within a minute.


Image – 16bit EPROM by YellowCloud

The real time web

Thanks largely to the emergence of the real time web, search engines have had to work even harder, improving their massive infrastructure to get new content indexed fast. You don’t need to look for long to find evidence that fresher indexing is right at the top of Google’s priority list.

Faster indexing

A week or so ago, Rand and I discussed the fascinating subject of getting your content indexed quickly using sitemap pings, Ping-O-Matic and PubSubHubbub. Today, let’s take a look at what PubSubHubbub is, and how you can implement it quickly and easily on your self-hosted WordPress blog or website.

What is PubSubHubbub?

PubSubHubbub is a simple publish/subscribe protocol (hence the “PubSub”) that turns Atom and RSS feeds into real time streams. In its simplest form, PubSubHubbub is a way to get content in front of your subscribers in real time, via a hub.

Why is that cool?

To understand why PubSubHubbub is exciting, you need to understand the difference between push and poll. Crawlers, such as Googlebot or Google Reader’s feed fetcher, have to revisit your site periodically to check whether there’s any new content to syndicate. This repeated revisiting is known as “poll”. It’s inefficient when you think about it, because those crawlers have no idea whether your site has changed until they arrive; only then do they discover new content, receive a 304 Not Modified response, or download the same old content they saw last time.
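To make the poll model concrete, here’s a minimal Python sketch of what a polite feed poller does on every visit: it sends the validators from its previous fetch (If-None-Match / If-Modified-Since) and treats a 304 Not Modified as “nothing new”. The function names are hypothetical and real crawlers are far more sophisticated; this only illustrates the round trip that polling costs.

```python
import urllib.error
import urllib.request
from typing import Optional


def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> dict:
    """Build the validator headers a poller sends on a repeat visit."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def poll_feed(url: str, etag: Optional[str] = None, last_modified: Optional[str] = None):
    """Fetch a feed once and return (body, etag, last_modified).

    body is None when the server answers 304 Not Modified, i.e. the trip was wasted.
    """
    request = urllib.request.Request(url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx statuses, including 304.
        if err.code == 304:
            return None, etag, last_modified
        raise
```

A poller has to call something like poll_feed() on a timer for every feed it tracks, whether or not anything has changed. That repeated, often wasted request is exactly what push removes.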

The alternative made available through PubSubHubbub is that a publisher can push new content to subscribers via their hub. The publisher talks to their (or a public) hub, the hub talks to all of the subscribers, pushing new content out to each of them in real time.
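Here’s a rough sketch of the publisher’s side of that conversation, based on my reading of the PubSubHubbub 0.3 draft spec: the publisher POSTs a form-encoded “new content” notification with hub.mode=publish and one hub.url per updated feed, and the hub then fetches the feed itself. The function names and example.com URLs are illustrative placeholders, not part of any plugin.

```python
import urllib.parse
import urllib.request


def build_publish_body(topic_urls: list) -> bytes:
    """Form-encode a 'new content' notification for one or more feed (topic) URLs."""
    params = [("hub.mode", "publish")] + [("hub.url", url) for url in topic_urls]
    return urllib.parse.urlencode(params).encode("ascii")


def notify_hub(hub_url: str, topic_urls: list) -> int:
    """POST the notification to the hub and return the HTTP status code.

    A hub that accepts the ping replies with a 2xx status (typically 204 No Content).
    """
    request = urllib.request.Request(
        hub_url,
        data=build_publish_body(topic_urls),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, notify_hub("http://pubsubhubbub.appspot.com/", ["http://example.com/feed/"]) would tell the public hub that the (placeholder) feed has changed. Note that the notification carries no content; the hub does the fetching, once, on behalf of every subscriber.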

Isn’t this just ping?

We’ve had the capability to tell feed aggregators and syndicators that a feed has been updated with new content for some time, using ping. Ping-O-Matic, for example, is a service you can use to tell services like Feedburner, Syndic8, Blo.gs and NewsGator that something has changed and that they should poll your RSS feed to get the latest content on your site. The thing is, those subscribers are still polling your feed (and requesting your feed URL) to grab the latest content. You ping, they fetch. With PubSubHubbub, the hub fetches the published feed content once and multicasts the new or changed content to all of the registered subscribers, so your site has to serve your feed URL far less often.
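For comparison, a classic ping is just an XML-RPC call to weblogUpdates.ping(blog name, blog URL). This Python sketch uses the standard library’s xmlrpc.client; http://rpc.pingomatic.com/ is the Ping-O-Matic endpoint WordPress lists under Update Services by default, and the blog name and URL in the usage note are placeholders.

```python
import xmlrpc.client

PING_ENDPOINT = "http://rpc.pingomatic.com/"  # Ping-O-Matic's XML-RPC endpoint


def build_ping_payload(blog_name: str, blog_url: str) -> str:
    """Serialise a weblogUpdates.ping call without sending it (handy for inspection)."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


def send_ping(blog_name: str, blog_url: str, endpoint: str = PING_ENDPOINT):
    """Tell the ping service that the blog has changed; it replies with a struct."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)
```

Something like send_ping("Example blog", "http://example.com/") is all a ping amounts to. Every service that receives it must still come and fetch your feed itself, which is the “you ping, they fetch” pattern described above.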

It’s a tricky concept that’s best explained by this video from Brett Slatkin and Brad Fitzpatrick, the creators of the PubSubHubbub protocol.

Services that are compatible with PubSubHubbub

First off, Matt told me that (as of last week) Google organic search does not currently use PubSubHubbub as a direct discovery mechanism for new content. Though that news was a little disappointing to hear, it was before Google had officially announced that their Caffeine index was fully live. It makes perfect sense for Google and other search engines to use this protocol as a direct discovery signal, and I really hope we see that happen. Google did mention that they may use PubSubHubbub sometime in the future, so who knows. In the meantime, PubSubHubbub subscribers include FriendFeed, Six Apart, Google Reader, Google Alerts (Blog Search / News indexing impact? Don’t know), Google Buzz, Ping.fm, Netvibes and Status.net. PubSubHubbub publishers include the hosted version of WordPress, Posterous (here’s mine) and Tumblr.

The super cool new Google AJAX Feed API uses PubSubHubbub to update syndicated feed widgets (or whatever else you’re using the API for) in real time, too.

I’m super excited about PubSubHubbub. How can I implement it in WordPress?

If you’re self-hosting WordPress, it’s pretty easy to get started. You need a hub (whose URL must be included in your RSS feed header) and a plugin to ping that hub. That’s it!
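Including the hub URL in your feed header means adding an Atom link element with rel="hub" to your RSS channel; as far as I understand the spec, it also expects a rel="self" link so subscribers know the canonical feed (topic) URL. The WordPress plugin does this for you in PHP; the Python sketch below only illustrates the markup and how a subscriber would discover the hub from it. The helper names, blog title and example.com feed URL are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)  # serialise as atom:link rather than ns0:link


def add_hub_links(channel: ET.Element, hub_url: str, feed_url: str) -> None:
    """Advertise the hub (rel="hub") and the feed's own URL (rel="self") in an RSS channel."""
    for rel, href in (("hub", hub_url), ("self", feed_url)):
        ET.SubElement(channel, f"{{{ATOM_NS}}}link", {"rel": rel, "href": href})


def find_hubs(feed_xml: str) -> list:
    """What a subscriber does: collect the href of every atom:link with rel="hub"."""
    root = ET.fromstring(feed_xml)
    return [
        link.get("href")
        for link in root.iter(f"{{{ATOM_NS}}}link")
        if link.get("rel") == "hub"
    ]


rss = ET.Element("rss", {"version": "2.0"})
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example blog"
add_hub_links(channel, "http://pubsubhubbub.appspot.com/", "http://example.com/feed/")
feed_xml = ET.tostring(rss, encoding="unicode")
print(find_hubs(feed_xml))  # ['http://pubsubhubbub.appspot.com/']
```

A feed can list more than one hub, which is why find_hubs() returns a list; a feed with no rel="hub" link simply yields an empty list, and the subscriber falls back to polling.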

Set up a hub

To set up a hub, you have a few options. I’m experimenting with all of them, so I have no real preference for any of these solutions yet.

1) Use a public hub (no setup at all) at: http://pubsubhubbub.appspot.com/

2) Use a third-party hub service that requires validation during setup, like Superfeedr.com: http://seogadget.superfeedr.com/

3) Make your own hub using the PushPress plugin (I believe Charlie, of The Link Juice iPhone app fame, is playing with this at the moment on his blog)


Ping your hub and include the hub URL in your RSS feed

When a PubSubHubbub-compatible subscriber polls your RSS feed, you want them to see that you’re now a PubSubHubbub-enabled publisher. You do this by referencing your hub URL in your RSS feed header. Here’s mine:

My PubSub Rel Hub

Notice the rel=“hub”? That’s a link-based microformat describing the location of my hub. If the subscriber is PubSubHubbub compatible, it should subscribe to my feed through the hub, following the subscription and verification flow outlined in this presentation:

Updated Pub Sub Hubbub Flow Presentation
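The subscriber’s half of that flow can be sketched in a few lines of Python, based on my reading of the 0.3 draft spec (later revisions simplified it, for example dropping hub.verify): the subscriber POSTs hub.mode=subscribe, hub.topic and hub.callback to the hub, and the hub then calls the callback with a hub.challenge that the subscriber must echo back to prove it really asked. The helper names, the pending set and the example URLs are all hypothetical.

```python
import urllib.parse


def build_subscribe_body(topic: str, callback: str, verify_token: str = "") -> bytes:
    """Form-encode a subscription request to POST to the hub found in the feed."""
    params = [
        ("hub.mode", "subscribe"),
        ("hub.topic", topic),        # the feed URL (its rel="self")
        ("hub.callback", callback),  # where the hub should push new content
        ("hub.verify", "sync"),      # 0.3-era hubs ask how intent should be verified
    ]
    if verify_token:
        params.append(("hub.verify_token", verify_token))
    return urllib.parse.urlencode(params).encode("ascii")


def answer_verification(query: dict, pending: set, verify_token: str = "") -> tuple:
    """Handle the hub's GET to our callback (verification of intent).

    `query` maps parameter names to single string values; `pending` holds the
    (mode, topic) pairs we actually requested. Returns (HTTP status, body):
    echo hub.challenge with 200 to confirm, or 404 to refuse.
    """
    if (query.get("hub.mode"), query.get("hub.topic")) not in pending:
        return 404, ""
    if verify_token and query.get("hub.verify_token") != verify_token:
        return 404, ""
    challenge = query.get("hub.challenge")
    if not challenge:
        return 404, ""
    return 200, challenge
```

The echo step is the important design choice: without it, anyone could sign a third party’s callback URL up to a busy feed and have the hub flood it with content it never asked for.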

Fortunately, the PubSubHubbub plugin for WordPress makes it easy to add your hub URL to your RSS feed, and it pings your hub every time you publish new content. Here’s what it looks like:

Image – Define custom hubs (PubSubHubbub plugin settings)

PubSubHubbub is not just for search: it’s also for real time notifications and data distribution

PubSubHubbub is an obvious and important step into the world of faster indexing and discovery, but the protocol has many other applications too. I read an inspiring series of examples and ideas for distributed, real time updates to personal and public profile data on sites such as LinkedIn and Facebook, and heard stories about real time push notifications for mobile applications using tools like Urban Airship. I’ve got to say, this is an exciting new field for technology-inspired digital marketers and software developers alike, and it’s definitely worth getting up to speed on the topic sooner rather than later.




17 thoughts on “Fun With PubSubHubbub, WordPress & Faster Indexation [Real Time Search & Data Distribution]”

  1. Sam Crocker says:

    Really helpful stuff here Rich. Thanks a lot; it looks like you took some serious time working on these insights, and they could be extraordinarily helpful in getting things indexed more quickly.

    Most importantly, thanks for the WordPress walkthrough!

  2. Tek3D says:

    Thanks,
    Very informative article about Push, I have set up PushPress on my blog. Unfortunately, it is not used in Google Search :(

  3. Lee Hughes says:

    This is interesting.

    Can I just confirm something? I’m only an inch into my steep learning curve with SEO, code, etc.

    So with this real time indexing, I can publish a post and it will be indexed straight away, allowing people to read it via search engines.

    Is this right?

  4. Hey Tek,

    Not yet at least – but I wouldn’t count on that being the case for long :-)

  5. Hi Lee

    In principle, that’s pretty much what search engine engineers might be working towards. In reality, there is still a delay. That said, with ping and newer technologies such as PubSubHubbub “real time” is as close to real time as it has ever been. Think pages indexed in seconds rather than minutes.

  6. Don’t overcomplicate it for yourself… You’re probably already using Feedburner for your website’s feed, and if that’s the case, then all you need to do to enable PubSubHubbub is…

    1. To go to your feed’s dashboard in Feedburner
    2. Click on the “Publicize” tab
    3. Click on “PingShot” in the left hand menu
    4. Activate the “PingShot” service

    And hey presto – you’ve now got a pubsubhubbub hub link in your feed! No plugins or any other messing about needed…

    More information about PingShot and Pubsubhubbub can be found here: http://www.google.com/support/feedburner/bin/answer.py?hl=en&answer=78988

  7. Hi James, sure, Feedburner has had PingShot for a long time, and that service now incorporates PubSub.

    The thing is, if you stick with Feedburner, you are still waiting for Feedburner to poll your RSS feed. Haven’t you noticed the delay with Feedburner? Implementing PubSub on your own site bypasses that additional delay. Of course, it’s down to you how you manage your setup, but do the test and try to measure the difference between notifying your own hub and relying on Feedburner…

    Thanks for the thoughts buddy – it’s a really important point to make.

  8. Lee Hughes says:

    Ohh, this will fix the Feedburner delay?

    This delay drives me insane. I have tested it on my own sites and I see 1-day delays for RSS subscriber notices to come through!

  9. There’s no delay if you ping Feedburner… Feedburner refreshes instantly when pinged, rather than waiting the usual 30 minutes.

    So make sure you’ve got http://ping.feedburner.com in your list of update services in WordPress.

  10. John C says:

    Thanks for this info Richard.

    Does listing ping.feedburner.com in the “update services” within wordpress achieve the same result as James suggests? That seems to be an easy way of achieving the same outcome.

    John

  11. Hi John (and everyone else who has been kind enough to comment here)

    There are a number of ping services available (Feedburner has one), but you can also ping Google with an HTTP request when your XML sitemap has been updated, and/or use ping aggregation services like Ping-O-Matic. These will all (likely) greatly reduce the time it takes for your new content to be fetched whenever you publish. When you ping a service, it comes and fetches whatever has been updated.

    The PubSub protocol is different – in that you ping your own hub, which fetches the content and then pushes that to all subscribers of the hub. Your subscribers stop having to poll you, you push content to them.

    Hope that makes sense – in any set up I’d advise you to always ping Feedburner (if you use Feedburner) and Google when your sitemap is updated.

  12. John C says:

    Yes, I’ve got feedburner setup and use the XML sitemaps plugin as well.

    It’s pretty simple to activate the pubsubhubbub plugin and simply specify the appspot.com service. Any benefit in specifying more than one service?

  13. Genuinely not sure. It’s probably unnecessary to have more than one hub. That said I bet it wouldn’t hurt either.

  14. SME says:

    This is interesting, but I wonder what limitations exist or will exist. So PubSubHubbub pushes content to Google News, Buzz (and others) via a self-created hub. This eliminates the need for Google to periodically scan for new content?

    So expanding that model, as more and more sites use this or similar services, won’t Google eventually have to slow down the stream and make some decisions about what to serve? And doesn’t that bring us full circle, back to relying on the subscribers (think Google organic as the end game) to handle the stream and eventually decide what to serve, which means delay? Or am I missing something fundamental here?

  15. jitin says:

    Yes! That’s right, and I agree with you that WordPress makes any website faster to cache and is good for SEO too.

  16. Fantastic! I remember seeing this on Google’s Code page eons ago, but no other Internet authority has gone as far as to commit to its actual role in the Google Caffeine update. This is just what I was looking for thank you Richard! – Bradley
