Which is to say, it’s Sunday and I just wired up my little publishing empire here to the new hotness in Web syndication technology, PubSubHubbub. If you’re running a hub and you’re not evil, let me know and I’ll ping you.
PubSubWhat? · It was initially a private project by a couple of Googlers which seems to have gotten legs here and there around the Web. The idea is best explained, with a slideshow even, on the home page; take a minute to read through it.
It seems painfully obvious that this whole thing was at least in part provoked by Twitter. Which is a new and very general-purpose communication medium; but it’s owned by a single company, and no matter how much we like them (and I do) it’s all wrong for a general-purpose Internet medium to be owned by anyone.
So I see PubSubHubbub, as much as anything, as an attempt to capture Twitter’s pattern of information flow in a reproducible, interoperable way. I think what we’d like to see is a large number of micro-publishers (just like on Twitter) and an even larger number of subscribers (just like on Twitter). But I think we’d like to see a moderate number of hubs to move all this goodness around — unlike Twitter which by definition has only one.
The effect I imagine is quite a bit like how my Twitter client looks to me; except that, among the 140-character micro-posts, there’d be summaries of real, meatier posts, with links to the full content, all produced as an automatic side-effect of people hitting their “publish” buttons.
Easy and Hard · Hooking up a publishing system to the PubSubHubbub machinery is damn easy; I know because I just did it. You have to put link elements with rel="hub" in your Atom feed pointing at one or more hubs that will be aggregating you. Then, when you update your site, you need to ping the hub(s) using HTTP POST. (In fact, you might not even have to; the hubs are perfectly capable of polling publishers to check for updates.)
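For the curious, here is roughly what the publisher side could look like in Python. It is a minimal sketch, not the code running this site: the hub and feed URLs are made up, the function names are mine, and the form fields (`hub.mode=publish`, `hub.url`) are the ones I understand the spec to use for a new-content ping, so check them against the version your hub implements.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen
from xml.sax.saxutils import quoteattr

# Hypothetical URLs, for illustration only.
HUB_URL = "https://hub.example.com/"
FEED_URL = "https://blog.example.org/feed.atom"


def hub_link(hub_url):
    """Return the Atom link element that advertises a hub in the feed."""
    # quoteattr() escapes the value and wraps it in quotes.
    return "<link rel=\"hub\" href=%s />" % quoteattr(hub_url)


def build_ping(hub_url, feed_url):
    """Build (but do not send) the POST that tells a hub the feed changed."""
    # Assumed field names: hub.mode=publish plus hub.url=<the feed's URL>.
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url})
    return Request(
        hub_url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def ping_hub(hub_url, feed_url, timeout=10):
    """Send the ping and return the hub's HTTP status code."""
    # urlopen raises urllib.error.HTTPError if the hub answers 4xx or 5xx.
    with urlopen(build_ping(hub_url, feed_url), timeout=timeout) as response:
        return response.status
```

You would call `ping_hub(HUB_URL, FEED_URL)` from whatever runs after you hit "publish", once per hub listed in the feed.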
Subscribing through a hub is a bit trickier. You have to process a potentially-asynchronous callback from the hub to verify that you really want to subscribe and aren’t just a spammer. What’s going to be even harder for a lot of people is that you have to be prepared to accept POSTs from the hub when it wants to tell you that there’s an update in something you’re subscribing to.
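To make those two obligations concrete, here is a sketch of a subscriber callback using only the Python standard library. The assumptions are mine: that the hub verifies with a GET carrying `hub.topic` and `hub.challenge` and expects the challenge echoed back, that a 404 is the way to say "not mine", and that updates arrive as a POSTed Atom document. The topic URL and every name in the code are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Feeds we actually asked to subscribe to (hypothetical URL).
WANTED_TOPICS = {"https://blog.example.org/feed.atom"}

# Raw Atom documents pushed by the hub; a real client would parse them.
RECEIVED = []


def answer_verification(query, wanted=WANTED_TOPICS):
    """Decide how to answer the hub's verification request.

    Returns (status, body): echo the challenge for a topic we wanted,
    otherwise refuse with a 404 and an empty body.
    """
    params = parse_qs(query)
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if topic in wanted and challenge:
        return 200, challenge.encode("utf-8")
    return 404, b""


class Callback(BaseHTTPRequestHandler):
    def do_GET(self):
        # Verification: the hub checks that we really want this subscription.
        status, body = answer_verification(urlparse(self.path).query)
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Content distribution: the hub pushes the updated entries to us.
        length = int(self.headers.get("Content-Length", 0))
        RECEIVED.append(self.rfile.read(length))
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Run the callback endpoint until interrupted."""
    HTTPServer(("", port), Callback).serve_forever()
```

Calling `serve()` only helps if the hub can actually reach that port, which is the whole problem for anyone sitting behind a firewall.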
This last one is a key limitation of the system as it stands. The vast majority of desktops, at the moment, can’t accept POSTs because there’s a firewall in the way. So the utility of PubSubHubbub for ordinary end-point subscribers on ordinary computers is pretty limited. I can think of a few different ways you might try to work around this, but they’d require some community energy, so let’s see if any develops.
Building a really big scalable hub would be a challenge too, but far from outside the scope of what we know how to do. Me, I’d use Erlang in a flash, but there’d be other ways to go too.
The Spec · I first read the 0.2 spec yesterday and, since I’m a hopeless specification pedant, had to send a bunch of comments to the discussion group. It’s not terrible. I thought there were a couple of places where it offered unnecessary flexibility and probably wandered into YAGNI territory.
But there’s really only one thing that made me seriously nervous. Let me quote from release 0.2 section 7.3, Content Distribution:
... the feed-level elements SHOULD be preserved aside from the atom:entry elements. However, the atom:id element MUST be reproduced exactly. The other atom:updated and atom:title elements required by the Atom specification SHOULD be present. Each atom:entry element in the feed contains the content from an entry in the single topic that the subscriber has an active subscription for. Essentially, in the single feed case the subscriber will receive an Atom document that looks like the original.
Um... Excuse me!? Is the space between the lines here crying out that a syndication hub should be considered within its rights to change anything in my feed that’s not atom:id? Like, for example, insert a Cialis ad in my first paragraph?
Protocols can’t enforce good behavior; if a sleazeball hub operator wants to fuck with the content there ain’t no protocol specification that’s going to prevent it. But in this area, the expectations need to be very clearly set.
Conclusions · I really don’t know. I just don’t see how, absent heroics like Skype has to use, POST-to-the-client is going to deal with the reality of ubiquitous firewalls. On the other hand, Twitter clients which rely on polling seem to make their users happy. I see nothing in the spec about supporting polling, i.e. how a client might ask a hub for its version of a feed, but that seems to me like it might be a real useful function.
So, in closing:
If you’re running a hub and would like ongoing to ping you when I update, I can fix that up; the latency should be a single-digit number of seconds from the time when I hit the “publish” button here on my laptop.
If you know of any interesting PubSubHubbub clients, let me know; I think I’m probably exactly the kind of person who’s apt to get good use out of one.