Atomic Google Hacks

Check out Mihai Parparita’s Google Reader Tidbits, about how he used Google Reader hacks to do a bunch of clever feed splicing. The article is interesting, and I think Atom is going to enable a bunch of feed-mashup creativity that I’m not smart enough to invent. But I wanted to do a deep-dive on the actual Atom feed he generated, which is probably of interest only to obsessive Atom 1.0 fetishists.

In this piece, when I use the technical terms moron, asshole, and angel, I do so in the sense described in Mark Pilgrim’s monumental Why Specs Matter.

Let’s reproduce the top bit of the feed, which I’ve edited to make the line-wrapping a little less painful; a ... marks each deletion. There are occasional instances of moronic behavior, but they’re in the minority; in fact it’s generally clever.

 1. <feed xmlns="http://www.w3.org/2005/Atom" 
 2.       xmlns:gr="http://www.google.com/schemas/reader/atom/">
 3. <id>tag:google.com,2005:reader/user/10963671381103576324/label/tech</id>
 4. <generator uri="http://www.google.com/reader">Google Reader</generator>
 5. <title>Items labeled "tech" by Jason in Google Reader</title>
 6. <gr:continuation>CIaUuM7D9YMC</gr:continuation>
 7. <link rel="self" 
 8.       href="http://www.google.com/reader/public/atom/user/109..."/>
 9. <author><name>Jason</name></author>
10. <updated>2006-03-24T00:37:22Z</updated>
11. <entry>
12.  <id gr:original-id="http://www.engadget.com/2006/03/23/us...">
13.      tag:google.com,2005:reader/item/f9cb5b94346faeb4</id>
14.  <link rel="via" href="user/10963671381103576324/label/tech"
15.        title="Items labeled &quot;tech&quot; by Jason in Google Reader"/>
16.  <source gr:stream-id="feed/http://www.engadget.com/rss.xml">
17.   <id>tag:google.com,2005:reader/feed/http://www.engadget.com/rss.xml</id>
18.   <title type="text">Engadget</title>
19.   <link rel="alternate" href="http://www.engadget.com" type="text/html"/>
20.  </source>
21.  <title type="text">US government supports Apple stand on French law</title>
22.  <published>2006-03-23T18:30:00Z</published>
23.  <updated>2006-03-23T18:30:00Z</updated>
24.  <link rel="alternate" 
25.        href="http://www.engadget.com/2006/03/23/us..." 
26.        type="text/html"/>
27.  <summary xml:base="http://www.engadget.com/2006/03/23/us..." 
28.           type="html">&lt;p&gt;Filed under:...</summary>
29.  <author><name>Marc Perton</name></author>
30. </entry>
31. ...

Line 2: They’ve introduced a Google Reader gr: namespace for their own extensions; quite proper.

Line 3: Using the tag: URI scheme for permanent, unique IDs seems popular in Atom-land, and I’ve seen suggestions that it be adopted as a best practice. Me, I’d prefer to just use the HTTP URI for the ID, because if you’re going to practice responsible Web stewardship, it’s going to be bloody well just as permanent and unique. But I’m something of an acknowledged fanatic on this subject.

Line 6: This is a Google-Reader-specific extension of some sort, and I have no idea what it means, and that’s just fine. Atom requires that software tolerate this kind of thing, which is why we’ll probably never need Atom 2.0 or even 1.0.0.0.1.
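To make that tolerance concrete, here is a minimal sketch in Python (standard library only) of a consumer that picks up the Atom elements it understands and silently skips anything from a foreign namespace. The sample is cut down from the feed above; the function name `known_children` is invented for this illustration and is not part of any library.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Trimmed-down sample based on the top of the feed shown above.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gr="http://www.google.com/schemas/reader/atom/">
<title>Items labeled "tech" by Jason in Google Reader</title>
<gr:continuation>CIaUuM7D9YMC</gr:continuation>
<updated>2006-03-24T00:37:22Z</updated>
</feed>"""

def known_children(feed_xml):
    """Return {local-name: text} for direct children in the Atom namespace."""
    root = ET.fromstring(feed_xml)
    prefix = "{" + ATOM + "}"
    result = {}
    for child in root:
        if child.tag.startswith(prefix):
            result[child.tag[len(prefix):]] = child.text
        # Anything else (such as gr:continuation) is skipped, not an error.
    return result

fields = known_children(SAMPLE)
print(fields)
```

The extension element never reaches the caller, and nothing breaks because of it.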

Lines 12-13: The gr:original-id is usually the same as the <link rel="alternate" value, except when the feed comes from FeedBurner, in which case it points at the real article, not the FeedBurner redirect.
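A consumer that wants the real article location could prefer the extension attribute and fall back to the alternate link. The following is only a sketch under assumptions: the entry is hypothetical (the example.org and example.com URLs are placeholders), the function name `original_location` is made up, and it assumes gr:original-id holds a URL, which the feed above suggests but does not guarantee.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GR = "http://www.google.com/schemas/reader/atom/"

# Hypothetical entry: the alternate link is a redirect, the
# gr:original-id attribute points at the real article.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:gr="http://www.google.com/schemas/reader/atom/">
 <id gr:original-id="http://example.org/articles/1">tag:example.com,2005:item/1</id>
 <link rel="alternate" href="http://redirect.example.com/~r/1" type="text/html"/>
</entry>"""

def original_location(entry_xml):
    """Prefer gr:original-id; otherwise use the alternate link's href."""
    entry = ET.fromstring(entry_xml)
    id_el = entry.find("{%s}id" % ATOM)
    if id_el is not None:
        original = id_el.get("{%s}original-id" % GR)
        if original:
            return original
    for link in entry.findall("{%s}link" % ATOM):
        # A link with no rel attribute counts as rel="alternate".
        if link.get("rel", "alternate") == "alternate":
            return link.get("href")
    return None

print(original_location(ENTRY))
```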

Line 14: This is a little brittle: the URI in href= is relative, but to what? To wherever you happened to pick up the feed from, I suppose. I think a feed-level xml:base might be in order. But the title= is good practice; lots of software will pop up a tool-tip.
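The brittleness is easy to demonstrate with Python's `urllib.parse.urljoin`. In this sketch the two retrieval locations and the xml:base value are hypothetical placeholders; only the relative reference is taken from line 14 of the feed.

```python
from urllib.parse import urljoin

# The relative reference from line 14 of the feed.
HREF = "user/10963671381103576324/label/tech"

# Two hypothetical places the same feed document might be fetched from.
fetched_from_a = "http://reader.example.com/atom/feed.xml"
fetched_from_b = "http://mirror.example.net/cache/2006/feed.xml"

# Without xml:base, the retrieval URI is the base, so the results differ.
resolved_a = urljoin(fetched_from_a, HREF)
resolved_b = urljoin(fetched_from_b, HREF)
print(resolved_a)
print(resolved_b)

# With a feed-level xml:base (hypothetical value), every consumer agrees.
XML_BASE = "http://reader.example.com/atom/"
resolved = urljoin(XML_BASE, HREF)
print(resolved)
```

The same document yields two different link targets depending on where it was picked up, which is exactly the problem an explicit base removes.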

Lines 16-19: This element is here to tell you about the original feed that this entry came from; there’s more on it in Mihai’s write-up. There are a couple of things in it that are weird. First, there’s that gr:stream-id= attribute. I think that’s what the <link rel="self" element is for, and in fact the Feed Validator warns about that missing link. The Validator also warns that the source feed’s updated timestamp is missing. I can see lots of scenarios where it would be OK to ignore those warnings, but since they’re already providing the <link rel="self" value, they should put it in the right place.

Also, the id element inside the source is distinctly strange. This is supposed to be the required unique identifier of the Atom feed the entry originally came from. But it’s not an Atom feed, it’s Engadget’s RSS feed. So, they made up a reasonable-looking ID. It can’t really meet the strict Atom requirements since it’s obviously not going to be universal; in fact an asshole might argue that this somehow violates the spec, but they’d be wrong.

Of course, if they do happen to copy in an entry from an actual Atom feed, they’d better re-use its actual Atom id rather than make one up, or they’d be morons.
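The rule in the last two paragraphs fits in a few lines. This is a sketch of the policy only, not a description of how Google Reader actually works: the function name `source_id` and the `tag:aggregator.example,2006:feed/` prefix are hypothetical, loosely modeled on the pattern visible in line 17.

```python
# Hypothetical prefix for IDs the aggregator has to invent itself.
MADE_UP_PREFIX = "tag:aggregator.example,2006:feed/"

def source_id(feed_url, atom_id=None):
    """Pick the id to put inside an entry's source element.

    If the original feed was Atom and supplied an id, re-use it verbatim.
    Otherwise (for example, an RSS feed) mint a stable one from the feed URL.
    """
    if atom_id:
        return atom_id
    return MADE_UP_PREFIX + feed_url

# An RSS feed has no Atom id, so one gets made up.
print(source_id("http://www.engadget.com/rss.xml"))

# An Atom feed's own id (placeholder value here) is passed through untouched.
print(source_id("http://example.org/feed.atom", "tag:example.org,2005:feed"))
```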

Lines 18 and 21: They’ve labeled the titles as type="text" which is the default, so you might want to chop this if you were bandwidth-constrained.

Line 24: Once again, rel="alternate" is the default and hence not strictly necessary, but the type="text/html" is really good practice, making it super-easy for software to do the right thing.
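On the consuming side, honoring both defaults is a one-liner each. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the entry is a cut-down hypothetical with the optional attributes removed, and the helper names are invented for the example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical entry with type= on the title and rel= on the link left out.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
 <title>US government supports Apple stand on French law</title>
 <link href="http://www.example.com/story" type="text/html"/>
</entry>"""

def title_type(entry):
    """A title with no type attribute is treated as type="text"."""
    return entry.find(ATOM + "title").get("type", "text")

def link_rel(link):
    """A link with no rel attribute is treated as rel="alternate"."""
    return link.get("rel", "alternate")

entry = ET.fromstring(ENTRY)
print(title_type(entry))
print(link_rel(entry.find(ATOM + "link")))
```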

Line 27: Now this is clever. They’ve copied in Engadget’s HTML text and since who knows if it’s well-formed or not, they’ve said type="html", escaped it, and furthermore, provided the xml:base= so that any relative links in there are less likely to break. This is the work of an angel.
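Here is roughly what that buys a consumer, sketched in Python with the standard library. The summary is a hypothetical stand-in (the example.com URL and the markup are placeholders, since the real ones are elided above), and `HrefCollector` and `summary_links` are names made up for the illustration. The XML parser undoes the escaping, the HTML parser copes with whatever markup comes out, and xml:base anchors the relative links.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical summary carrying escaped HTML with a relative link.
SUMMARY = """<summary xmlns="http://www.w3.org/2005/Atom"
         xml:base="http://www.example.com/2006/03/23/story/"
         type="html">&lt;p&gt;See &lt;a href="pics/1.jpg"&gt;photo&lt;/a&gt;&lt;/p&gt;</summary>"""

class HrefCollector(HTMLParser):
    """Collect href attributes, resolved against a base URI."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base, value))

def summary_links(summary_xml):
    """Return absolute link targets found in a type="html" summary."""
    el = ET.fromstring(summary_xml)
    if el.get("type", "text") != "html":
        return []
    collector = HrefCollector(el.get(XML_BASE, ""))
    # el.text is already unescaped: it now contains real HTML tags.
    collector.feed(el.text or "")
    collector.close()
    return collector.links

links = summary_links(SUMMARY)
print(links)
```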

ongoing

March 23, 2006

By Tim Bray.
