Check out Mihai Parparita’s Google Reader Tidbits, about how he used Google Reader hacks to do a bunch of clever feed splicing. The article is interesting, and I think Atom is going to enable a bunch of feed-mashup creativity that I’m not smart enough to invent. But I wanted to do a deep-dive on the actual Atom feed he generated, which is probably of interest only to obsessive Atom 1.0 fetishists.

In this piece, when I use the technical terms moron, asshole, and angel, I do so in the sense described in Mark Pilgrim’s monumental Why Specs Matter.

Let’s reproduce the top bit of the feed, which I’ve edited to make the line-wrapping a little less painful; a ... marks each deletion. There are occasional instances of moronic behavior, but they’re in the minority; in fact it’s generally clever.

 1. 
 3. tag:google.com,2005:reader/user/10963671381103576324/label/tech
 4. Google Reader
 5. Items labeled "tech" by Jason in Google Reader
 6. CIaUuM7D9YMC
 7. 
 9. Jason
10. 2006-03-24T00:37:22Z
11. 
12.  
13.      tag:google.com,2005:reader/item/f9cb5b94346faeb4
14.  
16.  
17.   tag:google.com,2005:reader/feed/http://www.engadget.com/rss.xml
18.   Engadget
19.   
20.  
21.  US government supports Apple stand on French law
22.  2006-03-23T18:30:00Z
23.  2006-03-23T18:30:00Z
24.  
27.  <p>Filed under:...
29.  Marc Perton
30. 
31. ...

Line 2: They’ve introduced their own google-reader gr: namespace for their own extensions; quite proper.

Line 3: Using the tag: URI scheme for permanent, unique IDs seems popular in Atom-land, and I’ve seen suggestions that it be adopted as a best practice. Me, I’d prefer to just use the HTTP URI for the ID, because if you’re going to practice responsible Web stewardship, it’s going to be bloody well just as permanent and unique. But I’m something of an acknowledged fanatic on this subject.
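For anyone who hasn't looked closely at tag: URIs, the shape is `tag:` plus an authority (a domain or email address), a date on which that authority controlled the name, and a free-form specific part. Here is a minimal sketch that builds one and pulls one apart. The regular expression is a simplification rather than the full grammar from the tag: URI specification, and the function names are made up for illustration.

```python
import re

# Simplified pattern: authority, then a date (YYYY, YYYY-MM or YYYY-MM-DD),
# then everything else. This is NOT the complete RFC 4151 grammar.
TAG_URI = re.compile(
    r"^tag:(?P<authority>[^,:]+),"
    r"(?P<date>\d{4}(?:-\d{2}(?:-\d{2})?)?):"
    r"(?P<specific>.*)$"
)

def make_tag_uri(authority, date, specific):
    """Assemble a tag: URI from its three parts (hypothetical helper)."""
    return "tag:%s,%s:%s" % (authority, date, specific)

def parse_tag_uri(uri):
    """Split a tag: URI into authority, date and specific parts."""
    match = TAG_URI.match(uri)
    if match is None:
        raise ValueError("not a tag: URI: %r" % uri)
    return match.groupdict()

# The entry ID from the listing above, taken apart and put back together.
entry_id = "tag:google.com,2005:reader/item/f9cb5b94346faeb4"
parts = parse_tag_uri(entry_id)
rebuilt = make_tag_uri(parts["authority"], parts["date"], parts["specific"])
```

The date is what keeps the ID stable even if the domain later changes hands, which is the usual argument for tag: over plain HTTP URIs.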

Line 6: This is a Google-Reader-specific extension of some sort, and I have no idea what it means, and that’s just fine. Atom requires that software tolerate this kind of thing, which is why we’ll probably never need Atom 2.0 or even 1.0.0.0.1.
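To make "tolerate this kind of thing" concrete, here is a rough sketch of a consumer that reads the Atom elements it knows about and simply sets aside anything from another namespace. The namespace URI bound to gr: and the element name `gr:example` are placeholders I invented; the real ones aren't shown in the listing above.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical feed: the gr namespace URI and the gr:example element name
# are invented for illustration, not copied from the real feed.
SAMPLE_FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gr="http://example.com/hypothetical-reader-ns">
  <id>tag:google.com,2005:reader/user/10963671381103576324/label/tech</id>
  <title>Google Reader</title>
  <gr:example>CIaUuM7D9YMC</gr:example>
  <updated>2006-03-24T00:37:22Z</updated>
</feed>
"""

def split_known_and_foreign(feed_xml):
    """Return (atom_children, foreign_tags) for the feed's direct children."""
    root = ET.fromstring(feed_xml)
    prefix = "{" + ATOM_NS + "}"
    known = {}
    foreign = []
    for child in root:
        if child.tag.startswith(prefix):
            # An Atom element: keep its local name and text.
            known[child.tag[len(prefix):]] = (child.text or "").strip()
        else:
            # Foreign markup: note it and move on, never fail.
            foreign.append(child.tag)
    return known, foreign

known, foreign = split_known_and_foreign(SAMPLE_FEED)
```

The point is only that the unknown element lands in `foreign` and nothing breaks; a real reader would probably just skip it without keeping a list.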

Lines 12-13: The gr:original-id is usually the same as the value, except when the feed comes from FeedBurner; then it points at the real article, not the FeedBurner redirect.

Line 14: This is a little brittle: the URI in href= is relative, but to what? To wherever you happened to pick up the feed from, I suppose. I think a feed-level xml:base might be in order. But the title= is good practice; lots of software will pop up a tool-tip.
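Here is a small sketch of why the relative href is brittle. Every URL in it is hypothetical (the real href isn't reproduced in the listing), and it assumes xml:base, when present, is already an absolute URI, which is a simplification.

```python
from urllib.parse import urljoin

def resolve_href(href, xml_base=None, retrieval_url=None):
    """Resolve a link href, preferring an explicit xml:base over the fetch URL."""
    base = xml_base or retrieval_url
    if base is None:
        return href  # nothing to resolve against; leave the reference as-is
    return urljoin(base, href)

# Hypothetical relative link, fetched from two different places.
HREF = "/reader/view/hypothetical-item"
original = resolve_href(
    HREF, retrieval_url="http://reader.example.com/atom/label/tech")
mirrored = resolve_href(
    HREF, retrieval_url="http://mirror.example.org/copies/feed.xml")

# With a feed-level xml:base, the mirror copy still resolves correctly.
pinned = resolve_href(
    HREF,
    xml_base="http://reader.example.com/",
    retrieval_url="http://mirror.example.org/copies/feed.xml",
)
```

Without xml:base, `mirrored` points at the mirror's host; with it, `pinned` comes out the same as `original` no matter where the feed was picked up.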

Lines 16-19: This source element is here to tell you about the original feed that this entry came from; there’s more on it in Mihai’s write-up. There are a couple of things in it that are weird. First, there’s that gr:stream-id= attribute. I think that’s what the link element is for, and in fact the Feed Validator warns about that missing link. The Validator also warns that the source feed’s updated timestamp is missing. I can see lots of scenarios where it would be OK to ignore those warnings, but since they’re providing the value anyway, they should put it in the right place.

Also, the id element inside the source is distinctly strange. This is supposed to be the required unique identifier of the Atom feed the entry originally came from. But it’s not an Atom feed, it’s Engadget’s RSS feed. So, they made up a reasonable-looking ID. It can’t really meet the strict Atom requirements since it’s obviously not going to be universal; in fact an asshole might argue that this somehow violates the spec, but they’d be wrong.

Of course, if they do happen to copy in an entry from an actual Atom feed, they’d better re-use its actual Atom id rather than make one up, or they’d be morons.
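The rule I'm arguing for fits in a few lines. This is a sketch of the policy, not a claim about what Google Reader actually does internally; the function name is made up, and the synthesized form just copies the pattern visible on line 17 of the listing.

```python
def source_id(feed_url, original_atom_id=None):
    """Pick the id to put inside the source element (hypothetical helper).

    If the entry came from a real Atom feed, its id must be reused as-is.
    Only when the origin has no Atom id (an RSS feed, say) do we mint one.
    """
    if original_atom_id:
        return original_atom_id
    # Synthesized id, following the pattern seen in the listing.
    return "tag:google.com,2005:reader/feed/" + feed_url

from_rss = source_id("http://www.engadget.com/rss.xml")
from_atom = source_id(
    "http://example.org/feed.atom",  # hypothetical Atom feed
    original_atom_id="tag:example.org,2006:feed",  # hypothetical Atom id
)
```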

Lines 18 and 21: They’ve labeled the titles as type="text", which is the default, so you might want to chop this if you were bandwidth-constrained.

Line 24: Once again, rel="alternate" is the default and hence not strictly necessary, but the type="text/html" is really good practice, making it super-easy for software to do the right thing.
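On the consuming side, "it's the default" means a reader has to treat a missing attribute exactly like the explicit one. A minimal sketch, with a hypothetical article URL and made-up helper names:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Same entry title as in the listing, but with the defaulted attributes
# omitted. The href is a placeholder, not the real article link.
ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>US government supports Apple stand on French law</title>
  <link href="http://example.com/hypothetical-article" type="text/html"/>
</entry>
"""

def text_construct_type(element):
    """Type of a title, summary, etc.; Atom defaults it to "text"."""
    return element.get("type", "text")

def link_rel(element):
    """Relation of a link element; Atom defaults it to "alternate"."""
    return element.get("rel", "alternate")

entry = ET.fromstring(ENTRY)
title = entry.find(ATOM + "title")
link = entry.find(ATOM + "link")

title_type = text_construct_type(title)
rel = link_rel(link)
media_type = link.get("type")  # no default here, which is why stating it helps
```

Note the asymmetry: `type` and `rel` have defaults to fall back on, but the link's media type doesn't, so spelling out text/html saves the consumer from guessing.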

Line 27: Now this is clever. They’ve copied in Engadget’s HTML text and since who knows if it’s well-formed or not, they’ve said type="html", escaped it, and furthermore, provided the xml:base= so that any relative links in there are less likely to break. This is the work of an angel.
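A rough sketch of the same trick in Python: put the possibly-malformed HTML in as plain text, let the XML serializer do the escaping, and record where it came from with xml:base. The base URL and the HTML fragment are invented for illustration; the listing elides the real ones.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XML_NS = "http://www.w3.org/XML/1998/namespace"

def make_html_content(html_fragment, base_url):
    """Build an Atom content element carrying escaped HTML (hypothetical helper)."""
    content = ET.Element("{%s}content" % ATOM_NS)
    content.set("type", "html")
    # xml:base tells consumers what relative links inside the HTML are relative to.
    content.set("{%s}base" % XML_NS, base_url)
    # Assigning to .text means the serializer escapes <, > and & for us,
    # so the fragment does not need to be well-formed.
    content.text = html_fragment
    return content

# Deliberately sloppy HTML: unclosed tags and a relative link.
fragment = '<p>Filed under: <a href="/hypothetical/category">Example<br>'
element = make_html_content(fragment, "http://example.com/blog/")
serialized = ET.tostring(element, encoding="unicode")

# A consumer parsing the XML gets the original HTML string back.
round_tripped = ET.fromstring(serialized).text
```

Because the HTML travels as escaped text, the feed stays well-formed XML even when the fragment is a mess, and `round_tripped` hands the consumer the fragment unchanged.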


March 23, 2006
