Check out Mihai Parparita’s Google Reader Tidbits, about how he used Google Reader hacks to do a bunch of clever feed splicing. The article is interesting, and I think Atom is going to enable a bunch of feed-mashup creativity that I’m not smart enough to invent. But I wanted to do a deep-dive on the actual Atom feed he generated, which is probably of interest only to obsessive Atom 1.0 fetishists.
In this piece, when I use the technical terms moron, asshole, and angel, I do so in the sense described in Mark Pilgrim’s monumental Why Specs Matter.
Let’s reproduce the top bit of the feed, which I’ve edited to make the
line-wrapping a little less painful; a
... marks each deletion.
There are occasional instances of moronic behavior, but they’re in the
minority; in fact it’s generally clever.
 1. <feed xmlns="http://www.w3.org/2005/Atom"
 2.       xmlns:gr="http://www.google.com/schemas/reader/atom/">
 3.  <id>tag:google.com,2005:reader/user/10963671381103576324/label/tech</id>
 4.  <generator uri="http://www.google.com/reader">Google Reader</generator>
 5.  <title>Items labeled "tech" by Jason in Google Reader</title>
 6.  <gr:continuation>CIaUuM7D9YMC</gr:continuation>
 7.  <link rel="self"
 8.        href="http://www.google.com/reader/public/atom/user/109..."/>
 9.  <author><name>Jason</name></author>
10.  <updated>2006-03-24T00:37:22Z</updated>
11.  <entry>
12.   <id gr:original-id="http://www.engadget.com/2006/03/23/us...">
13.     tag:google.com,2005:reader/item/f9cb5b94346faeb4</id>
14.   <link rel="via" href="user/10963671381103576324/label/tech"
15.         title="Items labeled &quot;tech&quot; by Jason in Google Reader"/>
16.   <source gr:stream-id="feed/http://www.engadget.com/rss.xml">
17.    <id>tag:google.com,2005:reader/feed/http://www.engadget.com/rss.xml</id>
18.    <title type="text">Engadget</title>
19.    <link rel="alternate" href="http://www.engadget.com" type="text/html"/>
20.   </source>
21.   <title type="text">US government supports Apple stand on French law</title>
22.   <published>2006-03-23T18:30:00Z</published>
23.   <updated>2006-03-23T18:30:00Z</updated>
24.   <link rel="alternate"
25.         href="http://www.engadget.com/2006/03/23/us..."
26.         type="text/html"/>
27.   <summary xml:base="http://www.engadget.com/2006/03/23/us..."
28.            type="html">&lt;p&gt;Filed under:...</summary>
29.   <author><name>Marc Perton</name></author>
30.  </entry>
31.  ...
Line 2: They’ve introduced their own google-reader
namespace for their own extensions; quite proper.
Line 3: Using the
tag: URI scheme for permanent, unique
IDs seems popular in Atom-land, and I’ve seen suggestions that it be
adopted as a best practice. Me, I’d prefer to just use the HTTP URI for the
ID, because if you’re going to practice responsible Web stewardship, it’s
going to be bloody well just as permanent and unique. But I’m something of an
acknowledged fanatic on this subject.
Line 6: This is a Google-Reader-specific extension of some sort, and I have no idea what it means, and that’s just fine. Atom requires that software tolerate this kind of thing, which is why we’ll probably never need Atom 2.0, or even a point release.
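To make the "tolerate this kind of thing" point concrete, here is a minimal sketch of a consumer that keeps the Atom-namespace children of the feed and silently skips anything else, such as gr:continuation. The trimmed feed string and the helper name atom_children are illustrative assumptions, not anything Google Reader or a particular feed library does.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Trimmed-down copy of the feed header shown above (most elements omitted).
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gr="http://www.google.com/schemas/reader/atom/">
  <title>Items labeled "tech" by Jason in Google Reader</title>
  <gr:continuation>CIaUuM7D9YMC</gr:continuation>
  <updated>2006-03-24T00:37:22Z</updated>
</feed>"""


def atom_children(element):
    """Return {local name: text} for Atom-namespace children only.

    Children in any other namespace (foreign markup) are skipped
    without raising an error. Hypothetical helper for illustration.
    """
    prefix = "{" + ATOM_NS + "}"
    known = {}
    for child in element:
        if child.tag.startswith(prefix):
            known[child.tag[len(prefix):]] = child.text
    return known


fields = atom_children(ET.fromstring(FEED))
print(fields)
```

The extension element is still in the parsed tree for any software that does understand it; this reader just never looks at it.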
Line 12-13: The gr:original-id= attribute is usually the same as the href= of the entry’s rel="alternate" link.
Line 14: This is a little brittle: the URI in href= is relative, but to what? To wherever you happened to pick up the feed from, I suppose. I think a feed-level xml:base might be in order. The title= is good practice; lots of software will show it in a pop-up.
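A small sketch of why the relative href is brittle, and what a feed-level xml:base would buy. The retrieval URIs and the base URI below are made-up example.org placeholders, and resolve is a hypothetical helper; only the relative href itself comes from line 14 of the feed.

```python
from urllib.parse import urljoin

# The relative reference from line 14 of the feed.
via_href = "user/10963671381103576324/label/tech"


def resolve(href, retrieval_uri, xml_base=None):
    """Resolve href against xml:base if one is given, else against the
    URI the document was fetched from. Hypothetical helper."""
    base = urljoin(retrieval_uri, xml_base) if xml_base else retrieval_uri
    return urljoin(base, href)


# Assumption: the same feed document picked up from two different places.
copy_a = "http://a.example.org/feeds/tech.atom"
copy_b = "http://b.example.net/cache/copy.xml"

# Without xml:base, the link's meaning depends on where the feed was found.
print(resolve(via_href, copy_a))
print(resolve(via_href, copy_b))

# With a (hypothetical) feed-level xml:base, both copies agree.
feed_base = "http://reader.example.org/atom/"
print(resolve(via_href, copy_a, feed_base))
print(resolve(via_href, copy_b, feed_base))
```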
Line 16-19: The source element is here to tell you about the original feed that this entry came from; there’s more on it in Mihai’s write-up. There are a couple of things in it that are weird.
First, there’s that gr:stream-id= attribute. I think that’s what the <link rel="self"> element is for, and in fact the Feed Validator warns about that missing link. The Validator also warns that the updated timestamp is missing. Actually, I can see lots of scenarios where it would be OK to ignore those warnings, but since they’re actually providing the feed’s URI in gr:stream-id= anyway, it might as well go in a rel="self" link.
Second, the id element inside the source is distinctly strange. This is supposed to be the required unique identifier of the Atom feed the entry originally came from.
But it’s not an Atom feed, it’s Engadget’s RSS feed. So, they made up a
reasonable-looking ID. It can’t really meet the strict Atom requirements
since it’s obviously not going to be universal; in fact an asshole
might argue that this somehow violates the spec, but they’d be wrong.
Of course, if they do happen to copy in an entry from an actual Atom feed, they’d better re-use its actual Atom id rather than make one up, or they’d be morons.
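The rule in that last paragraph can be sketched in a few lines. This is an illustration of the principle only, not Google’s actual code: the function name source_id is hypothetical, the second example ID is made up, and the minted form simply mirrors the pattern visible on line 17 of the feed.

```python
def source_id(feed_url, atom_id=None):
    """Pick the id to put inside an entry's source element.

    If the original feed is Atom and carries its own id, re-use it
    unchanged. Otherwise (e.g. an RSS feed) mint one, here following
    the pattern seen on line 17. Hypothetical helper.
    """
    if atom_id:
        return atom_id
    return "tag:google.com,2005:reader/feed/" + feed_url


# RSS original: nothing to copy, so an ID is made up.
print(source_id("http://www.engadget.com/rss.xml"))

# Atom original (made-up example ID): copied through untouched.
print(source_id("http://example.org/feed.atom",
                atom_id="tag:example.org,2006:blog"))
```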
Lines 18 and 21: They’ve labeled the titles as
type="text" which is the default, so you might want to chop this
if you were bandwidth-constrained.
Line 24: Once again,
rel="alternate" is the default and
hence not strictly necessary, but the
type="text/html" is really
good practice, making it super-easy for software to do the right thing.
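The two defaults mentioned for lines 18, 21 and 24 are easy to honor on the consuming side. Here is a rough sketch, assuming a hand-trimmed entry with the optional attributes left off; the variable names are mine, and the truncated href is kept exactly as it appears in the feed above.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Same entry as above, but with type="text" and rel="alternate" chopped.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>US government supports Apple stand on French law</title>
  <link href="http://www.engadget.com/2006/03/23/us..." type="text/html"/>
</entry>"""

entry = ET.fromstring(ENTRY)
title = entry.find(ATOM + "title")
link = entry.find(ATOM + "link")

# Atom 1.0 defaults: a text construct with no type is "text",
# and a link with no rel is "alternate".
title_type = title.get("type", "text")
link_rel = link.get("rel", "alternate")

# The link's type has no such default, which is why spelling it out helps.
link_type = link.get("type")

print(title_type, link_rel, link_type)
```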
Line 27: Now this is clever. They’ve copied in Engadget’s HTML text and, since who knows if it’s well-formed or not, they’ve said type="html", escaped it, and furthermore provided the xml:base= so that any relative links in there are less likely to break. This is the work of an angel.
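For the curious, here is roughly what a consumer does with such a summary: the XML parser undoes the escaping, a forgiving HTML parser handles whatever tag soup comes out, and xml:base anchors the relative links. The summary content and its base URI below are invented for illustration (the real ones are truncated above), and LinkCollector is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Made-up summary: escaped HTML containing one relative link.
SUMMARY = """<summary xmlns="http://www.w3.org/2005/Atom"
    xml:base="http://blog.example.org/2006/03/23/post/"
    type="html">&lt;p>Filed under: &lt;a href="../related.html">Law&lt;/a>&lt;/p></summary>"""


class LinkCollector(HTMLParser):
    """Collect a/@href values, resolved against a base URI."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base, value))


summary = ET.fromstring(SUMMARY)
base = summary.get(XML_NS + "base")
html_text = summary.text  # the XML parser has already unescaped &lt;

collector = LinkCollector(base)
collector.feed(html_text)
collector.close()

print(html_text)
print(collector.links)
```

Without the xml:base, that relative link would resolve against wherever the spliced feed happened to live, which is almost certainly not where Engadget meant it to point.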