ongoing by Tim Bray · Bad, Feed Readers, Bad!

Piles of junk, I say. Pardon me, but I’m feeling grumpy. After much more work than it should have been, mod_atom is now generating reasonably coherent (not done yet, but getting there) HTML output and human-oriented (as opposed to APP-oriented) Atom feeds. It’s slightly idiosyncratic XML, with lots of namespace prefixes. The Feed Validator says it’s OK and I think it’s OK. But none of NetNewsWire or Vienna or Bloglines [Update: or BlogBridge or Safari, or My Yahoo!, or Sage] can read it correctly. I fart in their general direction.

[Update: Google Reader, Planet Venus, Snarfer, SimplePie, Liferea, Awasu, Shrook, and Flock get it right! Good on ya, guys.]

[Ah, Brent sent me a pointer to the latest beta of NetNewsWire 3.1, and it’s fine. I know other people rave about Google Reader and Vienna and so on, but for me NNW is still way ahead of the pack in letting me scan a whole lot of news in almost no time at all.]

Are there any other feed-reader implementors out there who think they can, you know, read XML correctly?!?! If so, get in touch, and if you process my little bundle of joy properly, I’ll lavish praise and links. Or if there’s a bug in the feed that neither I nor the validator can see, I’ll apologize humbly to the whole world.

In any case, I’m going to have to go back and patch up the code so it doesn’t emit any of those nasty colons and relative URI references that apparently hurt implementors’ fragile feelings. This does not improve my mood.

[Update: Just to be clear, I’m not talking about the ongoing feed; if you want to test your feed reader, contact me and I’ll point you at the test feed.]

Contributions

From: Sam Ruby (Sep 14 2007, at 15:41)

Planet Venus reads that feed just fine.

[link]

From: James Abley (Sep 14 2007, at 16:20)

Or you could just leave it as is, and consider it a philanthropic gift to Feed Reader authors everywhere in terms of a big wodge of test cases for them to chew on?

[link]

From: James Holderness (Sep 14 2007, at 16:23)

Um, this may sound like a dumb question, but where is the test feed? Or is your blog the test case? If so, it seems to work ok in Snarfer.

[link]

From: David Megginson (Sep 14 2007, at 17:33)

This is a bigger problem than just Atom.

Tim and I were both on the old W3C XML Working Group that designed the Namespaces spec -- in retrospect, we got too far in front of implementors' requirements and delivered a spec to solve problems someone might have some day in the future, instead of problems people actually had at the time.

While some people use Namespaces as intended, most apps I've seen either don't use Namespaces or effectively hardcode the prefixes, and many apps (not just feed readers) fail if you substitute different prefixes for the same Namespace.

I liked the final Namespace spec, even though it wasn't what I had originally argued for, but when you have a spec that almost *everyone* ignores or gets wrong (XSLT and SOAP excepted), it might be time to acknowledge that the problem is the spec instead of the implementors. I predict that the use of XML Namespaces will be an ongoing problem for Atom, even though it's not Atom's fault.

[link]
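To make the prefix point concrete, here is a minimal sketch in Python using the standard library's ElementTree. The two sample feeds and the helper names (`feed_title`, `naive_feed_title`) are made up for illustration and are not taken from mod_atom or from any reader mentioned here. Matching on namespace URI plus local name gives the same answer for both feeds, while matching on the literal tag text only copes with the unprefixed one.

```python
import re
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Two hypothetical feeds that differ only in the prefix bound to the Atom namespace.
DEFAULT_NS_FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>'
)
PREFIXED_FEED = (
    '<a:feed xmlns:a="http://www.w3.org/2005/Atom">'
    "<a:title>Example</a:title></a:feed>"
)


def feed_title(xml_text):
    """Return the feed title, matching on namespace URI plus local name."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{ATOM_NS}}}feed":
        raise ValueError("not an Atom feed")
    title = root.find(f"{{{ATOM_NS}}}title")
    return title.text if title is not None else None


def naive_feed_title(xml_text):
    """Fragile approach: assumes the literal, unprefixed tag text."""
    match = re.search(r"<title>(.*?)</title>", xml_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for feed in (DEFAULT_NS_FEED, PREFIXED_FEED):
        print(feed_title(feed), naive_feed_title(feed))
```

The prefix is only a local abbreviation; the comparison that matters is on the (namespace URI, local name) pair.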

From: Aristotle Pagaltzis (Sep 14 2007, at 18:42)

Welcome to reality. Some of us have had the same awakening a while ago. This is what XML looks like to the world wearing RSS 2.0 goggles.

[link]

From: Julian Reschke (Sep 15 2007, at 01:05)

As far as I understand the problem, the only way to get namespaces that wrong is not to use a proper XML parser, or to let it run in non-namespace aware mode.

Well; I'd recommend leaving mod_atom the way it is, and using this as an incentive for feed reader authors to get their code fixed.

[link]
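The difference between the two parser modes can be sketched with Python's standard `xml.sax` module, which lets the namespaces feature be switched on or off. The sample feed and the handler class names below are hypothetical. With the feature off, the handler sees raw names with the prefix and colon baked in; with it on, it sees (namespace URI, local name) pairs that do not depend on the prefix.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical feed using a non-default prefix for the Atom namespace.
FEED = (
    b'<a:feed xmlns:a="http://www.w3.org/2005/Atom">'
    b"<a:title>Example</a:title></a:feed>"
)


class RawNames(ContentHandler):
    """Collects element names as reported in non-namespace-aware mode."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


class ExpandedNames(ContentHandler):
    """Collects (namespace URI, local name) pairs in namespace-aware mode."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        self.names.append(name)


def element_names(handler, namespace_aware):
    """Parse FEED with the given handler and parser mode; return the names seen."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespace_aware)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(FEED))
    return handler.names


if __name__ == "__main__":
    print(element_names(RawNames(), False))
    print(element_names(ExpandedNames(), True))
```

Code that compares against a raw name such as `feed` will miss `a:feed`; code that compares against the expanded pair will not.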

From: Nick (Sep 15 2007, at 03:07)

I know you're not talking about this feed, but I'm curious about a strange variation in its behaviour through Bloglines. Mostly, Bloglines cannot cope with your internal links and any embedded photos do not appear. But on two occasions this has appeared to be fixed. So I'm wondering if it was some change on your side, or are Bloglines fiddling with theirs?

[link]

From: Tim (Sep 15 2007, at 10:03)

Nick: I don't think I've changed the in-feed links recently. I don't use Bloglines that much, but at one point I know it was getting the ongoing feed right.

[link]

From: David Smith (Sep 15 2007, at 10:34)

Are you interested in testing Sage - the Firefox add-in?

[link]

From: Reinier Zwitserloot (Sep 15 2007, at 11:00)

XML is mostly dead, or at least, not alive yet.

Pragmatically speaking, what is a properly formatted HTML document?

The answer: Something that renders more or less the way you wanted it to in all major browsers. That's the only definition that counts. The fact that adhering to W3C standards is an effective way to get there is just a detail. You can't use the full box model even though it's a W3C standard because IE doesn't implement it right. This means HTML, effectively, doesn't have a single box model.

And now let's apply the same logic to Atom and RSS:

What, pragmatically speaking, is valid RSS, or valid Atom?

Answer: Something which virtually all readers get right.

Which means neither RSS, nor Atom, have any relationship to XML, other than that they sorta, coincidentally, look the same. This is a real shame, because life would be much easier if the vox populi definition of e.g. Atom actually did say: Parse it as real XML, and definitely don't try to roll your own regexp thing, or you'll fail to parse whole batches of sites.

That wasn't the case, and as a result, we've got this mess. It's unfortunate that 99% of all so-called 'XML-based' standards are actually not XML in the vox populi definition - doing whacky XML tricks like non-default namespaces, or user-defined entities, breaks everything.

Can this debacle be saved? Personally, between "XML-lite" (no namespaces, no entities, not even well-formedness. Just a generic 'parse as tag soup' approach) and alternative lightweight data formats like JSON, I don't see the full XML shebang going anywhere.

Shame, really.

[link]

From: Danny (Sep 15 2007, at 11:46)

Tim, I reckon your grumpiness is well-founded. The suckiness reminds me of a guy I knew who while in accommodation with a coin-operated gas meter, discovered it would work when fed with some plastic tokens he'd found. Worked fine, until the landlord emptied the meter. Geoff still wound up paying, though it was a lot more painful than if he'd followed the "spec".

While XML namespaces may have been "a spec to solve problems someone might have some day in the future, instead of problems people actually had at the time" - it's that future now. If you want to name things so they'll work predictably on the Web, you need URIs somewhere. I personally think XML namespaces offers quite a neat solution.

While the junk around syndication may be put down to its unfortunate history, it's troublesome to see some other trends around Web markup making similar mistakes. The global string-squatting common around microformats is one example, and hardly a month goes by without some new registry being suggested for this or that, when URI-based naming would make a perfectly good answer. Hey ho.

[link]

From: Tim (Sep 15 2007, at 12:49)

David Smith: Testing Sage is a good idea, but when I go to http://sage.mozdev.org/install/ and click on "click here to install" I get a 404 not found on http://releases.mozilla.org/pub/mozilla.org/extensions/sage/sage-1.3.10-fx.xpi

[link]

From: Kevin (Sep 15 2007, at 14:13)

Tim: regarding Sage, I think the URL you need to start with is https://addons.mozilla.org/en-US/firefox/addon/77 The install link from there should work.

It is my impression that Sage has been sorely neglected, if not completely abandoned, in recent months. I switched to Planet Venus nine months ago after the Sage devs failed to fix their xml:base implementation. I believe it is still broken (https://www.mozdev.org/bugs/show_bug.cgi?id=12582).

I would be surprised if Sage works with your feed and even more surprised if anything ever gets done about that.

[link]

From: Kris Arnold (Sep 15 2007, at 20:29)

Just saw a new beta build of NetNewsWire (3.1b18/1168) released with the following change:

"Fixed a bug resolving relative URLs when 1) there is no xml:base URL and 2) <a:link href="" rel="self"> has an empty href. NetNewsWire now uses the URL where the feed was downloaded, since no other base URL was specified."

Is this a fix for the problem outlined above?

[link]
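The fallback described in that release note can be sketched in a few lines of Python. This is an illustration only, not NetNewsWire's code: the function name `resolve_links`, the sample feed, and the example.org URL are all assumptions. The base URI starts as the URL the feed was downloaded from, any `xml:base` attribute on an element or its ancestors is resolved against the base in effect, and each `href` is then resolved against the result.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM_NS = "http://www.w3.org/2005/Atom"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical feed: an empty self link and one entry carrying its own xml:base.
FEED = """<a:feed xmlns:a="http://www.w3.org/2005/Atom">
  <a:link href="" rel="self"/>
  <a:entry xml:base="posts/">
    <a:link href="first.html"/>
  </a:entry>
</a:feed>"""


def resolve_links(feed_xml, download_url):
    """Return the absolute href of every atom:link, in document order."""
    resolved = []

    def walk(element, base):
        # An xml:base attribute, if present, is resolved against the inherited base.
        base = urljoin(base, element.get(XML_BASE, ""))
        if element.tag == f"{{{ATOM_NS}}}link":
            resolved.append(urljoin(base, element.get("href", "")))
        for child in element:
            walk(child, base)

    # With no xml:base anywhere, the download URL is the only base available.
    walk(ET.fromstring(feed_xml), download_url)
    return resolved


if __name__ == "__main__":
    for link in resolve_links(FEED, "http://example.org/blog/feed.atom"):
        print(link)
```

In this sketch an empty `href` simply resolves to the base URI itself, which is why the self link comes out as the download URL.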

From: Geoffrey Sneddon (Sep 16 2007, at 02:47)

Kris, no, it isn't. It's very similar to Aristotle's tests (linked from above comment by him). As long as you actually obey xml-names, it is a very simple test to pass.

[link]

From: Ross Reedstrom (Sep 17 2007, at 13:42)

Re: Sage testing

Don't know your test feed URL, so I can't test Sage on it, but the namespace issue that Aristotle mentioned works fine (at least, his mini test feed works). Email me the test feed, and I'll take a peek.

reedstrm@rice.edu

[link]

From: James Aylett (Sep 27 2007, at 06:39)

This is a huge problem, because *generating* Atom properly *without* namespace prefixes is actually quite hard. I generate all my Atom feeds using libxml2, and while it's possible to dick around with the prefixes involved, it's non-trivial and would add a fair amount of code. Frankly this code should live in the readers which have to cope with a wide range of feeds anyway; my feeds are legal, and work in some of the better systems without change.

Is this overly arrogant of me? I don't think so, but apparently at least half the feed reader authors disagree. Maybe I should bow to their unwillingness to use a proper XML parser (and proper handling code) for an XML format?

But no, hang on, that way lies Cisco. I really don't want to go down that road again.

[link]

September 14, 2007
· Technology (90 fragments)
· · Atom (91 more)
· · XML (138 more)
