Atom API Sketches

I’m thinking about Atom 1.0 from the coder’s point of view. I’m not thinking about the Publishing Protocol; I’m thinking about how you, the programmer, should go about inhaling and exhaling the stuff. I’ve never believed in One True API for XML, because it’s just too broad-spectrum, but Atom’s pretty tightly constrained. Obviously, you can use something generic like SAX, one of the many DOM-style APIs, or one of the modern pull APIs. Maybe for Atom we could use something simpler and more natural. I’m thinking out loud in this space; this is far from finished, not even a proposal yet. But I bet there are other people out there who care.

Constraints · Here are some of the things that should make an Atom API easier:

Atom elements mostly have unsubtle programmer-friendly data types, easily represented in O-O terms, with the exception of Text Constructs and “atom:content”.
The order of elements is not significant, except for the useful fact that all the “atom:entry” children of “atom:feed” are grouped at its end.
Some non-Atom elements can be recognized as what section 6.4.1 of the spec calls “Simple Extension Elements”, with simple, easily-modeled structure.
A Feed doc is guaranteed to have title, date, and unique-ID children, as well as possibly other known Atom elements.
An Entry doc is guaranteed to have title, date, and unique-ID children, as well as possibly other known Atom elements.
The computation of which metadata values apply to an Entry is nontrivial, involving values from the Entry itself, feed-level metadata, and from an embedded “atom:source” element.
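The guaranteed children map onto a plain record type pretty directly. What follows is a minimal Python sketch, not a proposal: the names `EntryCore` and `parse_core` are invented for illustration, the title is treated as plain text, and the date handling assumes a timestamp that `datetime.fromisoformat` accepts once a trailing "Z" is rewritten.

```python
from dataclasses import dataclass
from datetime import datetime
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


@dataclass
class EntryCore:
    """Hypothetical record for the three children an entry always has."""
    id: str
    title: str
    updated: datetime


def parse_core(entry):
    """Build an EntryCore from a parsed atom:entry element."""
    values = {}
    for name in ("id", "title", "updated"):
        text = entry.findtext(ATOM + name)
        if text is None:
            raise ValueError(f"missing required atom:{name}")
        values[name] = text.strip()
    stamp = values["updated"]
    # Assumption: older Pythons reject a trailing "Z", so spell out the offset.
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    return EntryCore(values["id"], values["title"], datetime.fromisoformat(stamp))


SAMPLE_ENTRY = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    "<id>urn:example:1</id>"
    "<title>Atom API Sketches</title>"
    "<updated>2005-08-04T12:00:00Z</updated>"
    "</entry>"
)
core = parse_core(SAMPLE_ENTRY)
```

A fuller version would model Text Constructs properly instead of flattening the title to a string.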

Guesses · Here are some predictions about the likely characteristics of Atom data in the wild:

Feeds, in practice, may be arbitrarily and unpredictably long; that is to say, they may have huge numbers of entries.
Entries, in practice, will be reasonably short.

Recommendations · I’ll try to avoid language-specificity, but I see the world through trifocals with the panes labeled “C”, “Java”, and “P-languages”.

Streaming · Because of the unpredictable size of Atom feeds, DOM-style APIs for whole feeds are probably unusable in many scenarios. Thus, a general-purpose Atom API must include a streaming capability, preferably in a pull rather than callback flavor.
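As a rough illustration of reading a feed without holding it all in memory, here is a small Python sketch. It uses the standard library's `ElementTree.iterparse` as a stand-in for a pull API; the function name `pull_entries` and the sample feed are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def pull_entries(stream):
    """Yield one parsed atom:entry element at a time from a file-like object."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == ATOM + "entry":
            yield elem
            # Drop the entry's children once the caller is done with it.
            # An empty element shell stays attached to the feed root.
            elem.clear()


SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

titles = [e.findtext(ATOM + "title") for e in pull_entries(io.BytesIO(SAMPLE))]
```

The caller asks for the next entry when it wants one, which is the pull flavor, and no callbacks are involved.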

Similarly, generating an Atom Feed must be possible in a streaming fashion, with entries going out on the wire as they are generated.
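On the generating side, one way to picture this is a generator that emits the feed header, then one chunk per entry as each becomes available, then the closing tag. This is only a sketch under assumptions: `generate_feed` is a hypothetical name, entries are plain dicts with `title`, `id`, and `updated` keys, and titles are written as plain text.

```python
import io
from xml.sax.saxutils import escape


def generate_feed(title, feed_id, updated, entries):
    """Yield the feed as text chunks; `entries` may itself be a lazy iterable."""
    yield (
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        f"<title>{escape(title)}</title>"
        f"<id>{escape(feed_id)}</id>"
        f"<updated>{escape(updated)}</updated>"
    )
    for entry in entries:
        # Each entry goes out as soon as the source produces it.
        yield (
            "<entry>"
            f"<title>{escape(entry['title'])}</title>"
            f"<id>{escape(entry['id'])}</id>"
            f"<updated>{escape(entry['updated'])}</updated>"
            "</entry>"
        )
    yield "</feed>"


def slow_entries():
    """Stand-in for a database cursor or other incremental source."""
    for n in (1, 2):
        yield {
            "title": f"Post {n} & more",
            "id": f"urn:example:{n}",
            "updated": "2005-08-04T12:00:00Z",
        }


out = io.StringIO()  # stands in for a socket or HTTP response body
for chunk in generate_feed(
    "Example", "urn:example:feed", "2005-08-04T12:00:00Z", slow_entries()
):
    out.write(chunk)
```

Nothing here needs to know how many entries there will be before the first byte goes out.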

Iterating · For modern languages that have the concept, why shouldn’t a feed just present as an Iterator over the entries?
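Here is one possible shape for that, sketched in Python. It leans on the constraint that all the entries sit at the end of the feed: everything before the first entry is read eagerly as feed-level metadata, and the entries are handed out by iteration. The `Feed` class, its `metadata` dict, and the choice of which fields to collect are all assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
HEAD_TAGS = {ATOM + name for name in ("title", "id", "updated")}


class Feed:
    """Hypothetical feed object: metadata up front, entries on demand."""

    def __init__(self, stream):
        self._events = ET.iterparse(stream, events=("start", "end"))
        self._depth = 0
        self.metadata = {}
        self._read_head()

    def _read_head(self):
        # Stop at the first atom:entry start tag; what came before it is
        # feed-level metadata because entries are grouped at the end.
        for event, elem in self._events:
            if event == "start":
                self._depth += 1
                if self._depth == 2 and elem.tag == ATOM + "entry":
                    return
            else:
                self._depth -= 1
                if self._depth == 1 and elem.tag in HEAD_TAGS:
                    self.metadata[elem.tag[len(ATOM):]] = elem.text

    def __iter__(self):
        # Single pass: resumes the same event stream where _read_head stopped.
        for event, elem in self._events:
            if event == "start":
                self._depth += 1
            else:
                self._depth -= 1
                if self._depth == 1 and elem.tag == ATOM + "entry":
                    yield elem
                    elem.clear()


SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <id>urn:example:feed</id>
  <updated>2005-08-04T12:00:00Z</updated>
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

feed = Feed(io.BytesIO(SAMPLE))
titles = [entry.findtext(ATOM + "title") for entry in feed]
```

The trade-off is that the feed can only be walked once, which seems acceptable for something that may be arbitrarily long.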

Metadata Distribution · Per-entry metadata is sourced from a combination of feed-level, entry-level, and source-level child elements. The API should hide the mechanics and let readers pull out the per-entry data. Finer control over where the metadata goes is probably required on the Atom-generation side.
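To give a feel for what hiding the mechanics could mean, here is a small Python sketch for one piece of metadata, the author. It assumes the precedence the spec gives for atom:author: the entry's own authors win, then those of an embedded atom:source, then the feed's. The function name `effective_authors` is invented, and other metadata may follow different rules.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def effective_authors(entry, feed=None):
    """Return the author names that apply to an entry, nearest holder first."""
    holders = [entry, entry.find(ATOM + "source"), feed]
    for holder in holders:
        if holder is None:
            continue
        # findall with a bare tag only looks at direct children, so the
        # feed's own authors are not confused with authors inside entries.
        names = [a.findtext(ATOM + "name") for a in holder.findall(ATOM + "author")]
        if names:
            return names
    return []


FEED = ET.fromstring("""<feed xmlns="http://www.w3.org/2005/Atom">
  <author><name>Feed Author</name></author>
  <entry><title>own</title><author><name>Entry Author</name></author></entry>
  <entry><title>aggregated</title>
    <source><author><name>Source Author</name></author></source></entry>
  <entry><title>bare</title></entry>
</feed>""")

resolved = {
    e.findtext(ATOM + "title"): effective_authors(e, FEED)
    for e in FEED.findall(ATOM + "entry")
}
```

A reader calling this never has to know where the value came from; a generator-side API would need the opposite, a way to say where each value should be written.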

Foreign Markup · Beyond offering a generic XML interface to its contents, the API need not support what the spec calls Foreign Markup when that markup doesn’t qualify as Simple Extension Elements.

Text Constructs · I expect these things to be the major sources of complexity and difficulty in dealing with Atom; here are some ideas on how to approach them.

The case where you want to display a text construct is probably very common, and has two modes: you want HTML you can hand to a renderer, or you want raw text you can pour into a display widget. I envision two calls, perhaps named getText and getHTML, which take care of figuring out the type attribute and doing the right amount of unescaping. The only magic in getHTML is for type="text", where the call should escape the text and wrap it in a synthetic HTML div. getText is simple for type="text"; in all other cases, it should simply remove all the markup and return the raw text.
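A rough Python rendering of the two calls might look like the following. It is a sketch under assumptions, not a reference implementation: the names `get_text` and `get_html` are just Pythonized versions of the ones floated above, the input is an already-parsed `ElementTree` element, markup stripping for type="html" leans on the standard `html.parser`, and the XHTML case simply drops namespaces before serializing.

```python
import copy
import html
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_DIV = "{http://www.w3.org/1999/xhtml}div"


class _TextOnly(HTMLParser):
    """Collects character data and ignores tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def _xhtml_div(construct):
    div = construct.find(XHTML_DIV)
    if div is None:
        raise ValueError("xhtml text construct without a div child")
    return div


def get_text(construct):
    """Return the construct as raw text with all markup removed."""
    kind = construct.get("type", "text")
    if kind == "text":
        return construct.text or ""
    if kind == "html":
        parser = _TextOnly()
        parser.feed(construct.text or "")
        parser.close()
        return "".join(parser.parts)
    if kind == "xhtml":
        return "".join(_xhtml_div(construct).itertext())
    raise ValueError(f"not a text construct type: {kind}")


def get_html(construct):
    """Return the construct as HTML suitable for handing to a renderer."""
    kind = construct.get("type", "text")
    if kind == "text":
        # The synthetic div: escape the plain text, then wrap it.
        return "<div>" + html.escape(construct.text or "") + "</div>"
    if kind == "html":
        # The XML parser has already undone the one level of escaping.
        return construct.text or ""
    if kind == "xhtml":
        div = copy.deepcopy(_xhtml_div(construct))
        div.tail = None
        for node in div.iter():
            if isinstance(node.tag, str):
                node.tag = node.tag.rsplit("}", 1)[-1]  # strip the namespace
        return ET.tostring(div, encoding="unicode", method="html")
    raise ValueError(f"not a text construct type: {kind}")


plain = ET.fromstring(
    '<title xmlns="http://www.w3.org/2005/Atom">AT&amp;T &lt;rocks&gt;</title>'
)
escaped = ET.fromstring(
    '<title xmlns="http://www.w3.org/2005/Atom" type="html">'
    "Less: &lt;em&gt;more&lt;/em&gt;</title>"
)
xhtml = ET.fromstring(
    '<title xmlns="http://www.w3.org/2005/Atom" type="xhtml">'
    '<div xmlns="http://www.w3.org/1999/xhtml">Hello <b>world</b></div></title>'
)
```

The caller never looks at the type attribute; that is the whole point of the two calls.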

Content · There should be an isText call to find out whether a content element can return something useful for getText and getHTML. For non-text values of type or remote content, I don’t think it’s cost-effective to do much in the API other than exposing the type, src, and length values.
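In Python, a minimal version of that check could be as small as the sketch below. It assumes that "text" here means inline content whose type is one of the three Text Construct types; the names `is_text` and `describe` are hypothetical, and the sketch exposes only the type and src attributes.

```python
import xml.etree.ElementTree as ET

TEXT_TYPES = {"text", "html", "xhtml"}


def is_text(content):
    """True when the content is inline and its type is text, html, or xhtml."""
    return content.get("src") is None and content.get("type", "text") in TEXT_TYPES


def describe(content):
    """For everything else, just expose what the element says about itself."""
    return {"type": content.get("type"), "src": content.get("src")}


inline = ET.fromstring(
    '<content xmlns="http://www.w3.org/2005/Atom" type="html">'
    "&lt;p&gt;hi&lt;/p&gt;</content>"
)
remote = ET.fromstring(
    '<content xmlns="http://www.w3.org/2005/Atom" '
    'type="image/png" src="http://example.org/pic.png"/>'
)
```

A caller would test `is_text` first and fall back to `describe` when it returns false.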

RSS · This API should work just fine with Atomic RSS, and might optionally try to deal with other flavors.

ongoing


August 04, 2005

By Tim Bray.
