In a recent essay I offered, given demand, to author some XML-writing software. There’s been quite a bit of feedback, and the consensus seems to be that the Java community is fairly well-served with XML-writing software, but that this would be real useful at the C level. So that’ll be my coding fun for the month of February. The rest of this essay lists some of the Java options that people told me about, and introduces some issues around the C implementation.

From Java · Elliotte Rusty Harold pointed out XMLStreamWriter in StAX and also David Megginson’s XMLWriter (which isn’t being maintained, but it shouldn’t need much).

Henri Sivonen recommends GNU JAXP with some reservations about the accompanying GNU DOM package.

Rogers Cadenhead pointed to his own article on Elliotte Rusty Harold’s XOM (hmm, which ERH didn’t).

And of course a couple of people recommended JDOM. The upshot is, it seems this community is well-served. But have a glance at what I propose for the C interface and see if the Java one covers all the bases.

From C · In the C domain, several people pointed to xmlwriter from libxml2 as being the best option. The trouble for a person generating a syndication feed is that xmlwriter has way more stuff than you need, including entry-points that for this kind of simple application would be actively harmful. Also, it doesn’t seem to guarantee well-formedness.

On the other hand, Daniel Veillard is a smart guy and the interface looks very sound, just too big. So I’ll use a small subset of calls with (I think) the same semantics but a different prefix.

Actually, I’ll add a semantic: if any call into this new library would cause the creation of non-well-formed XML, it will abort, return a special distinguished error code, and optionally raise an exception.
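To make the fail-fast idea concrete, here is a minimal sketch of what such an interface might look like. Everything named here is an assumption made up for illustration: the `xw_` prefix, the `XW_OK` and `XW_ERR_ILL_FORMED` codes, the fixed nesting limit, and the three calls, which are only loosely modeled on the start-element, end-element and write-string shape of libxml2's xmlwriter. It checks structure only (one root, no text outside it, no unbalanced end); name and character validation are left out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names throughout; this is a sketch, not the eventual library. */
#define XW_MAX_DEPTH 64

enum { XW_OK = 0, XW_ERR_ILL_FORMED = -1 };

typedef struct {
    FILE *out;
    const char *open[XW_MAX_DEPTH]; /* open element names; caller keeps them alive */
    int depth;                      /* number of currently open elements */
    int root_closed;                /* set once the root element has ended */
    int failed;                     /* sticky: after one error, every call fails */
} xw_writer;

void xw_init(xw_writer *w, FILE *out) {
    assert(w != NULL && out != NULL);
    w->out = out;
    w->depth = 0;
    w->root_closed = 0;
    w->failed = 0;
}

/* Record the failure and hand back the distinguished error code. */
static int xw_fail(xw_writer *w) {
    w->failed = 1;
    return XW_ERR_ILL_FORMED;
}

/* Opens <name>. A second root element, or nesting past the limit, is an error.
   The name itself is not validated in this sketch. */
int xw_start_element(xw_writer *w, const char *name) {
    if (w->failed || w->root_closed || w->depth == XW_MAX_DEPTH)
        return xw_fail(w);
    w->open[w->depth++] = name;
    fprintf(w->out, "<%s>", name);
    return XW_OK;
}

/* Closes the innermost open element; with nothing open, that is an error. */
int xw_end_element(xw_writer *w) {
    if (w->failed || w->depth == 0)
        return xw_fail(w);
    w->depth--;
    fprintf(w->out, "</%s>", w->open[w->depth]);
    if (w->depth == 0)
        w->root_closed = 1;
    return XW_OK;
}

/* Writes character data, escaping markup characters. Text outside the
   root element is an error. */
int xw_write_text(xw_writer *w, const char *text) {
    if (w->failed || w->depth == 0)
        return xw_fail(w);
    for (; *text != '\0'; text++) {
        switch (*text) {
        case '<': fputs("&lt;", w->out); break;
        case '>': fputs("&gt;", w->out); break;
        case '&': fputs("&amp;", w->out); break;
        default:  fputc(*text, w->out); break;
        }
    }
    return XW_OK;
}
```

Because the end call takes no name and always closes the innermost element, this shape cannot emit misnested tags at all; the errors left to catch are structural ones like a stray end call or a second root. Starting `feed`, writing the text `a < b` and ending the element produces `<feed>a &lt; b</feed>`, and any further start or end call returns `XW_ERR_ILL_FORMED`.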

I think there will be two versions of the interface: one that uses unsigned char * and accepts only UTF-8, and another that uses wchar_t * and accepts U+0000-terminated arrays of integer Unicode codepoints. Output will always be UTF-8. Botched UTF-8, illegal XML characters, misnested tags, duplicate attributes and all other artifacts of ill-formedness will be considered errors.
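The character-level checks can be sketched in a few dozen lines. The function names below (`xw_is_xml_char`, `xw_check_utf8`, `xw_encode_utf8`) are hypothetical, and code points are passed as `unsigned long` rather than `wchar_t` on the assumption that `wchar_t` may be too narrow on some platforms. The legal-character test follows the Char production in XML 1.0; the UTF-8 check rejects truncated sequences, stray continuation bytes and overlong encodings.

```c
#include <assert.h>
#include <stddef.h>

/* XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
   | [#x10000-#x10FFFF]. This excludes surrogates and most control codes. */
int xw_is_xml_char(unsigned long cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Returns 1 if the NUL-terminated string is well-formed UTF-8 and every
   code point in it is a legal XML character, otherwise 0. */
int xw_check_utf8(const unsigned char *s) {
    assert(s != NULL);
    while (*s != 0) {
        unsigned long cp, min;
        int extra;
        if (*s < 0x80)                { cp = *s;        extra = 0; min = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; min = 0x80; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; min = 0x800; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; min = 0x10000; }
        else return 0;                /* stray continuation or invalid lead byte */
        s++;
        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80)  /* truncated sequence; also stops at NUL */
                return 0;
            cp = (cp << 6) | (unsigned long)(*s & 0x3F);
            s++;
        }
        if (cp < min) return 0;               /* overlong encoding */
        if (!xw_is_xml_char(cp)) return 0;    /* surrogate, control, > U+10FFFF */
    }
    return 1;
}

/* Encodes one code point as UTF-8 into out. Returns the byte count (1 to 4),
   or 0 if the code point is not a legal XML character. */
int xw_encode_utf8(unsigned long cp, unsigned char out[4]) {
    if (!xw_is_xml_char(cp)) return 0;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

In a design like this, the UTF-8 entry points would run `xw_check_utf8` on each argument before writing anything, and the wide-character entry points would push each code point through `xw_encode_utf8`, so both paths share a single definition of a legal character.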

I think this should require exactly zero libraries aside from stdio and whatever .h you need to get wchar_t declared.

Name? · Anyone who can think of a snappy name for a basic C-language XML generation library will earn my eternal gratitude.


January 16, 2004