Last week, I sent an email to one of the XML standardization lists at the W3C, my first appearance in that conversation in quite a few years. This short piece, of interest only to XML obsessives, gives a bit of background.
The Problem · XML discriminates slightly against certain ethnic groups. You can use any old Unicode character in an XML document, but the set of characters you can use to name an element or attribute is restricted to the letters that were defined in Unicode ten years ago, when XML 1.0 was published. (Yes, that was a bone-headed error by the designers of XML.) Since Unicode keeps growing, that means that a hypothetical programmer working in Cherokee syllabics or in Amharic wouldn’t be able to use their working script in their tag and attribute names.
This is unfortunate. But not that unfortunate; remember, the users of the hypothetical programmer’s software could absolutely use their scripts in their own XML documents.
Solutions · The XML standardization community (a very small and overworked bunch of people) made a first attempt to solve this problem with XML 1.1. Unfortunately, that spec came with excess baggage, namely changed rules on what constitutes white-space, rammed through by IBM for the convenience of their mainframe customers. In any case, XML 1.1 has been widely ignored.
Now, the standardizers are trying once again with XML 1.0 (Fifth Edition). Basically, they’ve rewritten the rules governing the set of characters you can use to name elements and attributes. There is lots of discussion by the always-authoritative James Clark in XML 1.0 5th edition, including a rich set of links for those who want more background.
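For the obsessives, here is a rough sketch of what the rewritten naming rules amount to. The code ranges are transcribed from the NameStartChar and NameChar productions in the Fifth Edition; the function names (`is_name_start_char`, `is_name_char`, `is_xml_name_5e`) are made up for illustration and are not part of any spec or library. It only checks names against those two productions, nothing else about well-formedness.

```python
# Code-point ranges from the NameStartChar production, XML 1.0 (Fifth Edition).
_NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Extra ranges that NameChar allows on top of NameStartChar.
_NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),        # "-" and "."
    (0x30, 0x39),        # 0-9
    (0xB7, 0xB7),        # middle dot
    (0x300, 0x36F),      # combining diacritical marks
    (0x203F, 0x2040),
]


def _in_ranges(code_point, ranges):
    """Return True if code_point falls inside any inclusive (lo, hi) range."""
    return any(lo <= code_point <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    """Hypothetical helper: may ch begin a name under the Fifth Edition rules?"""
    return _in_ranges(ord(ch), _NAME_START_RANGES)


def is_name_char(ch):
    """Hypothetical helper: may ch appear after the first character of a name?"""
    return is_name_start_char(ch) or _in_ranges(ord(ch), _NAME_EXTRA_RANGES)


def is_xml_name_5e(name):
    """Hypothetical helper: does name match Name ::= NameStartChar (NameChar)*?"""
    if not name:
        return False
    return is_name_start_char(name[0]) and all(is_name_char(c) for c in name[1:])
```

Under these rules a name written in Cherokee syllabics (say `"\u13A0\u13C2"`) or in Ethiopic script (say `"\u12A0\u121B"`) passes, because both blocks sit inside the broad `0x37F` to `0x1FFF` range. The design choice is visible in the table: instead of enumerating the letters Unicode knew about at the time, the ranges are wide blocks that take in whatever gets assigned there later.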
More Problems · Heretofore, I’d shut up. I didn’t think the Fifth Edition was a very cost-effective move, but I wasn’t in the room with the smart, overloaded people who were actually, you know, doing the work. But as James Clark points out, the change introduces an inconsistency between XML 1.0 and XML Namespaces 1.0, which is intolerable. They have to be either revised together or not at all. I understand that there may not be appetite or resources for such an effort. Sigh.
What I’d Like · I threw my hat in the ring years ago, with XML-SW, a proposed spec that includes XML 1.0, XML Namespaces, and the XML Information Set, but discards DTDs. There’s more discussion in XML-SW, a thought experiment, and Drop the <!DOCTYPE>.
If you’re going to go through the immense pain of revising the XML spec, focus on the real problems and do it all at once, which is what XML-SW does. I’d even volunteer cycles to work on such a thing. But I’d be astonished if it happened.