Back when we cooked up XML in 1996-97, there were good reasons to have that ugly upper-case gibberish at the top of your XML documents. That was almost ten years ago; now it’s time to do away with it, and also time to have a spec for Doctype-free XML.
Why DTDs? · The Doctype declaration is supposed to do three different things for you:
Say what the root element of your document is supposed to be.
Point to your DTD, so you can check if the document is valid.
Define entities. These are used as names for boilerplate strings, for special characters, and (in theory) to include external documents.
The first two are not only unnecessary but actively harmful. First of all, we have better technology than DTDs for doing syntax validation. Second, it’s usually a bad idea for a document to express an opinion as to what its own schema is. Most useful languages have more than one schema, and it’s the absolute right of someone receiving a document to decide whether it meets their definition of “valid” before they use it, so they’re going to make their own decisions as to which schema (if any) is appropriate.
Entities as names for strings are handy but not essential.
Their use as names for non-ASCII characters will be missed; people who write
mathematics in a plain-text buffer have become used to typing
&Delta; for “Δ”,
&infin; for “∞”, and so on.
But I’m sitting here typing this document
in a text editor and if I want to write Δ or ∞ or チムブレー, there are a
bunch of ways I can do it without requiring any
entities. The problem only bites you when, first of all, you’re hand-authoring
your document without any help from the editor, and second, you’re on an
operating system that doesn’t let you enter those characters.
This problem space is getting smaller every year.
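One of those ways, for what it is worth, is the numeric character reference, which needs no Doctype at all. The sketch below is only an illustration, using Python's standard-library parser (which does not load DTDs); the helper name `parses` is made up for this example. It shows a named entity failing without a declaration while the numeric and literal forms go through.

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if doc is well-formed for a parser that never sees a DTD."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Named entities other than the five predefined ones need a declaration.
named = "<m>&Delta; &infin;</m>"
# Numeric character references are part of core XML syntax.
numeric = "<m>&#x394; &#x221E;</m>"
# Typing the characters directly works too, given a Unicode-aware editor.
literal = "<m>Δ ∞</m>"

print(parses(named))
print(parses(numeric))
print(parses(literal))
# The numeric and literal forms carry exactly the same text.
print(ET.fromstring(numeric).text == ET.fromstring(literal).text)
```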
The use of entities to include external documents has never really caught on, although in principle it sounds useful. I suspect that at the end of the day, the encoding layer is just the wrong place to stick hyperlinking and document composition functions.
Actively Harmful ·
A lot of XML is used in network protocols, and in network protocols,
<!DOCTYPE> is actively harmful. Receiving software in
principle might have to go and fetch an external DTD, or failing that, be
prepared to deal with the voodoo around
the XML declaration. For this reason, popular network applications of XML
like SOAP simply forbid the Doctype declaration.
But this is kind of bad behavior; as
RFC 3470 (AKA BCP 70) points out, if
you’re going to use XML, it’s best to just say “Use XML” and not get into
subsetting the syntax.
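To make "forbid the Doctype declaration" concrete, here is a rough sketch of how a receiver might enforce it, using the expat bindings in Python's standard library. The names `DoctypeForbidden` and `parse_without_doctype` are invented for this example, and this is not a claim about how any particular SOAP stack does it.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when an incoming message carries a Doctype declaration."""


def parse_without_doctype(data: bytes) -> list:
    """Parse data, returning element names in document order.

    Rejects the message as soon as a Doctype declaration is seen.
    """
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called at the start of <!DOCTYPE ...>; refuse to go further.
        raise DoctypeForbidden(f"Doctype declaration not allowed (root {name!r})")

    def on_start(name, attrs):
        names.append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(data, True)  # True marks this as the final chunk
    return names


print(parse_without_doctype(b"<env><body/></env>"))
```

The cost of this approach is exactly the one RFC 3470 warns about: the receiver is now accepting a subset of XML rather than XML.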
So the take-away is, it would be nice to have an actual single spec for this kind of XML that leaves out all the complexity and black magic that falls out of the Doctype declaration. We already have proposals.
The Simplest Thing That Could Possibly Work is Norm Walsh’s
proposal from 2003, which
makes only one change to the XML specification: no <!DOCTYPE>.
Norm later on got a bit more ambitious and cooked up an XML 2.0 proposal, which not only nuked the Doctype but had clever proposals for new namespace and entity syntax.
Then there’s my own XML-SW, which actually substantially rewrites the XML spec, removing the Doctype and at the same time weaving in the Namespaces and Infoset specifications, giving you one place to go to read about all the basic can’t-live-without-’em XML technologies.
The Future · I suspect that nothing will happen; we’ll go forward indefinitely into the future, with XML implementors required to read the XML spec, learn to ignore all the Doctype stuff, then go read the Namespaces and Infoset specs, and synthesize all that in their heads.
That’s a pity; most interesting modern XML doesn’t have a
<!DOCTYPE>, uses namespaces, and is specified via the Infoset;
it would be nice if the specifications reflected that reality.