XML’s tenth birthday is coming up next spring; here’s my sound-bite on What It All Means. XML is the first successful instance of a data packaging system that is simultaneously (human) language-independent and (computer) system-independent. It’s the existence proof that such a thing can be built and be useful. Is it the best choice for every application? Is it the most efficient possible way to package up data? Is it the last packaging system we’ll ever need? Silly questions: no, no, and no. JSON is already a better choice for packaging up arrays and hashes and tuples. RNC is a better choice for writing schema languages. A classic Unix-flavor file containing ordinary lines of ordinary text is the best choice of all, whenever you can get away with it. XML’s still a decent option, probably the best, for interchanging things that are (at least in part) meant to be read by humans. It could be improved. It might be replaced. Wouldn’t surprise me, either way.



From: Leo Richard Comerford (Jul 21 2007, at 12:42)

When you dub XML "a data packaging system", are you stressing its use for structured data over its (original? most appropriate?) "semi-structured" role as markup for natural-language text? Why so, or not so? It would be interesting to hear your thoughts.


From: Mike Champion (Jul 21 2007, at 13:47)

I generally agree that it's probably the best format for human readable text, it could be improved and it might be replaced. I'm wondering what SHOULD be done about all this, given that XML has failed to displace non-wellformed HTML for human readable uses, JSON is replacing it in a lot of scenarios for which it was intended, but nothing is being done to improve it.

Is that a problem? If so, the window of opportunity to fix it is closing. The classic retort to "XML Sucks!" was "yes, but it's ubiquitous, so use it anyway". That's not true anymore ... the unfixed (and maybe unfixable) problems have allowed tag soup HTML / RSS to thrive, JSON to emerge, numerous more efficient XML-like formats to take hold, and so on. I don't see a lot of demand for improvement given that people don't really have to use it anyway.

My personal guess (don't blame the Company-that-must-not-be-named in Redmond) is that XML will fade into the infrastructure and be used for the things it's really useful for and ignored otherwise. XML and its associated tools will end up as analogues to Unicode and string libraries today. Useful, definitely ...interesting, only to hyperspecialists.


From: Tim (but not THE Tim) (Jul 21 2007, at 19:40)

I like XML for many purposes; but one of its real beauties is that if it _is_ replaced, it contains or enables the tools to do so.

To move my data from XML to whatever the next big thing is will most likely be "merely" an XSL transform - I probably won't have to write any kind of program beyond that.

If the "next thing" doesn't provide this ease of transformation, I believe XML as data container may remain for a long time a lingua franca similar to text files and comma-delimited files.


From: Manuzhai (Jul 22 2007, at 03:48)

Well, I think pretty-printed JSON is easier to read and also MUCH more concise (which adds to the readability, IMO). Also, the fact that UTF-8 is a requirement (so that there are no complicated encoding/decoding rules) and that it has a (limited, but useful) concept of data types makes it a better choice for most scenarios where data-oriented XML might be used.

On the other hand, if you're doing something markup-oriented, JSON is not a viable alternative, so XML would still be the top choice.

While I recognize (as some-other-Tim says, above) that transformation through XSLT is a great asset, the simplicity of using and transforming JSON using, say, Python, makes that not as much of an issue. I.e., where XML is complicated enough to need an extra transformation vocabulary, JSON is simple enough that you can write an elegant and simple script in Python (or Ruby, or PHP, or...) and handle the transformation just as well (most likely in fewer LOC).

(Sorry for all the parentheses!)
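As a rough illustration of the kind of short transformation script described above, here is a minimal sketch in Python. The record layout, the `SOURCE` string, and the `invert` helper are all invented for this example and do not come from any real feed; only the standard `json` module is used.

```python
import json

# Hypothetical input: a small list of records, invented for illustration.
SOURCE = '[{"name": "Ada", "langs": ["en", "fr"]}, {"name": "Lin", "langs": ["zh"]}]'


def invert(records):
    """Map each language code to the sorted names of the people who list it."""
    by_lang = {}
    for record in records:
        for lang in record["langs"]:
            by_lang.setdefault(lang, []).append(record["name"])
    return {lang: sorted(names) for lang, names in by_lang.items()}


# Parse, restructure, and serialize again as pretty-printed JSON.
result = invert(json.loads(SOURCE))
print(json.dumps(result, indent=2, sort_keys=True))
```

Whether this is simpler than the equivalent XSLT depends on the data; the sketch only shows that parsed JSON arrives as ordinary lists and dictionaries, so the transformation is plain Python.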


From: Jacek (Jul 22 2007, at 05:10)

Mike, I feel discord between the first part of your comment and the conclusion - Unicode and string libraries are ubiquitous in data processing, yet you seem to be saying that XML will only be used in special situations. What gives?

Myself, I'm a fan of XML for most data exchange. JSON was only invented as an optimization because Javascript doesn't have a nice standard XML library. The successor of XML should learn from the problems of DOM.


From: Mike Champion (Jul 22 2007, at 10:54)

Jacek, sorry to be confusing. My argument is that "XML-like" formats are ubiquitous, but the distinctions among real XML, JSON, non-wellformed HTML, binary XML, etc. are being encapsulated within libraries and are less and less interesting to users or even developers. Who besides hyperspecialists really cares today if a web site is real XHTML or tag soup, or if a syndication feed is Atom or something that vaguely conforms to RSS conventions? I wouldn't say that XML will be used only in special situations, but I would say that it will be used in situations where its limitations are not real problems, and likewise for JSON, HTML, etc.

If one accepts the argument that there is diminishing pressure to use real XML, I think there is a corresponding reduction in any pressure to improve XML to better fit the needs that HTML, binary XML, JSON, etc. are filling. I used to think this was a problem, and that XML should be tweaked to help it meet its potential as a very general meta-format.

I'm losing enthusiasm for that mission, partly for the reasons that Len Bullard states in his cryptically elegant way in a comment to http://www.oreillynet.com/xml/blog/2007/07/wheres_xml_going.html "Don't weep for the stuff. Cheer for the mammals." To be slightly less cryptic, it's all evolution in action, and "evolution is cleverer than you are".


From: Kris (Jul 22 2007, at 12:26)

In the non-web computer world, there have been things like XML for a while now (not sure 10 years, but something like that). The Hierarchical Data Format (HDF) comes to mind, it has been likened to "binary xml" by some:


Obviously it would be the Wrong Thing for most web-based data, but for arrays of meteorological data, etc. it is pretty good.


From: Tim (but not THE Tim) (Jul 22 2007, at 12:43)

Unicode-8 is a requirement?

Did "the industry" go through all that work to come up with Unicode only to decide that we can use only a subset?

It's not a debate for this forum, nor a topic on which I am eminently qualified, but it just seems to me restricting things to a subset throws a lot of that work away.


From: silverpie (Jul 22 2007, at 18:30)

UTF-8 is not a subset of Unicode--it's one of several ways of expressing the entire Unicode repertoire. (And the JSON specs also allow UTF-16 and UTF-32.)
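The point that UTF-8 covers the whole repertoire can be checked with a short sketch in Python. The sample characters below are chosen arbitrarily for illustration, from U+0041 up to U+10FFFF, the highest code point Unicode defines.

```python
# Arbitrary sample code points spanning the Unicode range (chosen for illustration).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600", "\U0010FFFF"]

lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    # Every code point survives a round trip through UTF-8 unchanged.
    assert encoded.decode("utf-8") == ch
    # UTF-16 and UTF-32 round-trip the same characters as well.
    assert ch.encode("utf-16").decode("utf-16") == ch
    assert ch.encode("utf-32").decode("utf-32") == ch
    lengths.append(len(encoded))
    print(f"U+{ord(ch):04X} takes {len(encoded)} byte(s) in UTF-8")
```

The encodings differ only in how many bytes each code point occupies, not in which code points they can express.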


From: Norbert (Jul 22 2007, at 19:05)

Thanks for the correction, silverpie - I was about to say the same. Maybe Tim (but not THE Tim) was confused by the broken implementations of UTF-8 that only allow for a subset, such as the UTF8 encodings in MySQL and Oracle (Oracle also has a full implementation of UTF-8, but under the name AL32UTF8).


From: Nick Mudge (Jul 25 2007, at 07:35)

Thanks. Enlightening to read your comments on XML.


From: Ruffin (Jul 27 2007, at 14:40)

Human readable + machine readable + a need to operate with anonymous, unpredicted (though not necessarily unpredictable) consumers == time to use XML. XML is ASCII glorified in such a way that an anonymous machine can still create an outline of it. Great blog post. Add anonymous and you're spot on.


July 21, 2007