Today sees the publication of a thought piece by six authors, the first of whom, alphabetically and (in my view anyhow) in intellectual standing, is Adam Bosworth.
The problem they're addressing is how, in the Web Services context, you put together a package of information some of which is XML and some of which is binary (a digital signature, an audio clip, whatever). After a pretty thorough walk-through of the issues, they come to two conclusions:
Their first conclusion is awfully hard to argue with. The second, though, really needs further exploration; the best they can do to quantify their worries is the following bit of hand-waving:
It is well known that base64 encoded data expands by a factor of 1.33x original size, and that hexadecimal encoded data expands by a factor of 2x (assuming an underlying UTF-8 text encoding in both cases; if the underlying text encoding is UTF-16, these numbers double). Also of concern is the overhead in processing costs (both real and perceived) for these formats, especially when decoding back into raw binary. When comparing base64 decoding to a straight-through copy of opaque data, the throughput of at least one popular programming system decreased by a factor of 3 or more.
Without some good hard quantitative evidence, I'd be inclined to argue that we should just use base64 to address this problem until someone proves it's too expensive. A 33% size increase seems pretty cheap to me if what it buys you is fitting smoothly into the community of XML tools and expertise; and the real cost of that 33% obviously depends on what fraction of the traffic is binary, for which we currently have no numbers.
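The size arithmetic, at least, is easy to check for yourself. Here's a quick Python sketch, assuming the payload is carried as plain base64 or hex text with no line breaks (MIME-style 76-character lines would add a bit more) and using UTF-16LE so no byte-order mark gets counted. The function name `encoded_sizes` is just made up for illustration.

```python
import base64
import binascii


def encoded_sizes(raw: bytes) -> dict:
    """Byte counts for a binary payload carried as XML text in various ways."""
    b64 = base64.b64encode(raw)      # 4 ASCII chars per 3 input bytes, padded with '='
    hexed = binascii.hexlify(raw)    # 2 ASCII chars per input byte
    return {
        "raw": len(raw),
        # ASCII characters take 1 byte each in UTF-8...
        "base64/utf-8": len(b64),
        # ...and 2 bytes each in UTF-16 (little-endian, no BOM).
        "base64/utf-16": len(b64.decode("ascii").encode("utf-16-le")),
        "hex/utf-8": len(hexed),
        "hex/utf-16": len(hexed.decode("ascii").encode("utf-16-le")),
    }


if __name__ == "__main__":
    sizes = encoded_sizes(bytes(3000))
    for name, n in sizes.items():
        print(f"{name:14} {n:6} bytes  {n / sizes['raw']:.2f}x")
```

For a 3,000-byte payload this gives 4,000 bytes of base64 and 6,000 of hex in UTF-8, doubling to 8,000 and 12,000 in UTF-16, which matches the 1.33x and 2x figures quoted above.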
Also, on the face of it, I'd think that someone whose code runs three times slower because there's base64 in the loop should be taken out and shot, whether they're using a "popular programming system" or not.
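The throughput claim is also something anyone can measure rather than assert. Here's a rough Python sketch that times a plain copy of some random bytes against decoding the base64 form of the same bytes; the function name and default sizes are arbitrary choices of mine, and whatever ratio it reports will depend heavily on your machine and on the base64 implementation underneath, so treat it as a way to get numbers, not as the numbers.

```python
import base64
import os
import timeit


def compare_decode_vs_copy(size: int = 1 << 20, repeat: int = 20) -> tuple:
    """Best-of-`repeat` seconds to copy `size` raw bytes vs. decode their base64 form."""
    raw = os.urandom(size)
    encoded = base64.b64encode(raw)
    # Sanity check: decoding must reproduce the original bytes exactly.
    if base64.b64decode(encoded) != raw:
        raise RuntimeError("base64 round trip failed")
    # bytearray(raw) forces a full copy of the opaque data.
    copy_t = min(timeit.repeat(lambda: bytearray(raw), number=1, repeat=repeat))
    decode_t = min(timeit.repeat(lambda: base64.b64decode(encoded), number=1, repeat=repeat))
    return copy_t, decode_t


if __name__ == "__main__":
    copy_t, decode_t = compare_decode_vs_copy()
    print(f"copy:   {copy_t * 1e3:.3f} ms")
    print(f"decode: {decode_t * 1e3:.3f} ms")
    if copy_t > 0:
        print(f"decode is {decode_t / copy_t:.1f}x the cost of a copy")
```

Taking the minimum over several runs is the usual way to filter out scheduler noise; a copy being faster than a decode is no surprise, and the interesting question is whether either one matters next to the network and the rest of the XML processing.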
The problem of wanting to stick binary data into XML is not limited to the SOAP world; in fact it's common enough that it arguably ought to be part of the basic XML processing machinery, so that you shouldn't have to have a schema to use it.
We already have xml:base, as part of the low-level infrastructure. Let's introduce a new attribute xml:binary with possible values base64, so that you could take any old element and say:
It would be easy, cheap, useful and break no existing software.
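To make the idea a little more concrete, here's a rough Python sketch of how a processor might honor such an attribute. Bear in mind that xml:binary is only the proposal above, not part of any spec; names starting with "xml" are reserved, so it would take the W3C to make it real. The helpers `wrap_binary` and `element_payload` are hypothetical, and only the base64 value is handled. ElementTree already maps the `xml:` prefix to its predefined namespace, so no declaration is needed.

```python
import base64
import xml.etree.ElementTree as ET

# The namespace predefined for the "xml:" prefix (Namespaces in XML).
XML_NS = "http://www.w3.org/XML/1998/namespace"
# Hypothetical attribute from the proposal above; not in any standard.
BINARY_ATTR = f"{{{XML_NS}}}binary"


def wrap_binary(tag: str, data: bytes) -> ET.Element:
    """Build an element whose text is base64, flagged with xml:binary="base64"."""
    elem = ET.Element(tag, {BINARY_ATTR: "base64"})
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


def element_payload(elem: ET.Element):
    """Return decoded bytes if the element is flagged as binary, else its plain text."""
    kind = elem.get(BINARY_ATTR)
    text = elem.text or ""
    if kind is None:
        return text
    if kind == "base64":
        # Strip whitespace (e.g. line breaks from pretty-printers) before decoding.
        return base64.b64decode("".join(text.split()), validate=True)
    raise ValueError(f"unsupported xml:binary value: {kind!r}")


if __name__ == "__main__":
    clip = ET.fromstring('<clip xml:binary="base64">aGVsbG8=</clip>')
    print(element_payload(clip))
    print(ET.tostring(wrap_binary("sig", b"\x00\x01")).decode("ascii"))
```

The first print gives `b'hello'`. Note that nothing here needs a schema: any element can carry the flag, and software that doesn't know about it just sees an ordinary attribute and some text.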