I hadn’t really planned to become well-informed about OOXML, but I have. So I thought I’d build my own personal list of reasons for and against OOXML becoming an ISO standard.
Some Rights Reserved · This essay is copyright © 2008 by Tim Bray, and this notice overrides the notice labeled “rights” which covers the other material published through this Web site. Excerpts from this essay may not be republished in any form unless they are accompanied by a link to the original essay and the following notice: “This is excerpted from Tim Bray’s On OOXML, which discusses both sides of the issue and which should be read in full for context.” There is one exception: the material in the first paragraph, appearing before this notice, may be freely excerpted by anyone for any purpose.
I’m serious. Anyone who cherry-picks content from this piece without playing by the rule above can count on hearing from an attorney PDQ.
Also, this represents only my opinions. They have been influenced by my twenty-one years of experience with XML and its predecessors, my readings from the OOXML spec, the national bodies’ comments on it, my discussions with my colleagues in the Canadian advisory committee, the experience from the Ballot Resolution Meeting at Geneva, and my colleague Lauren Wood. My thanks to all those people; but if I’m wrong about this stuff, it’s my fault not theirs.
And in particular, this does not represent my employer’s position.
Pro: It’s Not That Bad · OOXML has received a huge amount of review. I’m satisfied that as amended by the BRM, it constitutes an accurate (although unpolished and incomplete) specification of the data format used by Microsoft Office, starting with Office ’97. For example, I was impressed by the description of how Excel really works inside, as reflected in the XML it writes.
The fact that it’s thousands of pages long really isn’t that much of a usability problem; the PDF version is well-structured and nicely hyperlinked. Assuming you have some sort of decent PDF reader, if you want to find out how the CEILING function works, or how fonts are assigned to the bits of a PowerPoint slide, you’ll get there.
In fact, the guys at Microsoft who built the ad-hoc content-management system for this project should be proud of themselves, and they should write it up somewhere; I’m very impressed.
Pro: It Might Create a Market · There’s a decent chance that the publication of OOXML will help build an open ecosystem for people who want to write tools to read and write office documents, which would be good for the world and Microsoft too. I can imagine mining the inventory of documents for business-compliance or accessibility-checking purposes. Conversely, I can see generating OOXML documents from other business applications.
Now, you’ve always been able to do these things; Microsoft’s publication of the binary formats the other day was really a non-event, since they’d long since been exhaustively reverse-engineered. But this should reduce the cost and difficulty. First, because XML is easier to read and write than idiosyncratic binary formats. And second, because OOXML provides an actual explanation of what all the bits and pieces are and what they mean and how they fit together.
Now, for this to work, Microsoft is going to have to provide clarity as to the exact relationship between its products and the standard, whether that be ECMA-376 or a possible future ISO version. But the scenario gets a whole lot more likely with a specification in place; especially since Microsoft is shipping 70 million or so copies of Office per year.
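To make the "XML is easier to read" point concrete, here is a minimal Python sketch of the document-mining idea. It relies on details not spelled out in this essay: that an OOXML word-processing file is a ZIP package whose main part lives at word/document.xml, and that paragraphs and text runs use the w:p and w:t elements in the WordprocessingML namespace. The sample markup and the function names build_sample_package and extract_paragraphs are hypothetical, and the sample package leaves out the content-types and relationship parts a real file needs, so it illustrates the parsing step only, not a document Office would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by ECMA-376 documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, stripped-down main document part: two paragraphs,
# the second one split across two text runs.
SAMPLE_DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Quarterly report</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Revenue was </w:t></w:r><w:r><w:t>flat.</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def build_sample_package() -> bytes:
    """Zip the sample part into an in-memory package (demo only, incomplete)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", SAMPLE_DOCUMENT_XML)
    return buf.getvalue()


def extract_paragraphs(package_bytes: bytes) -> list[str]:
    """Return the plain text of each w:p paragraph in the main document part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # Concatenate every w:t text node inside this paragraph.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


if __name__ == "__main__":
    for text in extract_paragraphs(build_sample_package()):
        print(text)
```

Nothing here goes beyond the standard library, which is roughly the point: a compliance or accessibility checker can start from ordinary ZIP and XML tooling rather than a reverse-engineered binary parser.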
Pro: Reduce Litigation Fear · Microsoft has promised not to sue anyone for using either the ECMA or possible ISO versions of the spec. Granted, their covenant is not as clean and clear as Sun’s, and there have been rumblings of suspicion from various quarters. But I think it’s pretty safe; Microsoft has created an expectation that these formats are free to use. Yeah, they could find a loophole and litigate anyhow, and it might even work. But it’d wreak so much havoc on their image, and their relationship with the world, that it’d amount to a suicide-bombing; they’d do some damage, but they wouldn’t survive it.
Con: It’s Pretty Bad · Treated purely as a spec for representing documents, OOXML is lousy. Frank Farance of the US ISO delegation was quoted as saying there are probably hundreds of defects. He’s being way optimistic. Every time I open it and start reading, I pretty soon come across some unforgivably-ugly piece of XML or hideous piece of English grammar or statement that just doesn’t make sense. There are going to be interoperability problems up the wazoo.
And while it received lots of vetting, the amount of scrutiny per page, compared to the standards I’ve worked on in the W3C and IETF, is laughable.
Then there are proprietary issues: for example, for a variety of OOXML elements you can attach an attribute like this: target="_media". Which is totally IE-specific, thus Windows-specific.
Then there’s the fact that at the end of the day, this is basically an XML dump of a thirty-year-old computer program’s internal data format. Which severely compromises the data’s reusability; and I’ve always thought that opening the door to unanticipated reusability is one of the big payoffs for using XML.
There’s another big quality problem: OOXML allows the use of “custom schemas”; i.e. you can enrich your documents with your own elements and attributes, and still work on them with Office. The problem is, history teaches that this is a terrible idea. As far back as the Eighties, people were doing this with SGML, and of course XML tried to continue the practice. It should be instructive that there were several companies founded to build and sell the technology around this, and none of them ever made any money to speak of; they’re gone.
Custom schema work turns out to be horribly complex in practice, in setting up the editing and versioning and display and formatting technologies; a recipe for huge, years-long, consulting engagements, and really lousy results. Don’t go there.
Con: Microsoft Will Use It As A Club · Why does OOXML need to be standardized, since (in ODF) there’s already a perfectly-decent standard for Office documents, being actively developed? The official party line is that OOXML is different from ODF; all about standardizing a way to access the huge legacy base of office documents. And that sounds plausible.
But you know, that’s not how it’s going to be marketed. I had a couple of conversations with Microsoft people, raising the point that for most ordinary office documents, OOXML or ODF would work about as well. I got strong push-back; they told me all about how the ODF approach is limited and primitive and absolutely nowhere near as good as theirs.
So you can bet that the microsecond OOXML gets standardized, all that stuff about legacy coverage is going to be forgotten, and we’re going to see a full-bore global marketing assault about how this is the one true XML office document format; bigger, brighter, better, and everything else is a toy.
This doesn’t strike me as a good outcome.
Con: ECMA-376 vs. “ISO OOXML” · As a result of the process that culminated in the BRM, a bunch of stuff was added to the input ECMA-376 document. Most of it was good: for example, you can use IRIs as opposed to DOS file-paths for pictures, and it’s got some extra support for accessibility. But of course, none of that is supported by the hundred-million-or-so instances of Microsoft Office in the field. So if you’re an implementor writing data out, you’d better not use any of that stuff because, well, it won’t work. And even if Microsoft decides to include all those features in their products, they won’t be in the field for years and years. Similarly, if you’re writing code to read OOXML, you can pretty well skip the new markup, because there won’t be any of it.
So, if you believe the official story that OOXML exists to standardize the huge legacy of office documents, it’s irritating that it now contains a whole bunch of extra, hastily-added, “this-might-be-nice” features that nobody will be able to use for that purpose.
Con: Standards Process Abuse · Microsoft decided, rather than working to produce a harmonized standard by enhancing ODF to add MS-Office-specific features, to re-invent the world from scratch. This seems wrong.
ECMA, which claims to be a serious standards organization, blessed the process of generating an XML dump of the internal data format and publishing it in six thousand poorly-edited pages, in well under a year. This seems wrong.
ISO allowed ECMA to submit this on their fast-track process with breathtaking obliviousness to the existence of other standards and lack of concern for harmonization. This seems wrong.
ISO allowed the draft to be substantially edited and enhanced after the initial ballot. This seems wrong.
ISO then tried to repair the damage by stuffing 120 people into a room in Geneva for five days to address a thousand changes to the spec. This seems wrong.
Thus, there’s an argument that this kind of process abuse shouldn’t be allowed to go unpunished. If Microsoft gets their standard, it’ll be a signal to other big players to try to do the same thing. If ISO gets away with doing this, it’ll have two negative effects. First, respect for International Standards in general will be diminished. Second, other people will start trying the same thing.
Conclusion · Well, my mind is still open. Locking Microsoft into a set of XML-based document-structure rules they have to play by (even if they wrote the rules), well, there’s probably an upside to that. But on the other hand, I dislike OOXML at an engineering level and I really dislike the cynical, abusive standards process it came with.
At the moment, it looks like we get the benefits (covenant not to sue, stable spec), without the downsides (Microsoft marketing club, rewarding ISO malfeasance) even if the ISO process fails.
What am I missing?