I see that Microsoft has posted a litigation covenant on the OfficeXML formats (also read Brian Jones’ exegesis). In response, there’s a bunch of legal poking and prodding here and here; I don’t understand the legal arguments, and I don’t think they’re the interesting part of the story anyhow. So, let’s do two thought experiments. First, what if Microsoft really is doing the right thing? Second, how can we avoid having two incompatible file formats? [Update: There’s been a lot of reaction to this piece, and I addressed some of those points here.]
Clearly, there’s still work to be done on the legals; the current Microsoft statement only covers the 2003 versions, and to see the schemas you have to fight through a bunch of technical barriers and draconian legalese. But unless Brian Jones is lying in his teeth, their intent is to remove the legal encumbrances and let developers loose on their file formats. (If that’s not the case, we’ll find out pretty soon, and then I think the Microsoft XML is pretty well doomed; the world is simply not willing to live with legally-tainted file formats any more.) So, as a thought experiment, let’s assume the legal obstacles are gone; what then?
To keep things short, let’s call OpenDocument Format 1.0 “ODF” and the Office 12 XML File Formats “O12X”.
Alternatives · In ODF we have a format that’s already a stable OASIS standard and has multiple shipping implementations. In O12X we have a format that will become a stable ECMA standard with one shipping implementation sometime a year or two from now, depending on software-development and standards-process timetables. ODF is in the process of working its way through ISO, and O12X will apparently be sent down that road too, which should put ISO in an interesting situation.
On the technology side, the two formats are really more alike than they are different. But there are differences: O12X’s design center, Microsoft has said repeatedly, is capturing the exact semantics of the billions of existing Microsoft Office documents. ODF’s design center is general-purpose reusability, and leveraging existing standards like SVG and MathML and so on.
Which do you like better? I know which one I’d pick. But I think we’re missing the point.
Why Are There Two? · Almost all office documents are just paragraphs of text, with some bold and some italics and some lists and some tables and some pictures. Almost all spreadsheets are numbers and labels, with some sums and averages and pivots and simple algebra. Almost all presentations are lists of bullet points with occasional pictures.
The capabilities of ODF and O12X are essentially identical for all this basic stuff. So why in the flaming hell does the world need two incompatible formats to express it? The answer, obviously, is, “it doesn’t”.
Microsoft wants there to be an office-document XML format that covers their billions of legacy documents, and they want it to be open. Fair enough; I approve. But why do we have to re-invent all the basic stuff, and have two ways to express “This paragraph is in 12-point Arial with 1.2em leading and ragged-right justification” or “K37 is the average of B37 through H37”?
The ideal outcome would be a common shared office-XML dialect for the basics (and it should be ODF, or a subset, since that’s been designed and debugged), then another extended vocabulary to support Microsoft features, whether they’re cool new whizzy features or mouldy old legacy features; XML Namespaces are designed to support exactly this kind of thing. That way, if you stayed with the basic stuff you’d never need to worry about software lock-in; the difference between portable and proprietary would be crystal-clear. And, for the basic stuff that everybody uses, there’d be only one set of tags.
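To make the namespace idea concrete, here is a rough sketch in Python of how software could tell the portable parts of a document from the vendor-specific parts. Everything in it is made up for illustration: the two namespace URIs, the element and attribute names, and the helper functions are hypothetical, and neither ODF nor O12X looks like this. The only point is that once extensions live in their own namespace, checking whether a file stays inside the shared vocabulary is a few lines of code.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are hypothetical stand-ins, not real ODF or O12X names.
BASIC_NS = "urn:example:office-basic"   # the shared "basic stuff" vocabulary
VENDOR_NS = "urn:example:vendor-ext"    # a vendor's extension vocabulary

# An invented document mixing basic markup with vendor extensions.
SAMPLE = f"""\
<doc xmlns="{BASIC_NS}" xmlns:v="{VENDOR_NS}">
  <p>Plain paragraph with <b>bold</b> text.</p>
  <p v:legacy-spacing="word95">Paragraph with a vendor attribute.</p>
  <v:smart-widget kind="whizzy"/>
</doc>
"""


def namespace_of(name):
    """Return the namespace URI from ElementTree's '{uri}local' form, or ''."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def non_basic_names(xml_text):
    """List every element or attribute name outside the basic vocabulary."""
    found = []
    for elem in ET.fromstring(xml_text).iter():
        if namespace_of(elem.tag) != BASIC_NS:
            found.append(elem.tag)
        for attr in elem.attrib:
            # Unprefixed attributes carry no namespace; they belong to
            # their element, so only namespaced attributes are checked.
            ns = namespace_of(attr)
            if ns and ns != BASIC_NS:
                found.append(attr)
    return found


def is_portable(xml_text):
    """True when the document uses nothing beyond the basic vocabulary."""
    return not non_basic_names(xml_text)
```

Calling `non_basic_names(SAMPLE)` reports the `legacy-spacing` attribute and the `smart-widget` element, both in the vendor namespace, while a document built only from the basic tags passes `is_portable`. A real consumer would more likely skip or preserve the unknown parts than reject the file, but the dividing line would be just as easy to find.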
This outcome is technically feasible. Who could possibly be against it?