ongoing by Tim Bray · What · Technology

What · Technology · XML

xmlwf -k · What happened was, I needed a small improvement to Expat, probably the most widely-used XML parsing engine on the planet, so I coded it up and sent off a PR and it’s now in release 2.3.0. There’s nothing terribly interesting about the problem or the solution, but it certainly made me think about coding and tooling and so on. (Warning: Of zero interest to anyone who isn’t a professional programmer.) ...
[2 comments]

XML’s 15th Birthday · Whether you like XML or not, we’re stuck with it for a long time. These days, the only new XML-based projects being started up are document-centric and publishing-oriented. Thank goodness, because that’s a much better fit than all the WS-* and Java EE config puke and so on that has given those three letters a bad name among so many programmers. XML for your document database is actually pretty hard to improve on ...
[7 comments]

Wrong on the Internet · I was lying in bed this Sunday morning, checking the Net before coming downstairs to make scrambled eggs (with mushrooms and snap peas, yum) for the family, and ran across a bit of random snark from Aaron Swartz. Any Sunday morning is improved by a chance to argue about markup languages and how the Web works ...
[7 comments]

On “Custom XML” · I see that Microsoft lost an appeal in the “Custom XML” litigation, and may be forced to disable that functionality in Microsoft Office. This is a short backgrounder explaining what “Custom XML” is about, and why nobody should care ...
[27 comments]

Fixing XML · A week or two ago, I was reading something which included a really silly statement hyperlinked to the Wikipedia entry for XML. I followed the link and discovered that the entry was appallingly bad. I looked with a shudder at the size and complexity of the brokenness and just failed to convince myself that it was somebody else’s problem. So we fixed it ...
[9 comments]

XML in Oxford · That’s the XML Summer School in September at St. Edmund. I can’t make it, in part because my wife is co-ordinating which means I do child-care. I’ve been to these and they’re totally great, intense and interactive and focused; then you get to go drinking around Oxford in the evening. If you’re within reach and work with XML and want to upgrade yourself, I totally recommend it.

Neither Father nor Inventor · It’s nice when The Reg covers my speeches, but it raises an issue that I guess I’m going to have to address every year until I die. I am neither the father nor inventor of XML! ...
[9 comments]

Missing-Font Messages in Keynote · There are a variety of situations in which, when you start iWork tools, for example Keynote, you get a bunch of whining about missing fonts. This can be fixed by hand ...
[4 comments]

XML Trouble · Last week, I sent an email to one of the XML standardization lists at the W3C; my first presence in that conversation in quite some number of years. This short piece, of interest only to XML obsessives, gives a bit of background ...
[5 comments]

OOXML: Everything’s Just Fine · Or at least that’s what ISO’s Secretary General says. [I had hoped to stop writing about this subject, sigh]. There are multiple appeals against OOXML; let’s try to read the tea-leaves without too many guttural snickers ...
[9 comments]

RX and 1.9 and Pain · This fragment is mostly a note to myself and placeholder and might prove useful to someone slashing through the XML undergrowth with bleeding-edge Ruby. Briefly: I revived my “RX” Ruby tokenizer (see here, here, and here) to contribute to Antonio Cangiano’s proposed Ruby benchmark suite, which I think is a Really Good Idea. I had a bit of pain getting the code to run on both Ruby 1.8 and 1.9, and then when I tried sanity-checking the output by comparing it to REXML on 1.9, REXML blew chunks. There are, apparently, issues about REXML and 1.9. Read on for details in the unlikely event that you care about any of this ...
[3 comments]

ISO Fantasy · There has been much rejoicing recently at the process whereby, apparently, an ISO committee takes full control of OOXML. But you know, that story is entirely irrelevant. It will have no effect on what implementors of OOXML, including Microsoft, should or will actually do. The story’s ending will I think be mostly tawdry. Oh, and I have some OOXML news that I think is important, but that I don’t think anyone else has reported ...
[16 comments]

On OOXML · I hadn’t really planned to become well-informed about OOXML, but I have. So I thought I’d build my own personal list of reasons for and against OOXML becoming an ISO standard ...
[23 comments]

BRM Narrative · Now that the BRM is over, I feel I can write about it a bit more; there are some restrictions, but I’ll lay them out. Summary: A lot of good work was done, but the process is irretrievably broken ...
[16 comments]

OOXML Batch Converter? · Here’s a program that would be useful. You point it at a directory and it runs around finding all the .doc, .xls, and .ppt files, and generates an OOXML version of each. In the old days, you’d use VBA to do this kind of thing. I’m behind on this kind of technology, but I assume there’s something on Windows that would make this tractable?
[7 comments]
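A minimal sketch of the directory-walking half of such a tool, in Python rather than VBA. It only finds the legacy files and works out where each OOXML version would go; the conversion itself is left to a function the caller supplies, since the post does not name a converter. The names `OOXML_SUFFIX`, `plan_conversions`, and `convert_all` are hypothetical and exist only for illustration.

```python
from pathlib import Path

# Legacy binary extension -> corresponding OOXML extension.
OOXML_SUFFIX = {".doc": ".docx", ".xls": ".xlsx", ".ppt": ".pptx"}


def plan_conversions(root):
    """Walk `root` recursively and pair each legacy file with its OOXML target path."""
    plan = []
    for path in sorted(Path(root).rglob("*")):
        target_suffix = OOXML_SUFFIX.get(path.suffix.lower())
        if path.is_file() and target_suffix:
            plan.append((path, path.with_suffix(target_suffix)))
    return plan


def convert_all(root, convert):
    """Call `convert(source, target)` for every planned pair.

    `convert` is an assumption: whatever office suite or library actually
    performs the format conversion is wrapped by the caller.
    """
    for source, target in plan_conversions(root):
        convert(source, target)
```

Keeping the planning step separate from the conversion step means the list of pairs can be inspected, or dry-run, before any file is written.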

XML People · XML is ten years old today. It feels like yesterday, or a lifetime. I wrote this that year (1998). It’s really long ...
[23 comments]

Upcoming Gig: ISO OOXML BRM · I’ve been invited to join the Canadian delegation to the DIS 29500 Ballot Resolution Meeting in Geneva in February. This is a consequence of having joined the expert group supporting the Canadian National Standards Body; I haven’t quite figured out the forest of acronyms and organizations yet, or how things fit together. Given the white heat of politics and verbiage around this process, I’m going to accede to the request of a couple of Very Smart People who’ve asked me to hold off on real-time blogging. Which I’m comfy with, since I’m an ISO newbie and don’t know the process or the culture. I will say, though, that I am not representing Sun officially; the Canadian Standards people contacted me, and I checked with our corporate Standards group and said that I wanted to go and would only go if I were free to offer my own technical opinions on technical issues; they were OK with that. I’ve been stuffing my brain with the OOXML comments and proposed resolutions, and the picture is interesting; I’ll write at length once I figure out how to do so without breaking anything.

Now That’s a Patch · I refer to Sam Ruby’s massive patch to make REXML work properly with the latest Ruby. I’ve long disliked REXML (see here and here), but it’s here and it works. Only the way it works changed in 1.9, and there were some horrible regressions, and it gets patched very slowly. (I’m actually wondering why Ruby needs to have a weird regex-based parser when Expat is plenty good enough for Perl and Python, and in fact if you look at xmlparser.rb, you can switch parsers, just as Nick Sieger has done for JRuby with JREXML. But I digress.) In the short term, we need to see if the REXML maintainers are responsive to Sam’s patch.
[1 comment]

Year-End Sweep — Tech · Over the course of the year, in browser tabs, bookmarks, and del.icio.us, I’ve built up a huge list of things that I felt I should write about, at least at the time I saw them. Well, dammit, I’m not gonna let 2007 end without at least making a try. Here goes. Categorized, even ...
[7 comments]

XBRL News · Last week I gave a talk at the 16th International XBRL Conference here in Vancouver. XBRL is an XML-based system for packing up companies’ financial information, and I think it’s real important. But its take-off has been kind of protracted and arduous. I was there as an Ambassador From the Web. Here’s a quick XBRL news overview ...
[5 comments]

Markup Thanks · I really didn’t pay that much attention to the first OOXML round at ISO, but I’ve developed a sort of sick fascination with it, leading up to the potentially-apocalyptic Geneva BRM. I read the Kyoto meeting report from the excellent Alex Brown and it dawns on me that a lot of us owe some huge debts to people whose names I bet most of you don’t know: James Mason, Martin Bryan, and Ken Holman. Here are a few words on them ...
[2 comments]

The OOXML News · I was really wrong about the OOXML/ISO story; told everyone “It’ll sail through ISO, don’t bother with the process.” Boy, was I wrong. At the moment that process is hurtling toward the mildly-historic “Ballot Resolution Meeting” in Geneva in February (read about it here and here). Anyhow, all those tens of thousands of comments on the first draft, which were previously invisible behind some ISO veil, are now out there for all to view, tag, hyperlink, annotate, and enhance, at the unofficial but excellent DIS29500 Comments site (tagline: “Help the OOXML BRM concentrate on issues of substance”). The person behind it seems to be Alan Bell, whom I don’t think I know, but the world owes him a vote of thanks. Obviously, this whole thing does retain a grimy side; see the excellent Martin Bryan’s fairly-despondent Report on WG1 activity for December 2007 Meeting of ISO/IEC JTC1/SC34/WG1 in Kyoto. Sigh. Nobody ever said history was clean.

Tab Sweep — Tech · This goes back weeks and weeks; I’ve been wide-finding and doing Sun stuff and the Web-watching has suffered ...
[6 comments]

Bad, Feed Readers, Bad! · Piles of junk, I say. Pardon me, but I’m feeling grumpy. After much more work than it should have been, mod_atom is now generating reasonably coherent (not done yet, but getting there) HTML output and human-oriented (as opposed to APP-oriented) Atom feeds. It’s slightly idiosyncratic XML, with lots of namespace prefixes. The Feed Validator says it’s OK and I think it’s OK. But none of NetNewsWire or Vienna or Bloglines [Update: or Blogbridge or Safari, or My Yahoo!, or Sage] can read it correctly. I fart in their general direction. [Update: Google Reader, Planet Venus, Snarfer, SimplePie, Liferea, Awasu, Shrook, and Flock get it right! Good on ya, guys.] [Ah, Brent sent me a pointer to the latest beta of NetNewsWire 3.1, and it’s fine. I know other people rave about GoogleReader and Vienna and so on, but for me NNW is still way ahead of the pack in letting me scan a whole lot of news in almost no time at all.] Are there any other feed-reader implementors out there who think they can, you know, read XML correctly?!?! If so, get in touch, and if you process my little bundle of joy properly, I’ll lavish praise and links. Or if there’s a bug in the feed that neither I nor the validator can see, I’ll apologize humbly to the whole world. In any case, I’m going to have to go back and patch up the code so it doesn’t emit any of those nasty colons and relative URI references that apparently hurt implementors’ fragile feelings. This does not improve my mood. [Update: Just to be clear, I’m not talking about the ongoing feed; if you want to test your feed reader, contact me and I’ll point you at the test feed.]
[17 comments]

Following the OOXML Story · From the beginning of the story up to last week’s crucial vote, it seemed that Andy Updegrove’s blog was the best place to follow this story. I think that as of now, the man to read would be Alex Brown; in particular, check out his authoritative OOXML ballot comments and OOXML - what just happened?. He’s also maintaining chunks of the moving-target OOXML Wikipedia entry. For an interesting sidelight, though, see David Berlind’s ZDNet reader: Don’t let OOXML vs. ODF shenanigans tarnish other standards setters; useful perspective, I’d say. And then for the real absolute unvarnished truth, check out Steve’s ISO document standards.
[1 comment]

ISO OOXML Craziness · I’ve generally been ignoring all the fuss & bother about OOXML’s well-greased path to ISO anointment. I’d assumed that after ECMA had applied rigorous and impartial scrutiny to all six thousand pages, ensuring that this was straightforwardly implementable by all interested parties, then the ISO rubber stamp wouldn’t be long in following, giving us an International Standard no less, plus fresh insight into the level of respect such things deserve; and we could all get on with life. Now, the ISO process seems to be turning into the most entertaining kind of standards mosh-pit, with loud accusations of corruption and malpractice. Canadians in the crowd will be reminded of the flavor of a Liberal Party nomination meeting. Groklaw’s coverage is predictably overamped, but still fun; here’s news from France, Sweden, and Norway. That’s just one day’s worth. [Update: Hey, Denmark too!] ...
[9 comments]

Tab Sweep — Tech · August is supposed to be the slow time of year. Not! Is there ever a lot of interesting stuff out there. Today we have WS-funnies, OOXML Purdah, Web names, Internet Registry structures, and Ruby metaprogramming craziness ...
[8 comments]

Tab Sweep — Tech · Today we have some Atomic Apple love, iPhone Web friendliness, RelaxNG praise, and JVM Language widening ...
[6 comments]

Color Commentary · Read the excellent play-by-play from Andy Updegrove: Update on the US Vote on OOXML (and What Happens Next). He seems to have all the public facts, but speaking as one who’s been through a few of those processes, I thought I should highlight something that’s going on right now, but won’t be talked about much. The problem of figuring out the US vote is in the hands of the 16 members of the INCITS committee. So does that mean that everyone’s sitting still waiting for them to make up their minds? Nope. What’s happening right now is that the big players with skin in the game are applying executive-to-executive pressure, behind the scenes, to the committee members’ bosses’ bosses’ bosses. In a few cases it’ll work, and the members will be issued here’s-your-vote marching orders. I’ve seen it happen. In fact, when the intensity level gets up there, I’ve never seen it not happen. Nobody will ever know the whole story on what’s happening right now under the covers. I really don’t envy the committee members.
[3 comments]

What XML Means · XML’s tenth birthday is coming up next spring; here’s my sound-bite on What It All Means. XML is the first successful instance of a data packaging system that is simultaneously (human) language-independent and (computer) system-independent. It’s the existence proof that such a thing can be built and be useful. Is it the best choice for every application? Is it the most efficient possible way to package up data? Is it the last packaging system we’ll ever need? Silly questions: no, no, and no. JSON is already a better choice for packaging up arrays and hashes and tuples. RNC is a better choice for writing schema languages. A classic Unix-flavor file containing ordinary lines of ordinary text is the best choice of all, whenever you can get away with it. XML’s still a decent option, probably the best, for interchanging things that are (at least in part) meant to be read by humans. It could be improved. It might be replaced. Wouldn’t surprise me, either way.
[12 comments]
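The point about arrays and hashes can be made concrete with a small sketch, using only the Python standard library. The record and the element names (`record`, `tags`, `tag`, and so on) are invented for illustration and come from no real vocabulary.

```python
import json
import xml.etree.ElementTree as ET

# A small hash containing a string, an array, and a number.
record = {"name": "ongoing", "tags": ["xml", "json"], "posts": 3}

# JSON: arrays and hashes map straight onto the format and round-trip unchanged.
as_json = json.dumps(record)
assert json.loads(as_json) == record

# XML: the same data needs element names invented for it, and the
# number comes back as text unless a schema or convention says otherwise.
root = ET.Element("record")
ET.SubElement(root, "name").text = record["name"]
tags = ET.SubElement(root, "tags")
for tag in record["tags"]:
    ET.SubElement(tags, "tag").text = tag
ET.SubElement(root, "posts").text = str(record["posts"])
as_xml = ET.tostring(root, encoding="unicode")
```

Neither serialization is wrong; the XML version just carries naming and typing decisions that the JSON version never has to make.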

Any Damn Fool · This is real news: James Clark has a blog, and in it he says “Any damn fool could produce a better data format than XML”. Um, James was designated Technical Lead of the original XML Working Group and is the single largest contributor to the design of XML. Also, perhaps, the finest computer programmer I’ve ever had the privilege of working with ...
[2 comments]

Tech Tab Sweep · I break with my no-underlying-theme theme and do an all-technology tab sweep; in fact, almost all XML ...
[8 comments]

XML 2.0? · Anne van Kesteren suggests an XML 2.0 mostly defined by less-Draconian error handling, provoking further discussion over chez Sam Ruby ...
[27 comments]

Life Is Complicated · My goodness, even CNN picked up the story about Microsoft trying to retain Rick Jelliffe to update the Wikipedia articles on ODF and OOXML for them, just as the ISO process around OOXML is getting in gear. This raises complicated issues about document formats and transparency and conflict of interest; and there’s at least one elephant in the room ...
[20 comments]

Tab Sweep · This is going to be big and have month-old news in it; a consequence of the long southern-hemisphere posting interruption. I’ll even group ’em into paragraphs ...

JSON and XML · I hear people saying “JSON is great, XML is over”, but I don’t hear XML partisans saying anything bad about JSON. There are two arguments that are over, though ...
[15 comments]

Microsoft XML, the Mac Angle · There’s been a lot of noise these last few days about the Microsoft Office XML file formats; the world doesn’t need my opinion again. I’d vaguely noted that Mac Office would be a little behind on the new XML, then Simon Phipps shot me links to a couple of closer looks, which shed an instructive light ...
[6 comments]

Choose RELAX Now · Elliotte Rusty Harold’s RELAX Wins may be a milestone in the life of XML. Everybody who actually touches the technology has known the truth for years, and it’s time to stop sweeping it under the rug. W3C XML Schemas (XSD) suck. They are hard to read, hard to write, hard to understand, have interoperability problems, and are unable to describe lots of things you want to do all the time in XML. Schemas based on Relax NG, also known as ISO Standard 19757, are easy to write, easy to read, are backed by a rigorous formalism for interoperability, and can describe immensely more different XML constructs. To Elliotte’s list of important XML applications that are RELAX-based, I’d add the Atom Syndication Format and, pretty soon now, the Atom Publishing Protocol. It’s a pity; when XSD came out people thought that since it came from the W3C, same as XML, it must be the way to go, and it got baked into a bunch of other technology before anyone really had a chance to think it over. So now lots of people say “Well, yeah, it sucks, but we’re stuck with it.” Wrong! The time has come to declare it a worthy but failed experiment, tear down the shaky towers with XSD in their foundation, and start using RELAX for all significant XML work. [Update: Piling-on are Don Park, Gabe Wachob, Mike Hostetler and some commenters. There’s thoughtful input from Dare Obasanjo, and now the comments have some push-back too. And oh my goodness gracious, a Rick Jelliffe must-read.]
[17 comments]

XML 2006 · We’re only three weeks away from XML 2006. Which brings to mind that it was ten years ago at that same conference (different name then) that we showed the world the first draft of the XML spec. It was a carefully-staged event, and one of the most intense 45 minutes in my life. Ah... Back to this year. Looks like David Megginson has put together a program that is notably free of the usual suspects and rich with new stuff. I see presenters from Google and the Motley Fool, and on PHP and (my goodness) JSON. Looks very good.

OOXML Hoo-Hah · Bob Sutor and Rob Weir (both of IBM) have been whacking away at the standards lipstick being painted on the Microsoft Office Internal Data Structure XML Dump pig. Oops, officially, that’s “ECMA Office Open XML”. In A Leap Back Rob describes Excel’s well-known date-representation bug being encoded in an alleged International Standard. Then again in A bit about the bit with the bits, he talks about bitmasks and offal (really). But it’s Bob’s point, in Is Open XML a one way specification for most people?, that’s central: this is just a six-thousand-page data dump describing a particular XML serialization of a particular commercial application’s object model, completely oblivious to the universe of publishing-related standards that have been hammered out and put to work while MSOffice was being tended in Redmond. You can write “STANDARD” on it in letters as big as you want, but there will only ever be one full implementation, and if you standardize on this standard you’ve locked yourself in. Shame, shame on the other companies on the committee, helping Microsoft perpetuate this travesty. There’s just no excuse.
[3 comments]

Upcoming Gig: SEC · This has been booked for months, but I just found out that it’s an open-announcement thing. I’ll be participating in an Interactive Data Roundtable at the US Securities and Exchange Commission in Washington DC on Wednesday Oct. 3rd. The SEC’s Interactive Data initiative is, I think, going to be huge. As an investor, a businessman, and an open-data fan, I’m 100% sure that it’ll pay back the investment many times over.
[1 comment]

GullFOSS · I’ve never been 100% comfortable with this notion of a “group blog”, but I guess I should stop worrying. The Aquarium seems to have been a major success for the GlassFish people, and now there’s GullFOSS, OpenOffice.org’s home on the blogospheric range. As I write this, the latest post is their weekly development schedule snapshot, something that more Open-Source projects would do well to post. I may end up doing a 180° turn and thinking that every substantial development project should have a group blog.

Making Markup Correctly · I’ve encountered three different Ruby libraries for generating markup: there’s one in the CGI library, there’s Builder, and there’s Markaby. To some degree, all are heavily informed by the special case of generating HTML; and maybe they’re OK for that. But if you want to go further and generate XML, they’re all pointing in the same, wrong, direction. Maybe I’m missing something, but I do have an alternative to offer. Plus, I find a chance to laugh at myself gleefully. [Update: Ouch! Refuted!] [Update: And again, more seriously.] ...

Johnson on Feeds · Dave Johnson gave a talk this morning at a local XML interest group. His slides (PDF) are the single best introduction and overview I’ve ever seen about feeds and syndication and RSS and Atom and all that stuff.

The DOM Song · Weirdly enough, having been around XML for so long, the last couple of days have marked my first exposure to actual DOM wrangling in code. This experience has driven many computer programmers to gloom, and even negative utterances. Not me! I even composed a song about it ...

Microsoft & ODF · I’ve been wondering how to react to this Microsoft ODF Announcement. Andy Updegrove points out that the news isn’t that new, but still I see this as significant. From a glass-half-empty point of view, I could object, as Bob Sutor does, to the misdirection and outright lies in the Microsoft spin. Or I could echo Mark Pilgrim in pointing out that this is currently largely vaporware (more details here). But I think that on balance the big story is that Redmond has moved from a “There’s no demand for ODF” stance to admitting that, in fact, there is. Currently, it’s largely a public-sector thing; and reading between the mellifluous lines of Chris Capossela’s A Foundation for the New World of Documents, I sense a tone of barely-suppressed fear: “We encourage public sector organizations to move to XML file formats but not to mandate a particular format or implementation.” We can all agree on implementation—that’s the point, after all—but to refuse to bless a format seems to me to ignore the lesson of the Web, written in letters of fire 500 feet high: agree on the smallest-possible number of data formats, and compete on what you do with them.

Roundup · Once again I’m drowning in little tech-news tidbits that I think the world needs to look at: hence a Friday linkfest: Item: John Cowan’s TagSoup has reached 1.0. This is going to be an essential tool for so many people. Item: Assaf Arkin, in Why Blogs Work, explains it all. Item: Kimbro Staken provides 10 things to change in your thinking when building REST XML Protocols. Item: InfoQ has launched; does the world need another software-news site? Quite possibly. Item: From Mark Nottingham, HInclude; this is pointing in the same direction as Ingy’s Jemplate, and unless I’m missing something obvious, it’s an important direction.

XML 2006 - Get Yer Papers In · Yow, just realized that the XML 2006 call for papers is upon us; for details see David Megginson, who’s the chair. Hanging with the XML tribe is always good fun, and I expect David to do a great job of running the shindig; so send him your good ideas.

New Neo · I’ve been kind of quiet, and that’s because the Java One people lowered the boom on me, told me that if I didn’t get the slides for my session in they were going to cancel it. So I’ve been spending quality time with Open Office, in particular the NeoOffice flavor. They’ve got an alpha of their version of OO.o 2 up, and it’s a vast improvement over 1.2, with a bunch of useful sidebar navigators and better view-switching. Also, it’s all-ODF. There’s some interesting business model innovation; although Neo is GPL’ed, you have to sign up and pay to join the Early Access program if you want to use the 2.0 alpha pre-release. I didn’t hit a single bug with the alpha in two days of hard editing; I assume the Neo boys are slaving away over performance, because it’s pretty slow at the moment.

More FUD · Andy Updegrove quotes a flurry of egregious Microsoft bullshit about ODF from Jason Matusow. In particular: “The ODF format is limited to the features and performance of OpenOffice and StarOffice and would not satisfy most of our Microsoft Office customers today.” In your dreams, Jason. [Update: What Andy Updegrove said.]

SAML On The March · I tell people I’m a software generalist, but there are lots of holes in my knowledge. One of them is identity and I really must fix that, because it’s a hot pain point both for businesses and individual people. (How many passwords do you have?) Anyhow, our own Eve Maler is one of the people you want to watch in this space, and she’s pointing us at a bunch of action over in SAML-land, here, here, and here. For my money, the hot story is the Danish requirement that if you want to do federation, you should bloody well use SAML. The Danes have had positive experiences with shared standardized XML vocabularies, having scored a big win with UBL. I can’t imagine anything in the short term that would be of greater benefit for everyone than ubiquitous shareable identity services.

XML Automaton · In December of 1996 I released a piece of software called Lark, which was the world’s first XML Processor (as the term is defined in the XML Specification). It was successful, but I stopped maintaining it in 1998 because lots of other smart people, and some big companies like Microsoft, were shipping perfectly good processors. I never quite open-sourced it, holding back one clever bit in the moronic idea that I could make money out of Lark somehow. The magic sauce is a finite state machine that can be used to parse XML 1.0. Recently, someone out there needed one of those, so I thought I’d publish it, with some commentary on Lark’s construction and an amusing anecdote about the name. I doubt there are more than twelve people on the planet who care about this kind of parsing arcana. [Rick Jelliffe has upgraded the machine]. ...
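For readers unfamiliar with the idea, here is a toy illustration of state-machine-driven scanning in Python. It is emphatically not Lark's automaton, which is not reproduced in this listing: it has just two states, knows nothing about comments, CDATA sections, entities, or attribute quoting, and the names `TEXT`, `TAG`, and `tokenize` are made up for the example.

```python
# The two states of the toy machine.
TEXT, TAG = "text", "tag"


def tokenize(source):
    """Split `source` into ("text", s) and ("tag", s) tokens, one character at a time."""
    tokens = []
    state = TEXT
    buffer = []
    for ch in source:
        if state == TEXT and ch == "<":
            # Leaving text: flush what was collected and start a tag.
            if buffer:
                tokens.append((TEXT, "".join(buffer)))
            buffer = [ch]
            state = TAG
        elif state == TAG and ch == ">":
            # Leaving a tag: emit it and go back to collecting text.
            buffer.append(ch)
            tokens.append((TAG, "".join(buffer)))
            buffer = []
            state = TEXT
        elif state == TAG and ch == "<":
            raise ValueError("'<' inside a tag")
        else:
            # No transition: stay in the current state and keep collecting.
            buffer.append(ch)
    if state == TAG:
        raise ValueError("unterminated tag")
    if buffer:
        tokens.append((TEXT, "".join(buffer)))
    return tokens
```

A machine that handles all of XML 1.0 needs many more states and transitions, but the shape is the same: the current state plus the next character decide what happens.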

Jon and the Minotaur · Jon Bosak (father of XML, terrific photographer, good person, etc.) was in Vancouver for some meetings having to do with UBL (and be warned, there’s going to be some more UBL tub-thumping around here), and encountered a monster ...

Dr. Macro · That would be the handle of Eliot Kimber, a member of the original XML Working Group. I count myself among the more prolific and pedantic members of the markup community, but Eliot sets A Higher Standard; indeed, those who know him find his entry to the blogosphere long overdue. His tagline: “All tools suck.” He has recently published a rant preview which may help you decide whether you want to subscribe, as I have.

On XML Language Design · If you’re going to be designing a new XML language, first of all, consider not doing it. But if you really have to, this piece discusses the problems you’re apt to face and offers some advice on improving your chances of success ...

Don’t Invent XML Languages · The X in XML stands for “Extensible”; one big selling point is that you can invent your own XML languages to help you solve your own problems. But I’ve become convinced, over the last couple of years, that you shouldn’t. Unless you really have to. This piece explains why. And, there’s a companion piece entitled On XML Language Design, in case you do really have to ...

Upcoming Gig: NYS CTG, Albany · Those acronyms stand for “New York State Center for Technology in Government”, which is under the umbrella of the University at Albany (part of SUNY). On January 25th in Albany they’re having a conclave entitled Thinking Beyond Your Web Site: Lessons from the XML Testbed Project, and I’m really looking forward to being a part of it. Obviously, the intersection of XML and State Government is very, very interesting territory at this moment in history. I’ll try to evangelize of course, but it’s more important that I do some listening; I think most of us in the industry don’t have a good enough understanding of the issues the end-users down in the trenches are facing.

Swiss Bank Account · I’ve been casting around trying to find something to write about the ECMA rubber-stamp Microsoft is buying for their Office file formats but have been unable to rise much above “blecch”. Simon Phipps, in saluting IBM’s wise refusal to play the game, manages to bring some grace and even a little humor to bear.

Drop the <!DOCTYPE> · Back when we cooked up XML in 1996-97, there were good reasons to have that ugly upper-case gibberish at the top of your XML documents. That was almost ten years ago; now it’s time to do away with it, and also time to have a spec for Doctype-free XML ...

John Cowan · He’s a legend in the XML community, is the author of TagSoup, is ridiculously erudite on any number of things, and is looking for a new job. I think he’d be a good bet.

RFC 4287 · I hadn’t seen the announcement, but this looks like a stable official IETF link to RFC 4287, The Atom Syndication Format. A little more work and we’ll have the publishing protocol done and I can return to my plow (or equivalent). The work of the WG and editors was just outstanding, and the IETF did, as advertised, provide a useful quality-control process without unduly getting in the way. Thanks everyone. The world now has a general-purpose syndication format that is small, stable, based on the last decade’s lessons, clean, and widely implemented. I feel happy.

Catcalls · It seems like my little thought experiment has touched a nerve. Scoble, Dare Obasanjo, and Randy Holloway all push back, amazingly enough all making the same argument: how can I be against duplication in office-document XML format while at the same time being mixed up in the Atom Project? The argument is fallacious, but at least Robert and Randy made it in grown-up, polite terms, leaving the childish name-calling to Dare. Now, as for RSS and Atom: When I came on the scene in 2003, RSS was already hopelessly fragmented, and there was exactly zero chance of any of the large-egoed thin-skinned proponents of the various versions deciding to make nice with each other. Atom is precisely an attempt to reduce the number of vocabularies that implementors feel they have to support. Turning to the office-document space: right now the world has exactly one finished, delivered, standardized, totally-unencumbered, multiply-implemented XML-based office document format. You are the guys who want to introduce another, incompatible one. And I think that’s OK; but restrict your invention to the specialized Microsoft stuff that ODF can’t do, and don’t re-invent the basics. Why is this controversial?

The Saga Continues · The Massachusetts Office XML File Formats saga, that is. The latest news is that the Microsoft announcements last week are playing well in Boston. Commonwealth secretary Thomas Trimarco stated “we are optimistic that Office Open XML will meet our new standards”, and I’m optimistic too. Obviously the key word is “will”, since we haven’t seen what’s getting submitted to ECMA and nobody’s seen what will come out of ECMA. Our own chief standards geek Carl Cargill wrote Mr. Trimarco a letter, which you can read over at Piper Cole’s weblog.

Thought Experiments · I see that Microsoft has posted a litigation covenant on the OfficeXML formats (also read Brian Jones’ exegesis). In response, there’s a bunch of legal poking and prodding here and here; I don’t understand the legal arguments, and I don’t think they’re the interesting part of the story anyhow. So, let’s do two thought experiments. First, what if Microsoft really is doing the right thing? Second, how can we avoid having two incompatible file formats? [Update: There’s been a lot of reaction to this piece, and I addressed some of those points here.] ...

Microsoft XML News · The newswires are buzzing today with Microsoft XML action. So, what do you want from an XML-based standard, whether it’s about synchronization or spreadsheets? First, you want it to be stable. Second, you want it to be legally unencumbered, so anyone can use it in their software. These things are really essential. Less essential, but important: you’d like it to have community involvement, some sort of open process; and finally, you’d like it to be, you know, technically good. So let’s look at today’s headliners, SSE and MSFT Office XML. Stable? SSE at the moment is just something Ozzie and Winer are kicking around, but who knows? As for OfficeXML, yup, this move to ECMA/ISO will make it stable. Unencumbered? SSE’s Creative-Commons license looks pretty good to me. Today, Jean Paoli told Scoble that they’d be doing some sort of “covenant not to sue” over OfficeXML. This would be great news, and we hope that, unlike the current license, it’s GPL-friendly. This is real important, because neither ECMA nor ISO have problems with standardizing heavily-encumbered technology. Open, transparent processes? Well, er, not exactly a Microsoft strength. I honestly don’t know whether ECMA will provide for meaningful input, or whether the process’ outcome, as for example OASIS allows, is completely predetermined. You have to admire the chutzpah in pre-announcing that the ECMA and ISO processes will finish before Office 12 ships, if only by minutes, especially since one assumes that the idea is that Office 12 is going to comply with those standards. Remarkable process-management and software development skills are evidently involved. Finally, are these technologies actually any good? As for SSE, I don’t know a thing about synchronization and Ray Ozzie knows lots, so I’ll hold my peace. 
On the OfficeXML side I have lots of opinions, but the opinion that’ll matter is that of ISO JTC1 (I’d guess more specifically SC34), which will soon be dealing with two attempts to standardize a solution to the same problem. Should be fun to watch. Oh yes, and since we’re talking about standards, would MSDN please get a clue!?!?

Bosworth in ACM · I recommend that everyone go read Learning From the Web, a substantial essay by Adam Bosworth, in the latest ACM Queue. It doesn’t say anything new that Adam hasn’t been telling everyone for the last couple of years, but it’s nice to have a canonical version of his message written down somewhere, for the world to point to and learn from.

Boston ODF Day · I spent Thursday the 27th in Boston; I was invited by Harvard Law School’s Berkman Center to participate in a round-table discussion of interoperability and standards in general, and the current Massachusetts/ODF brouhaha in particular. It was interesting and instructive, and I recommend that anyone who cares about this check out the recording. To help understand the context, there was this guy in the room from ACT who was pushing back pretty hard against the new Massachusetts policy. His arguments were lifted pretty well word for word from the Microsoft talking points, which was useful as the event might otherwise have been a love-in.

XML 2005 · I just spent some quality time with NXML-mode and a DocBook-derived tag-set (if the whole world would learn Emacs, the problems around XML editing would dry right up), pulling together my paper, On Language Creation, for the XML 2005 conference, next month in Atlanta. Can’t wait to hang with my tribe; be there or be ❏.

Rick Jelliffe · He’s been working on XML since before it was invented, he knows approximately everything about XML and publishing technology, he invented Schematron (which you should be using if you need to validate XML in a complex or subtle way), he’s a nice guy, and he’s looking for a job. Go get him.

OpenOffice Mac Sanity · A week ago, in my OpenOffice.org conference report, I wrote that the X11 Mac Port was being abandoned in favor of a Cocoa version. Every bloody Mac site in the world picked this up as though it were a major news story, and now I hear from Patrick Luby, chief maintainer of NeoOffice/J, that as a result, the people who’ve been supporting his work are threatening to cut him loose. This is madness; at the moment, Neo/J is the only actual shipping version of OpenOffice that you can run on a Mac with the menus in the right place, with drag-&-drop and fonts that Just Work, and so on. This is going to remain the case for some time, because the task of switching over the current X11 version is going to be huge, slow, and high-risk. (Patrick was also mad because I said Neo/J was “behind”, and, without going into details of Java and OO.o versions, he’s got a point). So for the time being, I’m going to go on using and supporting and probably blogging about Neo/J, because that’s all there is. And I still think that Apple should take an interest in this work.

Some OO.oCon Lessons · Yeah, at the conference there were speeches and press briefings and so on, but the main thing was all the good stuff there to be learned, some of which is related here. Plus a rare live photo of a slashdotting experience from the inside. [Update: They fixed the video.] ...

New England Town Meeting · On the 16th of this month, the Massachusetts Technology Leadership Council hosted a meeting at which Eric Kriss, the state’s Secretary for Administration and Finance, and Peter Quinn, the CIO, discussed the state’s recent proposal to standardize on the Open Document Format. I received a set of meeting notes, which I reproduce almost as-is (spell-checked, removed personal names and editorializing). They represent one attendee’s informal capture of the proceedings and have no official standing. But there is some eye-opening stuff here. [Update: via David Berlind, there’s online audio of the meeting.] [Update: Aha! Bob Sutor reports that the Massachusetts decision is now final. This is just the beginning of a long, long, road, and you know what? Microsoft is too smart not to go down it; the only question is when they start. See also Sam Ruby on Brays, Fairness and Doublespeak.] ...

Apple File Formats · The whole world has been giving Microsoft a hard time over their Office XML file formats; it turns out that there are far worse sinners. Apple, for one. Derek Beatty here at Sun ran across this write-up on their iWork (Keynote, Pages, and so on) file formats, which are XML-based. Item: there’s no attempt to conform to OpenDocument or any other standard. Item: they change them at will: “With the introduction of Keynote 2.x, this schema file is out of date.” Item: They don’t exactly encourage using their specs to build software: “Although the information in this technical note may appear useful, you should not rely on it for developing or modifying your own products.” And, to cap it all: “This document does not describe the complete XML schema for either Pages 1.x or Keynote 2.x. The complete XML schema for both applications is not available and will not be made public.” [Emphasis Apple’s.] Charming stuff. [Update: Apple’s Ernie Prabhakar pushes back passionately. I still don’t think anyone should store information that matters in a data format that’s not open and documented, but Ernie makes some good points.] [Update: Ooh! My own genuine Apple Leak, on how the iWork XML got that way. Read on.] ...

Got XML News? · Interesting times in the world of XML. Now, Atlanta in November may not be Paris in the spring, but it’s going to be XML World Headquarters Nov. 14-18, and the deadline for late-breaking-news papers is this Friday the 16th, so if you’ve got a story to tell, now’s the time.

Massachusetts Back-Room · The comment period for the new draft Massachusetts office-file-format policy ended last Friday the 9th. During the week before that date, there was some pretty intense back-room politics going on. There are a ton of industry associations and lobbying groups, including: Mass Software Council, Technet New England, Mass High Tech Council, Mass Network Communications Council, Associated Industries of Massachusetts, and AeA. You can bet that every one of them was coming under pressure last week to speak up pro or contra the state’s position. Since you have IBM and Sun on one side of this issue and Microsoft on the other, you can also bet that they were getting pulled both ways. I’m pretty sure that a lot of them ended up with a statement along the lines of “On the subject of the new draft from the Commonwealth of Massachusetts, we’re in favor of motherhood and apple pie.” But, I got my hands on a copy of the other side’s talking points, and I think they make interesting reading. [Update: I hear unofficially from someone at Adobe corporate that they’re “generally happy with how things went”, so I was wrong, sorry. Fixed.] ...

Scott to Massachusetts · Since Scott McNealy doesn’t have his own blog, I’ll post the email he sent to the Commonwealth of Massachusetts today ...

Massachusetts XML · This Massachusetts-office-file-format story has legs, it’s still echoing around a week after it broke. Oddly, there’s been relatively little coverage of the “this is a good move because...” form, so: This is a really smart move by Massachusetts... Because this way, they maximize the chances that the data is re-usable by lots of different programs, and not just office suites. Because they are entirely 100% free of legal entanglements. Because they maximize the chances that the data will still be usable by their grand-children, independent of the fortunes of any software company. Because if there’s something that needs adding to the format, there’s a standards committee whose job that is. I’m going to close by quoting, once again, a paragraph from a letter that the European Commission sent to Sun last year, that I think says what needs to be said: Transparency and accessibility requirements dictate that public information and government transactions avoid depending on technologies that imply or impose a specific product or platform on businesses or citizens. Amen.

Republished · At some point in the transition to Debian Sarge, something broke in the ongoing software. The Perl code reads text using an XML processor and various pieces of it get stashed in a MySQL database. Only somewhere along the line, non-ASCII UTF-8 characters were getting trashed. I tried all sorts of stupid dodges, and was whining away at Sam Ruby via instant messenger, and he said “of course, you could do it all as seven-bit ASCII via &#xBabe;... or you could rewrite it in Ruby and It Would Be Much Better”. I shrieked “Get thee behind me foul tempter!” and have now jammed everything into 7-bit ASCII as it comes out of the XML parser, and of course all the problems have gone away. Actually, the code got simpler, lots of XML escaping/unescaping calls are no longer necessary. This is one of the nice things about XML I guess, it allows you to be a good internationalization citizen even when your software infrastructure isn’t. It still feels evil. Anyhow, the whole site’s been republished, let me know if anything’s busted. (By the way, if you’re reading this in my RSS feed and all the entries show up as new, switch to the Atom feed and that problem will go away, because Atom actually has unique IDs and datestamps that work.) [Updated: Tony Coates (interesting new blog there, BTW) reports that Opera 8.02 gets it backwards, which means that it’s one of the rare pieces of software that respects guids in RSS, but that it’s doing Atom 1.0 wrong.]
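
For anyone wondering what the seven-bit trick looks like in practice, here is a minimal sketch. It is in Python rather than the Perl the site actually runs on, and the function name `to_ascii_xml` is made up for illustration; it is not the ongoing code. It escapes the markup characters, then replaces every non-ASCII character with a numeric character reference, so whatever sits downstream only ever sees ASCII.

```python
from xml.sax.saxutils import escape


def to_ascii_xml(text: str) -> str:
    """Return text as 7-bit ASCII that is safe to place in XML content.

    Hypothetical helper, not the site's actual code.
    """
    # First escape &, < and > so the text cannot be mistaken for markup.
    escaped = escape(text)
    # Then turn each non-ASCII character into a decimal character
    # reference such as &#241; using the standard xmlcharrefreplace handler.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_xml("Señor Δ & friends"))
```

Any conforming XML parser turns the references back into the original characters, which is why the database and everything else in the middle can stay blissfully ignorant of encodings.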

Massachusetts Ripples · While I was pondering what to write about this OpenDocument story, it spilled all over the Internet and generated oceans of coverage (thanks to Bob Sutor for the link round-up). I wonder if Gov. Romney has heard from Steve or Bill yet? To my eye, perhaps the best last word is this leader from ZDNet UK.

See You in Slovenia · I’m going to be doing a keynote at the next OpenOffice.org conference on September 29th, in Koper-Capodistria. I love what OpenOffice is trying to do, and I’m really looking forward to my first visit to Slovenia. Also, it’ll be a chance to do a speech that’s (mostly) not about blogging or syndication. Hope to see you there.

Summer School · I’ll be spending the week in Oxford, participating in the CSW XML Summer School, held at Wadham College. Ostensibly I’m here to lecture, but my real objective is to do a quick catch-up on what the XML application space looks like in A.D. 2005. My only real gripe is that my session is scheduled opposite XML in Healthcare, which I’d really like to attend. Oxford is ridiculously photogenic, I’ve included a couple of snaps of Wadham ...

XML and Religion · I suspect that most people who read me also read Adam Bosworth. But if you don’t, do.

New Office XML · The popular wisdom is that it takes Microsoft until Release 3 of anything to get it right; but the early word on the new Office XML format makes Release 2 look pretty good. Reading between the lines, the big news is, first, that the default file-save format is XML and, second, that the XML coverage is complete (In the current Office XML, PowerPoint is entirely absent and Excel has big holes). Assuming Microsoft pulls this off, it’s a major achievement. Along with patching those holes, working around the basic OLE-container-ness of everything has to be tricky; one of the nice things about MS Office is that you can jam pretty well anything that talks OLE into the middle of pretty well any Office doc and it just works. I have questions around the licensing: Brian Jones, linked above, says “royalty-free” but the current licensing language has some clauses that make lawyers nervous, so let’s wait and see on that one. At one level, it’s sad that while the rest of the world (including, lately, Adobe and IBM) has been hard at work on one wide-open, shareable, portable, standardized XML office document format, Microsoft put their energy into inventing another one. Still, this ought to be a step forward for Microsoft’s customers. The news coverage says “late 2006”; good luck to the team in the tough job of getting it shipped.

OpenDocument! · On Monday there was what seems to me like a major news story: the announcement that OpenDocument 1.0 has been approved as an OASIS Standard. As I’ve said before, OpenDocument is almost exactly what we had in mind when we built XML, starting back in 1996. Right now, it is the only XML office document format that is standardized, and it is also the only one that is complete; Microsoft’s offering is full of holes, starting with the absence of PowerPoint. It’s also completely 100% free of intellectual-property issues, anyone can use it for anything anytime anywhere without asking anyone first. Let me put it this way: if you occasionally create documents or spreadsheets or presentations, and if you think that you’d like to own them, independent of your Office software vendor, well, you have exactly one choice: OpenDocument. If those docs/spreadsheets/presos might be long-lived, or contain high-value data that you might want to re-use later, and you don’t use OpenDocument, well there’s a word for that but I’m not going to put it up on the front page at ongoing. By the way, at the request of our friends in the European Commission, we’ve committed to getting behind making OpenDocument an ISO Standard, too.

Not An April Fool’s Joke · Just foolishness. The XML Binary Characterization Working Group has issued their final report which recommends (surprise, surprise) that the W3C produce a “Binary XML” specification. Elliotte Rusty Harold nails it. I don’t care if anyone wants to go off and produce their own data interchange format, binary or not, open or not, standardized or not, mapped to XML or not; as long as they don’t call it XML. “Binary XML” is an oxymoron. And I should point out that the people at Sun who are building a binary data format with a mapping to XML are calling it something else entirely. These Binary-XML people are charging headlong onto the top of a very long, very steep, very slippery slope. [Update: Further joy. I see that this poorly-labelled table asserts that XML prevents both “processing efficiency” and “forward compatibility”. Glad to hear it.]

Not An April Fool’s Joke · Norm Walsh has a densely-technical post showing a nasty problem that’s cropped up in the interaction between XInclude, xml:base, and XML validation. Unless you’re a serious XML geek you probably don’t want to wade through the details, but in his conclusion, Norm raises a startling point: “I think what pains me most about this situation is that XInclude was in development for just over five years. It went through eleven drafts including three Candidate Recommendations. Why didn’t we notice this until several months after XInclude was a Recommendation? I’ll grant that XInclude is a fairly odd specification, in the sense that it’s providing functionality that you’d expect to occur down in the parser (like entities), but it’s only 8,563 words long. If we can’t get a 16 page spec right in three CRs, what hope do we have of getting the XSL/XML Query family of specifications right? By the same metric I used on XInclude, I get just over a half million words (505,779) in those documents. ” Half a million words... pretty scary.

Megginson Linkage · Another one of the key people around the birth of XML has joined the conversation. This time it’s Dave Megginson, who’s best known as the chief designer of SAX, but has made contributions large and small all over the universe of descriptive markup. Interestingly, one of his first entries calls for a key simplification to XLink, one of the best XML ideas never to have hit the big-time. Within an hour of reading David’s suggestion, which is of course excellent, I ran across Norm Walsh holding forth on the same subject; apparently, chances are XLink will become more lightweight, which would be A Good Thing and might change the world, slightly.

Fast (They Say) and Open · There are a lot of people at Sun who are convinced that some sort of binary XML representation is a good idea. I’ve never been convinced, but they’re serious; they’ve drafted a proposal and are working on getting it standardized; informally it’s called the “Fast Infoset” and officially it’s “ITU-T Rec. X.891 | ISO/IEC 24824-1”. I’ve been particularly dubious because it’s built on ASN.1, which I’ve had bad experiences with. But those mostly had to do with broken or unavailable software, and that objection may be moot, because as Eduardo Pelegri-Llopart writes, they’re shipping an Open Source implementation. Eduardo also tells me they’re getting lots of interest from outside of Sun. Hey, as long as whenever someone tells me “I interchange XML” that means they’re willing to interchange streams of Unicode characters with angle brackets, I’m OK.

Office Doc Format News · A couple of low-key news items in the Office Document XML space, worth highlighting because I think this area is significant, as do some important people. First off, the people standardizing this stuff over at OASIS (and soon, ISO) published a second draft, and, without any fanfare, they changed the name from “OpenOffice.org XML Format” to “OpenDocument”, which is shorter, better, and not tied to any particular implementation. There’s action on the Microsoft front too; check the microsoft.public.office.xml and microsoft.public.xml newsgroups, where there are flurries of questions digging through the knotty corners of WordML and ExcelML; you never really understand a dialect until you have to write a program to generate it. I’m sure the details will come out despite some current irritation, but this is a reason why Microsoft should cast a friendly eye at the boring, bureaucratic, painful standardization process.

XTech, Time to Pitch In · This is the big European summer XML event; the call for papers is closing in a couple days. Edd Dumbill, the chair, tells me that the submissions are excellent on the server side but there’s room on the client side, so all you XULoids and .NET-hacks and SVGers and XFormlings, get with it. Lauren and I are going to try to go (haven’t been to Amsterdam in ages); I’m particularly interested by the Open Data track, which sounds like something new in the world.

UBL by the Numbers · Via Jon Bosak, a pointer to this XML 2004 presentation (PowerPoint, sigh), about the Danish Government’s deployment of a bunch of XML technologies including UBL. Check out slides 4 & 5: they estimate the annual savings achievable from invoicing in UBL at somewhere between €100M and €160M. I may be out of step with the crowd but it seems painfully obvious to me that UBL is going to be huge and I don’t understand why more technology vendors (including my employer) aren’t refocusing their e-business strategy around it.

Ms Maler · Hey, Eve’s here! I expect great things. What she doesn’t highlight in her basic bio is that she helped invent XML, and has great hair, and is funny. Oh, and dig her URL. Now she’s a WordPress geek too; seems to be a growing tribe.

xfy · The full name is xfy Technology; it’s one of the most interesting pieces of new XML software I’ve seen in a long time. On the surface it’s an editing system, but the world has lots of those. There are three things that are interesting here. First of all, it’s from Justsystem, a Japanese software vendor which has gone toe-to-toe with Microsoft for a decade, carving out their fair share (and then some) of the office-suite market; so they should be taken seriously. Second, it’s got the slickest SVG-editing demo I’ve ever seen, you stretch shapes and watch the XML source code change, or vice versa. Finally, they told us it was all-Java and I was watching the demo and I was really impressed at the snappy, attractive UI, but then I got puzzled and said “I thought you were using Swing but, uh, what’s that?” “Well,” the soft-spoken young Japanese engineer allowed, “we are, but then we created some custom controls.” It just may be that the world headquarters of Java UI innovation is currently on the other side of the Pacific.

More Relax · I often caution people against relying too heavily on schema validation. “After all,” I say, “there is lots of obvious run-time checking that schemas can’t do, for example, verifying a part number.” It turns out I was wrong; with a little extra work, you can wire in part-number validation, or pretty well anything else, to RelaxNG. Elliotte Rusty Harold explains how. Further evidence, if any were required, that RelaxNG is the world’s best schema language, and that anyone who’s using XML but not RelaxNG should be nervous.

Extending Ruby · There’s a nice article by Garrett Rooney over at O’Reilly’s ONLamp site. I’d already mentioned Garrett’s Ruby wrapper for Genx; in this piece, he uses it as a case study on how to extend Ruby with C code. Neat.

Three Questions on XSD and WSDL · Last week at the Colorado Software Summit, during my keynote I asked three questions of the attendees, who were a few hundred mostly senior developers, mostly from the Java ecosystem. (I’ve tucked a picture in the body of this piece.) Do you use XML Schema? Pretty well every hand went up. Do you think you understand XML Schema? One hand went up. Do you like XML Schema? A scattering of hands, maybe 20%. I asked the same three questions about WSDL; similar pattern, not quite as universal exposure, a few more thought they understood it. Just reporting ...

The Fifth XML Developers’ Conference · I spent Wednesday and Thursday at Chris Sells’ fifth XML Devcon. This is a high-level gathering of the .NET XML (and thus Web-Services) community. It’s being blogged to the max (nicely aggregated on the conference site), and there’s an eWeek person here journalizing in realtime. It’s been fun and educational ...

Applied XML · I’ll be at Chris Sells’ XML thing near Portland this week, combining the physical risk of being near an active volcano with the moral peril of being surrounded by WS-* evangelists from Redmond who think that the natural lifespan of an XML document is measured in microseconds. I’m cranked, have actually been lying awake at night thinking of things I’d like to say to this gang and wondering if they can or should be said. I’m often guilty of arriving at a conference in time to speak and being on the next plane out, but I’ll take in most of this one. It’s a way to discover a major continent on planet XML that, for me, has remained largely unexplored. Current working title of my talk: Bits on the Wire: Lessons From the Syndication Explosion.

Smart EC · Because of the way ongoing works I need fairly short headlines, which is a pity, because for this piece I wanted to use The European Commission Makes Extremely Smart Moves Concerning Open XML-Based Office Document Formats and Browbeats Vendors Deftly; As a Result the Open Office XML Format Will Probably Become an ISO Standard ...

Genx Status · This is the permanent status page for Genx (tarball · docs). Genx is a library, written in the C language, for generating XML. Its goals are high performance, a simple and intuitive API, and output that is guaranteed to be well-formed; the output is also guaranteed to be Canonical XML, suitable for use with digital-signature technology. There is a Python wrapper. Genx comes with a GPL-Compatible but non-viral Open-Source license. Latest news: In production, carrying hundreds of thousands of subtitles per day; thinking of taking off the “beta” stamp ...

On Custom Schemas · Not so long ago, I wrote a piece about open document formats. Just today there was an interesting (as always) follow-up from Jon Udell, but what I wanted to address here is Dare Obasanjo’s take, which is pretty well the Microsoft party line (not that Dare’s always a party-line guy): the Office software and its document formats are winners because they allow the use of custom schemas for office documents. That’s more important, they say, than the dodgy licensing terms and the missing pieces. I used to believe that custom schemas for office documents were generally a good idea, but I no longer do. Here’s why ...

Another Year’s Harvest · There’s a general air of gathering expectation around the house; Lauren chairs the big annual XML conference and this weekend is the closing date for paper submissions. She keeps vanishing behind the nearest computer to check the inbox, and coming back with a smile. Then the assembly-line takes over: reviewer assignments, reviewer chivvying, paper selection, and then it’s getting into logistics time. If you’ve never been to one of these, you should go; you can get deeper into the deep XML issues in a couple hours in the hotel bar than in a week of seminars anywhere else. And if you’re doing something red-hot and exciting with XML that the world needs to hear about, it’s (barely) not too late to put virtual pen to virtual paper.

OpenOffice · I spent the day Thursday at StarOffice in Hamburg and came away with some of my ideas about XML & blogging changed. It was a side-trip; other business took me to Brussels and OpenOffice wasn’t that far away and I had an agenda there, which we’ll get to. But this is important stuff, I think. [Updated with a pointer.] [And again with Geof Glass’ OO.o-to-blog gateway.] ...

TCP is So Over · Most of us have been hearing rumors about this skunkworks XCP thing for some time, but now they seem to be open to the public. As they say, “Light the Fiber!” Think about it this way: I first went to the mat with TCP/IP in 1984, when 4.2bsd hit the streets. A twenty-year run is plenty for most technologies, and I’d say TCP has pretty well had its time in the sun.

<’s Pointy End · Dave Walker over at freeform goodness catches me with my XML pants, figuratively speaking, down. I wrote a piece about leaving the W3C TAG entitled (cleverly I thought) </TAG>. Unfortunately that < in the title caused all sorts of grief and breakage, both here at ongoing and downstream in the world of syndication and aggregation. I can fix my own problems, but it’s deeper downstream; long term, the answer is Atom. Herewith some thoughts on good programming practices and the larger problem. [Update: A couple of notes on the “href problem.”] ...

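The general shape of the good practice is easy to sketch: don’t paste strings into markup by hand, let an XML library serialize them. What follows is a small Python illustration, not the ongoing software; the `entry` and `title` element names and the `make_entry` function are assumptions chosen only to look feed-like.

```python
import xml.etree.ElementTree as ET


def make_entry(title: str) -> str:
    """Serialize a one-element feed entry (illustrative names only)."""
    entry = ET.Element("entry")
    # Assigning to .text stores the raw string; the serializer is
    # responsible for escaping <, > and & on the way out.
    ET.SubElement(entry, "title").text = title
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    xml_text = make_entry("</TAG>")
    print(xml_text)
    # Parsing the result gives back the original title unchanged.
    print(ET.fromstring(xml_text).findtext("title"))
```

A title like `</TAG>` goes out escaped and comes back intact, which is the property that string concatenation quietly fails to give you.
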
Office Markup Languages · I’ve been at Sun less than a day and this guy gets in touch. Guy: “You’re the XML expert, right?” Tim: “Well, er.” Guy: “Open Office’s XML is better than Microsoft Office XML, right?” Tim: “Uh, I don’t know.” ...

Genx Alpha · I just posted a Genx tarball; the documentation is separately available here. This is Alpha code, not because it’s all that buggy (it doesn’t do that much, after all) but because it’ll quite likely change once some other smart people see the problems I haven’t. There are quite a few departures from the designs I posted earlier and where the ensuing discussion got to, simply because I’ve now written the code; and I’m never smart enough to understand the problem until I’ve written the code. For those who care about such things, discussion will probably be mostly on the XML-dev mailing list. Genx currently has an ultra-minimal copyright statement but I plan to adopt the latest rev of the Apache copyright before I do another release. [Updated: Oops, tarball was mis-placed; it’s there now.]

Writing Genx · In between beach time and rainforest time, I’ve been coding away on genx; herewith some impressions with one important lesson and an interesting bit of history ...

Reflexive Naming · I’m working right now on the design of an XML vocabulary with an element whose name is attribute which has an attribute whose name is name. It makes it hard to talk about the XPaths without a lot of stuttering.
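
To see the stutter in action, here is a tiny Python sketch. The `schema` wrapper element and the sample values are invented for illustration; only the element named `attribute` with an attribute named `name` comes from the vocabulary described above. In full XPath the expression would be `//attribute/@name`; ElementTree’s XPath subset cannot select attributes directly, so the sketch selects the elements and then reads the attribute.

```python
import xml.etree.ElementTree as ET

# Invented sample document: elements named "attribute",
# each carrying an attribute named "name".
doc = ET.fromstring(
    "<schema>"
    '<attribute name="id"/>'
    '<attribute name="lang"/>'
    "</schema>"
)

# Find every attribute element that has a name attribute,
# then pull out the value of that attribute.
names = [a.get("name") for a in doc.findall(".//attribute[@name]")]
print(names)  # ['id', 'lang']
```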

Genx · It seems there’s some considerable demand for a C-callable API which will write XML safely and efficiently. I sketched out an interface design which you may peruse here; I think it’ll be pretty self-evident to the C-literate. It compiles and I wrote and tested the genxScanUTF8() method, so it’s not entirely vapor. Upon consideration, I think it will be virtually no extra work to make it emit Canonical XML, ready to be signed, sealed and delivered (and Rich Salz said he would help) so why not? Major thanks to Anthony J. Starks for the name—I am not a member of Gen X myself, but I do share a city with Coupland, so there you go. Since ongoing doesn’t have comments, I’ll post a pointer to this item over in the xml-dev mailing list, which is a natural place to discuss it. It would be very surprising if this first-cut sketch didn’t contain some stupid errors, so go get ’em.

On Writing XML · In a recent essay I offered, given demand, to author some XML-writing software. There’s been quite a bit of feedback, and the consensus seems to be that the Java community is fairly well-served with XML writing software, but that this would be real useful at the C level. So that’ll be my coding fun for the month of February. The rest of this essay lists some of the Java options that people told me about, and introduces some issues around the C implementation ...

History of XML Error Handling · I encourage everyone to go and read Mark Pilgrim’s remarkable overview of the history of XML error-handling. His summary is: “In the end, Tim basically said ‘there are two camps here, they both have good points, we aren’t going to convince each other on this one’ and then proceeded to compromise by doing it his way.” Mark’s selection of out-takes from the debate would seem to support that narrative. Excuse me while I go off in a corner and shake off the megalomania. Let’s get real: even my Mom wouldn’t believe that I could single-handedly impose so fundamental a policy decision on this large and passionate a community by saying “Make it so.” What happened was, we had a really big, really long, really passionate argument on the subject; the camps came to be called “Draconians” and “Tolerants.” After this had gone on for some weeks and some hundreds of emails, we took a vote and the Draconians won 7-4. And indeed, some among the Tolerants cried foul over that vote. This was a good example of what we mean when we say “rough consensus” in that even those on the short side of the vote were willing to defend the process and the outcome; see Hollander and Sperberg-McQueen. Other interesting glimpses into this history may be found here and, giving the last word, as is appropriate, to Jon Bosak, here.

The Three-Legged Future · There’s a real interesting note from Campbell and Swigart lamenting the fact that, down in the coding trenches, the worlds of objects and of RDBMSes and of XML are far from unified, and that attempts in that direction have been less than enthralling. I think we just have to get used to it, and from here on in, the practice of software engineering is a three-legged discipline ...

Deep XML · At the recent XML conference, Norm Walsh hosted a nocturne on Practical RDF, the highlight of which was his tour through the norman.walsh.name setup. From the outside you may think this is a mere blog, but it’s actually a side-effect of a frighteningly gnarly confluence of metadata streams which are shaken and stirred to produce a sprawling network of resources a small part of which you might want to peruse for Norm’s news & views. I have a picture that made the audience at the session gasp in disbelief ...

Notes on Bosworth · Adam Bosworth has been discussing what he calls a “Web Services Browser” for months over at his blog, but I was really having trouble getting the point. After his speech here at XML 2003, I think I sort of get it ...

A Day Late in Philly · I got to the big XML conference here on its second day and it looks like I missed lots of interesting stuff. Oh well, I’m just here to hang out and chat, and I see by what I read that there are a few folks here that I know but haven’t met. Flag me down for a talk, y’all; Wednesday I’ll be wearing a pink sports-jacket so I’m easy to spot.

On Search: XML · Searching is all about text, and the proportion of all the world’s text that is XML keeps getting higher and higher. So if you’re going to do search, at some point you’re going to have to think about searching XML. Herewith a survey of some of the issues and problems (which, like other essays as we approach the end of On Search, contains opinions among the reportage) ...

UTF-8+names · Here’s the problem. You want to put “funny” characters in your XML, ones that aren’t on your keyboard, like “ñ” isn’t in Greece and “Δ” isn’t in Mexico. XML has a bunch of ways to do this; some of them require sophisticated software, others are really ugly, and if you want to avoid both the ugliness and the fancy software, you can use a DTD. Except for people don’t want to use DTDs either. This set of issues has been darkening the XML skies for years now, but we may have stumbled on a way out of the box. (Warning: Bit-banging technicalia of interest only to XML obsessives) ...
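
For readers who want the trade-off made concrete, here is a rough sketch in Python. The teaser does not enumerate the options, so the three routes shown (a literal UTF-8 character, a numeric character reference, and a named entity declared in a DTD) are an assumption about what is meant. The entity name ntilde and the sample documents are purely illustrative.

```python
import xml.etree.ElementTree as ET

# 1. Literal characters: needs an editor and toolchain that handle UTF-8.
literal = ET.fromstring("<p>ñ Δ</p>").text

# 2. Numeric character references: work anywhere, but are hard to read.
numeric = ET.fromstring("<p>&#xF1; &#x394;</p>").text

# 3. A named entity is readable, but only legal if a DTD declares it.
try:
    ET.fromstring("<p>&ntilde;</p>")
    named_without_dtd_fails = False
except ET.ParseError:
    named_without_dtd_fails = True

# The same named entity, this time declared in an internal DTD subset
# (illustrative declaration, not taken from any real DTD).
with_dtd = ET.fromstring(
    '<!DOCTYPE p [<!ENTITY ntilde "&#xF1;">]><p>&ntilde;</p>'
).text

print(literal == numeric)        # both routes yield the same characters
print(named_without_dtd_fails)   # undeclared named entity is an error
print(with_dtd == "ñ")           # declared named entity expands
```

All three routes deliver the same characters to the application; what differs is what the author has to type and what declarations the document has to carry.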

Emacs, XML, Unicode · I was struck by Norm Walsh’s essay Goodbye DTDs, in which he talks of going to an all-RelaxNG environment, no more DTDs. Within seconds of seeing it I IM’d him asking “What about special characters?” and he pointed out that there would still be some entity declarations around. ongoing has a DTD too, but I’d rather it didn’t, so I decided to see if I could wrestle Emacs to the ground so I wouldn’t need one. Of possible interest only to the eleven people in the world who edit XML in Emacs and know what “i18n” stands for. [Updated; skip to the end for a neato char-insertion function.] ...

nXML Oh My · I just went and got James Clark’s new nXML Emacs XML editing mode. I poked around a bit, wondering which XML parser and RelaxNG engine he was using, and worrying how much trouble I’d have getting this running and hooked in to my hand-compiled Emacs here on OS X. No, it’s not like that. There are 12,587 lines of elisp here apparently implementing a complete XML 1.0 processor and RelaxNG validation engine. Words fail me.

Spaghetti Doesn’t Want to be Free · A brilliant note from Rick Jelliffe of Topologi, on the subject of W3C XML Schemas, from which I excerpt: Any sufficiently monolithic technology is indistinguishable from spaghetti. Once a large technology is made from sufficiently intertwined parts, there is no way to order an exposition of it such that strongly-connected ideas are always close together. Spaghetti doesn't want to be free. (At least, "no way" to order the exposition with HTML-style pages: maybe WXS needs something more like Nelson's transclusion, where you can pull in fragments (without losing their context) and embed them into running text, without the maintenance penalty of duplicated sections.) Indeed, I think that is a forgotten rationale for XML over SGML: dumbing down an intertwined technology so that it could have a spec straightforward-enough that people could conveniently read it.

Dracon and Postel · There’s been a flurry of debate over in the PEAW mailing list about how to deal with broken feeds. Simultaneously, Aaron Swartz asserts Postel’s Law Has No Exceptions. Herewith a bit of back-fill on the relevant history and tribal knowledge, an excursus into Athenian jurisprudence, and opinions on what PEAW should do ...

Markup, Namespaces, and Meaning · Jon Udell has been thinking so furiously about mixing namespaces and the meaning of markup that I imagine a visible swirl of superheated brain energy above his home office. I think that this whole area of thought is what over in the W3C TAG we refer to as a “rat-hole”. I.e., something you can vanish down never to re-appear, or at least a place where you can waste a lot of time scurrying along twisty little passages. Herewith some (I hope) demystification ...

Namespace Pedantry · There is one aspect of XML namespaces that keeps confusing people, and since I wrote the specification, it’s at least partly my fault. In the last week alone it’s caught both Jon Udell and Aaron Swartz. There is no such thing as the “blank namespace” or the “empty namespace” or the “unqualified namespace.” An element or an attribute is either in a namespace or not, and if it’s in a namespace, the namespace has a name, and the name isn’t blank. I enclose an example and a bit more explanation ...
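
A minimal sketch of the point, using Python's standard ElementTree parser rather than the example from the full article. The document and the example.com namespace names are made up for illustration, and split_name is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the example.com URIs are placeholders.
DOC = """<a xmlns="http://example.com/ns" xmlns:p="http://example.com/p">
  <b p:x="1" y="2"/>
  <c xmlns=""/>
</a>"""

def split_name(name):
    """Split ElementTree's '{uri}local' form into (namespace name or None, local name)."""
    if name.startswith("{"):
        uri, local = name[1:].split("}", 1)
        return uri, local
    return None, name

root = ET.fromstring(DOC)
b, c = root[0], root[1]

# <a> and <b> pick up the default namespace declared on <a>.
print(split_name(root.tag))
print(split_name(b.tag))

# xmlns="" undeclares the default, so <c> is in no namespace at all;
# it is not in some namespace whose name is the empty string.
print(split_name(c.tag))

# Attributes never inherit the default namespace: p:x is in a namespace,
# the unprefixed y is in none.
for attr in sorted(b.attrib):
    print(split_name(attr))
```

The helper returns None rather than an empty string for names outside any namespace, which mirrors the distinction the post draws: either there is a namespace and it has a non-blank name, or there is no namespace.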

XML Tribal Bash, Get Yer Papers In · Way back before there was XML, there was SGML, and there was one big SGML conference a year, with unimaginative names: “SGML 1990”, “SGML 1991”, and so on. 1990 is when I started going. XML was announced to the world at SGML ’96, an occasion I’ll remember till I die. All this is a lead-in to a plug for today’s version of that conference, still unimaginatively named: XML 2003. In particular, I’d like to encourage the kind of people who read me here to think about sending in a paper and getting on stage. Read on for details ...

Generating Word · Via Don Box, I see that Oleg Tkachenko is generating WordML and feeding the output to Word, and it's working! ...

On Semantics and Markup · The term “Semantic Markup” is bandied about freely, and with every year that passes, it makes me more and more nervous. Herewith an exploration of what, if anything, those two terms mean when placed side by side. (Warning: way too long.) ...

Infopath · c|net says that Microsoft won't be including Infopath (formerly known as XDocs) in the basic MS Office bundle. This seems all wrong, I don't get it ...

Why XML Doesn't Suck · Recently in this space I complained that XML is too hard for programmers. That article got Slashdotted and was subsequently read by over thirty thousand people; I got a lot of feedback, quite a bit of it intelligent and thought-provoking. This note will argue that XML doesn't suck, and discuss some of the issues around the difficulties encountered by programmers ...

XML Is Too Hard For Programmers · XML is a bouncing thriving five-year-old now, and yet I've been feeling unsatisfied with it, particularly in recent times. In particular in my capacity as a programmer ...

Let's Move XML-dev Now! · To the extent that there is such a thing as an XML community, it's found at a few conferences and on the xml-dev mailing list. Like many electronic communities, xml-dev suffers from a few tedious permathreads, from regular childish ranting, and from side-trips into the abstruse. But if you ask a hard technical question on XML there, you'll probably get an answer, almost immediately. The problem is that the mailing list is mismanaged, broken, unreliable, inaccessible, and really ought to find a new home with competent grownup minders ...

When Is it OK To Invent New Tags? · Tantek Çelik, smart Microsoft browser guy, is blogging from the big W3C meeting now going on in Boston. Among other things, he's mad because some W3C specifications are written not in HTML but in a completely different XML language called xmlspec, and that language has some tags that are a lot like HTML tags, so why don't we just use HTML tags? I'll address some of the historical background and specifics, but Tantek is pointing at a real important issue in the world of XML: when do you invent your own language, and when do you re-use someone else's? Warning: long, and loaded with markup design theory and obscure standards history. ...

Bosworth et al on XML, SOAP, Binary Data · Today sees the publication of a thought piece by six authors the first of whom, alphabetically and (in my view anyhow) in intellectual standing, is Adam Bosworth ...

Small XML-dev Flame War · I am a member of the xml-dev mailing list, the original XML-zealot conclave and home to most of the people in the world who worry seriously about XML in general; a very special and fortunately small shared obsession ...

Examples, Examples, Examples! · Today there was a release of a draft of the UBL draft specifications. I pulled them down and there's legalisms, and definitions, and schemas, and UML, but not one single example of a UBL message.... Argh!!!!! ...

XML and Me · I helped invent XML. It happened (mostly) between July and December 1996. There were 11 people who did the heavy work on the XML Working Group. There were three co-editors of the official XML specification. I was one of the eleven and one of the three ...

The Importance of Bits on the Wire · Originally a ranting email to Dave Winer provoked by some silly statement out of Apple I think. ...


By Tim Bray.

The opinions expressed here
are my own, and no other party
necessarily agrees with them.

A full disclosure of my
professional interests is
on the author page.
