ongoing by Tim Bray

Anne van Kesteren suggests an XML 2.0 mostly defined by less-Draconian error handling, provoking further discussion over chez Sam Ruby.

I was recently asked about this by Xavier Borderie in an interview currently appearing at Journal du Net. Since not all ongoing readers will be able to read my incredibly-polished French (well actually, Xavier translated my English, but I nit-picked the translation), I thought I should give the English version here:

Micah Dubinko asks “Is HTML on the Web a special case?”, and the answer is obviously “yes”. Note that the HTML language being developed by the WHATWG is not XML at all, and I'm not brave enough to predict whether that is a good idea.

There have always been a few tools that processed XML data but also accepted broken (non-XML) data; for example, every Web browser. It seems unlikely to me that there will ever be an official new release called “XML 2.0” that has different error-handling rules. But I'm sure that the arguments about when to apply real XML error handling and when software should accept non-XML data will go on forever; among other things they are quite entertaining.

There's a spectrum of situations: at one end, if an electronic-trading system receives an XML message for a transaction valued at €2,000,000, and there's a problem with a missing end tag, you do not want the system guessing what the message meant, you want to report an error. At the other end, if someone sends a blog post from their cellphone with a picture of a cute kitten, you don't want to reject it because there's an “&” in the wrong spot. The world is complicated.
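Here is a minimal Python sketch of the two ends of that spectrum, using the standard library's `xml.etree.ElementTree`. The sample messages and the names `parse_strict` and `parse_lenient` are hypothetical, chosen for illustration. The lenient path applies one narrow, fixed repair (escaping an “&” that does not begin an entity reference) and then hands the text to the same strict parser; it is not a general recovery scheme.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical trade message with a missing </amount> end tag.
TRADE = "<trade><amount currency='EUR'>2000000</trade>"
# Hypothetical blog post with an "&" in the wrong spot.
POST = "<post><p>Cats & kittens</p></post>"

def parse_strict(text):
    """Draconian handling: any well-formedness error is reported, never guessed around."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        return None, str(err)

# Matches an "&" that does not start a named or numeric entity reference.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def parse_lenient(text):
    """Lenient handling for low-stakes content: escape bare "&", then parse strictly."""
    return parse_strict(_BARE_AMP.sub("&amp;", text))
```

With these definitions, `parse_strict(TRADE)` returns an error message instead of a tree, and `parse_lenient(POST)` returns a tree whose `<p>` element has the text `Cats & kittens`. The lenient repair does nothing for the trade message, which still fails, because a missing end tag is outside the one rule it applies.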

Contributions


From: Mark (Jan 30 2007, at 18:58)

> if an electronic-trading system receives an XML message for a transaction valued at €2,000,000, and there's a problem with a missing end tag, you do not want the system guessing what the message meant

You have used this example, or variations of it, since 1997. I think I can finally express why it irritates me so much: you are conflating "non-draconian error handling" with "non-deterministic error handling". It is true that there are some non-draconian formats which do not define an error handling mechanism, and it is true that this leads to non-interoperable implementations, but it is not true that non-draconian error handling implies "the system has to guess." It is possible to specify a deterministic algorithm for graceful (non-draconian) error handling; this is one of the primary things WHATWG is attempting to do for HTML 5.

If any format (including an as-yet-unspecified format named "XML 2.0") allows the creation of a document that two clients can parse into incompatible representations, and both clients have an equal footing for claiming that their way is correct, then that format has a serious bug. Draconian error handling is one way to solve such a bug, but it is not the only way, and for 10 years you've been using an overly simplistic example that misleadingly claims otherwise.
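To illustrate the claim that graceful handling need not mean guessing, here is a toy tree builder in Python with two fixed recovery rules. It uses the standard library's `html.parser.HTMLParser` only for tokenizing. The rules and the names `DeterministicTreeBuilder` and `parse` are hypothetical, and they are far simpler than (and not the same as) the WHATWG's HTML 5 parsing algorithm. Because the rules are fixed, any two implementations that follow them build the same tree from the same input, including input with a missing end tag.

```python
from html.parser import HTMLParser

class DeterministicTreeBuilder(HTMLParser):
    """Toy recovery rules (hypothetical, not the HTML 5 algorithm):
    1. An end tag closes the nearest open element with that name, along with
       any elements opened inside it that were never closed.
    2. An end tag with no matching open element is ignored.
    Elements still open at end of input are closed implicitly."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Each node is a (name, children) pair; text is stored as plain strings.
        self.root = ("#document", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Rule 1: search the open elements from innermost outward.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth][0] == tag:
                del self.stack[depth:]
                return
        # Rule 2: no matching open element, so the end tag is dropped.

    def handle_data(self, data):
        self.stack[-1][1].append(data)

def parse(text):
    builder = DeterministicTreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root
```

For example, `parse("<trade><amount>2000000</trade>")` always yields `("#document", [("trade", [("amount", ["2000000"])])])`: the `</trade>` end tag closes the unclosed `amount` element too. Whether a trading system should act on such a recovered tree is the policy question the post raises; the sketch only shows that the recovery step itself can be fully specified.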

XML 2.0?
