Sam Ruby is always worth reading; today his Half Full took me on a (rare) visit to HTML5-land. Among the many things I feel guilty about, not having the strength to follow HTML5 is prominent. Ian Hickson and his posse have repeatedly proved that they can effortlessly overrun my input buffer; I wonder how W3C stalwarts like Dan Connolly are holding up under the strain?

Anyhow, one reason I’ve had trouble reading the HTML5 drafts is the voluminous language specifying the precise behavior of a browser, or something very like it, with a DOM assumed in place. I’ll never write a browser or anything like one, but I do like to write generators and tokenizers and indexers and validators and analyzers and so on. I’ve been a member of the Church of Bits On the Wire for a long time. Also I like streams better than DOMs.

Thus, I was delighted, on following a link from Sam’s post, to see HTML: The Markup Language by Mike Smith, which is an editor’s draft, i.e. essentially a strawman conversation-starter; it’s a nice straightforward specification of the proposed HTML5 language. I quote: “It provides the details necessary for producers of HTML content to create conformant documents. By design, it does not define related APIs nor attempt to specify how consumers of HTML content are meant to process documents.” I really like it when language specifications don’t try to pretend that they can constrain what a programmer can do with them.

Astoundingly, the existence of such a document seems to be controversial. Just add this to the long list of things about the HTML5 effort that baffle me. The strength of my belief that the HTML5 effort is A Good Thing For The World is lessening but hasn’t reached zero.



From: Fred Blasdel (Nov 20 2008, at 22:59)

Such is how the sausage is made, especially when it is done so publicly.


From: Henri Sivonen (Nov 21 2008, at 00:31)

The thing is that with scripting, the bits on the wire are understood in terms of a parser and the scripts specified by the bits themselves modifying a DOM as time progresses, so a spec that covers what the bits on the wire cause necessarily involves defining the execution environment for the scripts associated with said bits.

As for reading the specs as a producer, have you compared e.g.



The former has an item called "Contexts in which this element may be used". The latter does not.

(Disclosure: I wrote the RELAX NG bits for legend.)


From: Konbrihajm (Nov 24 2008, at 07:30)

Shooting yourself in foot that kicks the master, eh?


From: Michael(tm) Smith (Nov 26 2008, at 20:44)

Regarding Henri's comment about the element definitions in the draft lacking any "Contexts in which this element may be used" sections: Those are not missing because I made an explicit design decision to omit them. I considered adding them but I don't have the build set up yet to automatically generate them. I well recognize their utility; some form of "elements that can contain this element" section is a common feature of existing third-party reference documentation for HTML and other languages:

I do realize that the "Contexts in which this element may be used" sections in Hixie's document are more useful to authors than those in similar documents, in that they provide -- at "point of use" (for lack of a better way to state it) -- the specific details about exactly how the element can be used, and that they provide that information in prose (e.g., "As a child of a figure element, if there are no other legend element children of that element.").

That said, they aren't strictly necessary in a document that mainly attempts to provide a formal definition of what a conformant document is -- because the content models for the containing elements already define the constraints; e.g.:

figure = (legend, (text & common.elem.prose*)) | ((text & common.elem.prose*), legend?) & figure.attrs
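To make that constraint concrete, here is a minimal Python sketch of what the pattern above says about a figure's children: at most one legend, and only as the first or last child. It is an illustration under assumptions, not anything from the draft or its build: the function name and the PROSE_ELEMENTS set are hypothetical, the set is a tiny stand-in for common.elem.prose, and figure.attrs is not checked at all.

```python
# Hypothetical stand-in for the draft's common.elem.prose pattern;
# the real set of prose elements is much larger.
PROSE_ELEMENTS = {"p", "img", "pre", "table"}


def figure_children_conform(children):
    """Check a figure's child element names against the quoted content model.

    Text nodes are left out of `children` because the model interleaves
    text freely; attributes (figure.attrs) are not checked.
    """
    # Positions of every legend child.
    legends = [i for i, name in enumerate(children) if name == "legend"]
    # Both branches of the model allow at most one legend.
    if len(legends) > 1:
        return False
    # First branch: legend comes first. Second branch: legend comes last.
    if legends and legends[0] not in (0, len(children) - 1):
        return False
    # Everything that is not the legend must be a prose element.
    return all(name == "legend" or name in PROSE_ELEMENTS for name in children)
```

Under these assumptions, figure_children_conform(["legend", "img"]) and figure_children_conform(["img", "legend"]) both hold, while a legend sandwiched between two prose elements is rejected.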

That's not to say I don't think the draft should also provide them at point-of-use, as Hixie's document does. There are other per-element sections in the draft (e.g., the "Typical default display properties" and "Examples" sections) that are not strictly necessary either.


From: Michael(tm) Smith (Dec 01 2008, at 01:12)

Just an FYI to say that I have gone ahead and added per-element "Permitted contexts" subsections to all elements; for example:


November 20, 2008