I’ve encountered three different Ruby libraries for generating markup: there’s one in the CGI library, there’s Builder, and there’s Markaby. To some degree, all are heavily informed by the special case of generating HTML; and maybe they’re OK for that. But if you want to go further and generate XML, they’re all pointing in the same, wrong, direction. Maybe I’m missing something, but I do have an alternative to offer. Plus, I find a chance to laugh at myself gleefully. [Update: Ouch! Refuted!] [Update: And again, more seriously.]

[Well, it turns out that I’m a Commuterrorist, my code is uglier than goatse, and I’m the reason that Sun is the “zzz” in dot.biz. Leon Spencer says so, very amusingly. If you haven’t read this piece yet, stay with me till the end, then I’ll link to him again and respond. If you’ve been here before and just want to see the update, jump to The Spencer Rant.]

[Update: And again: Sam Ruby, who has more patience than I, shows that you can use the existing XML::Builder framework to achieve the effect I was trying for. He argues that my effort would be better invested in fixing any flaws I find there than inventing Yet Another Ruby XML Writer. I think he’s right.]

All these markup generators adopt two principles:

  1. You call a Ruby method to generate an element; it provides the opening and closing tags and relies on a body to fill in the middle.

  2. To generate a <foo> element, you call a method named foo.

#1 is correct, and makes Ruby a really nice language for generating markup. #2 is completely wrong in the general case.

The Element = Method Fallacy · Back when we were first doing XML, time after time someone would put up their hand and say “Oh right, and then when you’re processing this stuff you’ll dispatch the <foo> element to the foo() method (or sometimes, the foo class)!” This turns out not to work in the general case because <Tim.Bray/>, <société/>, and <a-a_a/> are all perfectly legal XML elements, and those names are awkward as method names. Yes, Ruby does allow you to have methods with funky names, but you have to do extra work to use them. Also, you have to work around namespaces somehow.

This mistake is particularly tempting in Ruby because it’s so easy to implement method_missing and say “Oh, they called foo, did they? Well, I’ll just slip in a <foo>.”
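To make the trap concrete, here is a toy sketch of the method_missing trick. NaiveBuilder is a hypothetical class invented for this illustration; it is not code from CGI, Builder, or Markaby, and it only wraps a string in tags.

```ruby
# Hypothetical toy class, for illustration only.
class NaiveBuilder
  # Any unknown method name becomes an element name.
  def method_missing(name, content = '')
    "<#{name}>#{content}</#{name}>"
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

b = NaiveBuilder.new

# The easy case: ordinary call syntax.
plain = b.foo('bar')

# Legal XML names that are not usable with call syntax:
# b.a-a_a would parse as (b.a) - a_a, and b.Tim.Bray as two chained calls,
# so you have to fall back on send with a string.
dashed = b.send('a-a_a', 'bar')
dotted = b.send('Tim.Bray', 'bar')

# A name that is already a real method never reaches method_missing:
# this returns the NaiveBuilder class, not a <class> element.
clash = b.class
```

The send calls are the “extra work”; the last line is the name-collision problem, and nothing here even attempts namespaces.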

What Elements Really Are · An XML Element has three ingredients: a namespace URI, a short name, and an attribute list. (There are times when the programmer has to control the namespace prefix, due to the bogosity of QNames-in-content.) An attribute list has zero or more instances of name-value pairs, where the name comprises a local part and an optional namespace URI; the value is just a string. Text in attribute values and element content has to be escaped.
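As a rough sketch, those ingredients map onto a couple of plain Ruby structs and two escaping helpers. Every name here (AttrName, Element, escape_text, escape_attribute) is made up for illustration and is not taken from Genx; the escaping is the bare minimum and assumes attribute values are written inside double quotes.

```ruby
# Illustrative names only; nothing here comes from the Genx source.

# An attribute name: a local part plus an optional namespace URI (nil if none).
AttrName = Struct.new(:local, :namespace)

# An element: namespace URI, short name, and a hash of AttrName => String.
Element = Struct.new(:namespace, :name, :attributes)

# Escape text destined for element content.
def escape_text(string)
  string.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Escape text destined for a double-quoted attribute value.
def escape_attribute(string)
  escape_text(string).gsub('"', '&quot;')
end

link = Element.new(
  'http://www.w3.org/1999/xhtml',
  'link',
  { AttrName.new('rel', nil) => 'stylesheet' }
)
```

Note that the ampersand has to be replaced first, otherwise the entities produced by the later substitutions would themselves get escaped.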

An XML-generating API that tries to fight against or work around this objective reality is just fatally flawed, in my view. So, can we cook up something that is reality-based and also pleasant to use?

Genx for Ruby · The Ape needs to generate XML, and was going about it in a pretty ad-hoc way. So, to explore these ideas I cooked up a little library named Genx. Code talks; here’s the Ape code for generating the HTML output.

 1  @w = Genx::Writer.new
 2  @w.element(:html) do
 3    @w.element(:head) do
 4      @w.element(:title) { @w.text 'Atom Protocol Exerciser Report' }
 5      @w.nl
 6      @w.attributes =
 7        { :rel => 'stylesheet', :type => 'text/css', :href => '/ape/ape.css' }
 8      @w.element(:link)
 9    end
10    @w.nl
11
12    @w.element(:body) do
13      @w.element(:h2) { @w.text 'The Ape says:' }
14      @w.nl
15      if @header
16        @w.element(:p) { @w.text @header }
17        @w.nl
18      end
19      @w.element(:ol) do
20        @w.nl
21        @steps.each do |step|
...

Notes, by line number:

2. You’re not always going to be able to use Ruby’s :symbol thingies for element and attribute names, but where you can, it seems like a good fit.

4. In this case, the body all fits on one line. I thought of adding a convenience “content” argument to the element method for this fairly-common case, but the idiom as it appears feels pretty natural. See also lines 13 and 16. The text method escapes its argument, of course.

5. nl inserts a newline. This is going to happen a lot and it saves 7 characters over text "\n". You could try to get clever and arrange for the API to insert pretty-printing line breaks for you, but down that path lies madness, and tossing in the occasional w.nl hardly seems onerous.

6. The element method actually takes up to three arguments: the element name, the namespace, and a hash of the attributes. But that makes for long, awkward-looking calls that spill across lines; and anyhow, you might want to re-use an argument list.

So the attributes= method sets the attribute-list for the next element call; similarly namespace= sets the namespace. Interestingly, namespace= is sticky but attributes= is transient. That is to say, you have to call attributes= once per element. This seems to work well, since it’s common to spit out a bunch of elements in a row from the same namespace, but rare to use the same attribute-list for successive elements.

8. This element has no body because <link> has no content. The code will notice and generate empty-element syntax.
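For readers who want to see how the behaviors in these notes could fit together, here is a minimal sketch. SketchWriter is a hypothetical stand-in, not the actual Genx code: it assumes element names and attribute names are simple unprefixed names, it handles namespaces only by declaring a default namespace when it changes, and it does no well-formedness checking.

```ruby
# Hypothetical sketch of a writer with sticky namespace= and transient attributes=.
class SketchWriter
  attr_reader :output
  attr_writer :attributes, :namespace

  def initialize
    @output = +''
    @attributes = nil # transient: consumed by the next element call
    @namespace = nil  # sticky: stays in force until reassigned
    @scope = [nil]    # default namespace in effect for each open element
  end

  # Up to three arguments: name, namespace, attribute hash.
  def element(name, namespace = @namespace, attributes = nil)
    attrs = attributes || @attributes || {}
    @attributes = nil
    @output << "<#{name}"
    # Declare the default namespace only where it differs from the parent's.
    @output << %( xmlns="#{escape(namespace.to_s)}") if namespace != @scope.last
    attrs.each { |key, value| @output << %( #{key}="#{escape(value.to_s)}") }
    if block_given?
      @output << '>'
      @scope.push(namespace)
      yield
      @scope.pop
      @output << "</#{name}>"
    else
      @output << '/>' # no body: empty-element syntax
    end
    self
  end

  # Escaped character data.
  def text(string)
    @output << escape(string.to_s)
    self
  end

  # A bare newline, placed wherever the caller wants one.
  def nl
    @output << "\n"
    self
  end

  private

  def escape(string)
    string.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
  end
end

w = SketchWriter.new
w.namespace = 'http://www.w3.org/1999/xhtml'
w.element(:head) do
  w.element(:title) { w.text 'A & B' }
  w.nl
  w.attributes = { rel: 'stylesheet', href: '/ape/ape.css' }
  w.element(:link) # picks up the attributes set just above
  w.element(:meta) # attributes were transient, so none here
end
```

In this sketch the namespace set once at the top applies to every element, while the attribute hash is used by link and is gone by the time meta is written.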

Status · This code is now what runs if you use the Ape. Next? I’m not sure. I’m going to use this for my [News flash: in progress] ongoing comment system, and that’ll force me to figure out the right API.

Oops, I Did It Again! · About the time I had the Ape running, an echo at the back of my brain (it’s a vast empty space, there are lots of echoes) said “Didn’t someone already do a Ruby wrapping for Genx?”

Well, what do you know, Garrett Rooney did. And get this, here’s a little piece of his example code:

  w.element("foo") do
    w.text("bar")
  end

Sometimes, the right thing to do is obvious.

So maybe fiddling with Genx4r is the right thing to do. On the other hand, I think that if you have a language and you want to write a library for it, you should wherever possible write it in that language. I think this for a bunch of reasons, two of which are spelt CLR and JVM.

However, trying to make my current Ruby Genx general-purpose and reliable would force me into going mano a mano with Ruby’s idiosyncratic notions about strings and encodings and characters; for now it only really works with UTF-8.

On the other hand, if I did pursue this work, I’d have the quite-substantial C-language Genx test suite as a basis to work from, which would be a major help.

Hmmm.

The Spencer Rant · When I read things like Leon Spencer’s fusillade, it makes me regret being so careful what I write here; I ought to allow myself the occasional explosion like that just for fun.

Anyhow, Leon offers a Markaby example that emits the same HTML that mine does while being quite a bit more readable. (In fact, it’s also quite a bit nicer than _why’s own example.)

So, let’s grant the point; if you’re going to be working in an environment where you’re sure you’re not going to have any odd-looking element names, and you’re not going to have to wrangle namespaces, and you don’t mind the framework deciding where it’s safe to put newlines in your markup, and you’re not worried about element names colliding with well-known function names, then, well hey. So, am I a boring old factory factory factory fart wanting to inflict unnecessary pain and failing to recognize the obvious 80/20 point here? Or will the pleasant buzz you get today from mapping markup into code maybe give you a nasty hangover tomorrow? Is it Truth vs. Beauty? Is it Life vs. Art? Is it Alien vs. Predator? Oh, the anguish.


September 11, 2006