At some point in the transition to Debian Sarge, something broke in the ongoing software. The Perl code reads text using an XML processor and various pieces of it get stashed in a MySQL database. Only somewhere along the line, non-ASCII UTF-8 characters were getting trashed. I tried all sorts of stupid dodges, and was whining away at Sam Ruby via instant messenger, and he said “of course, you could do it all as seven-bit ASCII via &#xBABE;... or you could rewrite it in Ruby and It Would Be Much Better”. I shrieked “Get thee behind me, foul tempter!” and have now jammed everything into 7-bit ASCII as it comes out of the XML parser, and of course all the problems have gone away. Actually, the code got simpler; lots of XML escaping/unescaping calls are no longer necessary. This is one of the nice things about XML, I guess: it allows you to be a good internationalization citizen even when your software infrastructure isn’t. It still feels evil.

Anyhow, the whole site’s been republished; let me know if anything’s busted. (By the way, if you’re reading this in my RSS feed and all the entries show up as new, switch to the Atom feed and that problem will go away, because Atom actually has unique IDs and datestamps that work.) [Updated: Tony Coates (interesting new blog there, BTW) reports that Opera 8.02 gets it backwards, which means that it’s one of the rare pieces of software that respects guids in RSS, but that it’s doing Atom 1.0 wrong.]
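For anyone wondering what the seven-bit dodge described above looks like in practice, here is a minimal sketch. The site's real code is Perl and isn't reproduced here; this is Python, the function name `to_ascii_xml` is made up for illustration, and it assumes the text has already been decoded to Unicode by the XML parser. It leans on Python's built-in `xmlcharrefreplace` error handler, which emits decimal references (`&#47806;`) rather than the hex form (`&#xBABE;`); to an XML processor the two are equivalent.

```python
import html


def to_ascii_xml(text: str) -> str:
    """Return text as pure 7-bit ASCII (hypothetical helper, not the site's code).

    Every character outside ASCII becomes a decimal numeric character
    reference. Markup characters such as & and < are left untouched, so
    this is only safe on text that is already escaped for XML.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


sample = "caf\u00e9 \ubabe"          # "café" plus the Hangul syllable U+BABE
escaped = to_ascii_xml(sample)
print(escaped)                        # caf&#233; &#47806;

# Any XML or HTML consumer turns the references back into real characters.
assert html.unescape(escaped) == sample
```

Once the text is ASCII-only, nothing downstream (database driver, template layer, web server) needs to agree on an encoding, which is presumably why the problems went away.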

September 03, 2005


