Republished · At some point in the transition to Debian Sarge, something broke in the ongoing software. The Perl code reads text using an XML processor, and various pieces of it get stashed in a MySQL database. Only somewhere along the line, non-ASCII UTF-8 characters were getting trashed. I tried all sorts of stupid dodges, and was whining away at Sam Ruby via instant messenger, and he said “of course, you could do it all as seven-bit ASCII via &#xBABE;... or you could rewrite it in Ruby and It Would Be Much Better”. I shrieked “Get thee behind me, foul tempter!” and have now jammed everything into 7-bit ASCII as it comes out of the XML parser, and of course all the problems have gone away. Actually, the code got simpler; lots of XML escaping/unescaping calls are no longer necessary. This is one of the nice things about XML, I guess: it allows you to be a good internationalization citizen even when your software infrastructure isn’t. It still feels evil. Anyhow, the whole site’s been republished; let me know if anything’s busted. (By the way, if you’re reading this in my RSS feed and all the entries show up as new, switch to the Atom feed and that problem will go away, because Atom actually has unique IDs and datestamps that work.) [Updated: Tony Coates (interesting new blog there, BTW) reports that Opera 8.02 gets it backwards, which means that it’s one of the rare pieces of software that respects guids in RSS, but that it’s doing Atom 1.0 wrong.]
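
For anyone curious what "jamming everything into 7-bit ASCII" can look like, here is a minimal sketch. It is in Python rather than the Perl the site actually runs on, and it is not the real ongoing code; the function name `to_ascii_xml` and the sample string are made up for illustration. It leans on Python's built-in `xmlcharrefreplace` error handler, which swaps each non-ASCII character for a decimal numeric character reference.

```python
import html


def to_ascii_xml(text: str) -> str:
    """Return text with every non-ASCII character replaced by an XML
    numeric character reference (decimal form, e.g. &#233;).

    Hypothetical helper for illustration only. It does not escape
    markup characters such as & or <; that is a separate step.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Sample input: one Latin-1 character and one Hangul syllable (U+BABE).
sample = "caf\u00e9 \ubabe"
escaped = to_ascii_xml(sample)

# Every character in the result is now plain 7-bit ASCII.
print(escaped)

# Any XML or HTML consumer turns the references back into the original
# characters; html.unescape stands in for that consumer here.
print(html.unescape(escaped) == sample)
```

The appeal is that everything downstream of this step, the database included, only ever sees ASCII, so it no longer matters whether each layer handles UTF-8 correctly.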
