In the past few days I’ve been watching two debates on the subject of Unicode: one on the main IETF general-discussion list, and another on ruby-talk (there must be a better archive).

In IETF-land, the elders are once again convincing each other that Internet Standards need not be written in a way that allows characters other than ASCII; thus, for example, you can’t correctly record the names of contributors like Bill de hÓra or Martin Dürst; nor can you illustrate any discussion of network protocols which carry payloads other than those which can be expressed in primitively-typeset English. I have a lot of respect for the IETF’s achievements, but I think my revulsion at this institutional bigotry will probably soon drive me out of the organization.

In Ruby-land, it seems that Matz has spoken, and Ruby, the next generation, will have a wonderful String class that deals with everything, handling Unicode, which they see as unacceptably limited, as merely one case among many. This thinking seems deeply broken to me, but I am only shallowly immersed in Ruby and don’t understand the Han Unification angst that is at the root of things.

I don’t have much influence in either community (which is appropriate; I haven’t earned it). I’ll raise my voice, for what that’s worth, to argue that getting Unicode really right is a necessary condition for being a technology provider in the third millennium, and may prove to be sufficient, insofar as internationalized-text issues go. I’m not optimistic that this will make any difference. But if either community decides to give Unicode a serious go, I’ll volunteer to pitch in, to work to make it work.

June 19, 2006


I am an employee of, but the opinions expressed here are my own, and no other party necessarily agrees with them.

A full disclosure of my professional interests is on the author page.