We Anglophones enjoy a living language but are stuck with a long-dead character set; are 26 letters really enough to last from now to the end of English? Others are more fortunate; Asians not only have more characters but get new ones. The brand-new Release 4.0 of Unicode defines 96,513 characters, of which the vast majority are Asian. This note is provoked by the Emoji phenomenon, worth a look in its own right, but the issues of languages and characters and their growth are big ones.

If you haven't looked at the previous ongoing essay on Unicode, you might want to do that before proceeding; some of the abbreviations will make more sense.

Emoji · In Japanese, ji means character. Thus, kanji are characters originally borrowed from the Han Chinese repertoire, and gaiji are “foreign characters”: obscure variants, historical curiosities, and occasionally newly-invented custom characters.

Some emoji

Emoji are characters invented by NTT DoCoMo for people to use in text messages on their cellphones. The most obvious example is the well-known “smiley face”, often encoded in ASCII as :) and called an “emoticon”. The resemblance between the two words is a coincidence, though: emoji is e (“picture”) plus moji (“character”), the latter containing the same ji.

DoCoMo makes emoji easy to type into your cellphone, and people use them; there were 207 last time I checked. Since DoCoMo uses standard Web infrastructure, including basic HTML and HTTP and all that, the question arises of how these things are encoded. They use Unicode's “Private Use Area”, a built-in range of character codes that's there for people who want to use their own non-standardized characters.
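For the curious, here is a minimal Python sketch of how you might spot Private Use Area characters in a piece of text. It leans on the standard unicodedata module, which reports the general category "Co" for private-use code points (the main block runs from U+E000 to U+F8FF). The function names and the sample code point are my own illustrations, not anything taken from DoCoMo's documentation.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """Return True if ch is a private-use code point (general category 'Co')."""
    return unicodedata.category(ch) == "Co"


def private_use_chars(text: str) -> list[str]:
    """List the private-use code points found in text, in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text if is_private_use(ch)]


# Hypothetical message: U+E63E stands in for a carrier-defined emoji.
# What it displays as depends entirely on who assigned it a meaning.
message = "See you at noon \uE63E"
found = private_use_chars(message)
```

The catch is visible right in the code: Unicode can tell you that a character is private, but not what it means. That agreement lives outside the standard, between whoever wrote the text and whoever reads it.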

I'm of two minds; I can't decide whether this is cheering evidence of human creative bubbliness, or a vile standards-busting lock-in attempt. Maybe both.

On Inventing Characters · The English alphabet has been stable for quite some time. The venerable letters Yogh, Edh, and Thorn (perhaps derived from runes) faded out after the Norman invasion, and the U and V finally separated cleanly sometime around 1700; so in one sense, “U” is the most recent arrival in our alphabet. (Actually, the chequered history of “U” and “V”, particularly baroque in the case of German, is worth an essay in itself, granted of course that you're obsessive about this kind of thing, but you've read this far.) Is “U”, then, the last letter, the end of history, the last nail in the coffin of our language's character set?

Right: U+00DE LATIN CAPITAL LETTER THORN

Aleph-Null

It seems unfair that nobody gets to invent new characters. Well, mathematicians do; they label their abstractions not just with Greek and even Hebrew letters, but with particular combinations of fonts (such as the Old German Fraktur) and diacritics in a way that seems not only promiscuous but perverse. For example, it hardly seems necessary to take a perfectly straightforward concept like countable infinity and represent it with a typographical orgasm consisting of a large Hebrew letter Alef (U+05D0) with a subscript zero, pronounced Aleph-Null. Mind you, it looks kind of cool. Maybe that's the point.
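If you want to poke at Aleph-Null yourself, here is a small Python sketch using the standard unicodedata module. One detail beyond what I said above: besides the Hebrew letter at U+05D0, Unicode also carries a separate mathematical ALEF SYMBOL at U+2135, and a SUBSCRIPT ZERO at U+2080. The variable names are just mine, and this is only one plausible way to spell the thing in plain text.

```python
import unicodedata

hebrew_alef = "\u05D0"     # the letter, as used in Hebrew text
alef_symbol = "\u2135"     # the look-alike intended for mathematics
subscript_zero = "\u2080"

# One plain-text spelling of Aleph-Null: symbol plus subscript digit.
aleph_null = alef_symbol + subscript_zero

# Look up the official character names.
names = [unicodedata.name(ch) for ch in (hebrew_alef, alef_symbol, subscript_zero)]

# Compatibility normalization (NFKC) folds the math symbol back to the
# Hebrew letter and the subscript back to an ordinary digit zero.
folded = unicodedata.normalize("NFKC", aleph_null)
```

The normalization step shows how the standard views these characters: the mathematical Alef is a compatibility variant of the Hebrew letter, not an independent invention.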


Right: U+021C LATIN CAPITAL LETTER YOGH

Anyhow, Unicode has a generously-supplied selection of characters just for mathies. But that's not good enough; publishers of serious mathematics often have to wander outside the capacious bounds of Unicode.

Aleph-null and the emoji are similar in that they are useful and have some semantics, but no sound. So in one sense, they are second-class characters, poor little mute things. A prediction: the English character set is dead; we have all the sounds we need, and we won't get any more first-rate speaking characters.

Dr. Seuss · Theodor Seuss Geisel (“Dr. Seuss”, 1904-1991) was among those who couldn't resist the lure of making characters. In On Beyond Zebra, he invented a panoply of wonderful new letters, each ostensibly required to stand for a wonderful imaginary animal. My 3½-year-old isn't ready for this one yet, but will be soon, and I look forward immensely to reading it to him.



April 21, 2003
