We Anglophones enjoy a living language but are stuck with a long-dead character set; are 26 letters really enough to last from now to the end of English? Others are more fortunate; Asians not only have more characters but get new ones. The brand-new Release 4.0 of Unicode defines 96,513 characters, of which the vast majority are Asian. This note is provoked by the Emoji phenomenon, worth a look in its own right, but the issues of languages and characters and their growth are big ones.
If you haven't looked at the previous ongoing essay on Unicode, you might want to do that before proceeding; some of the abbreviations will make more sense.
Emoji · In Japanese, ji means character. Thus, kanji are characters originally borrowed from the Han Chinese repertoire, and gaiji are “foreign characters”: obscure variants, historical curiosities, and occasionally newly-invented custom characters.
Emoji are characters invented by NTT DoCoMo for people to use in text messages on their cellphones. The most obvious example is the well-known “smiley face”, often encoded in ASCII as :) and called an “emoticon”. It is tempting to read the name as “emotion” + ji, but the resemblance is a coincidence: emoji is e (“picture”) + moji (“character”).
DoCoMo makes emoji easy to type into your cellphone, and people use them; there were 207 last time I checked. Since DoCoMo uses standard Web infrastructure, including basic HTML and HTTP and all that, the question arises of how these things are encoded. They use Unicode's “Private Use Area”, a built-in range of character codes that's there for people who want to use their own non-standardized characters.
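To make the Private Use Area idea concrete, here is a minimal Python sketch that flags private-use code points in a string. It relies on the standard `unicodedata` module, which reports general category `Co` for private-use characters; the helper names and the sample code point U+E63E are illustrative assumptions, not a statement of which codes DoCoMo actually assigns.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """Return True if ch is a private-use code point (general category 'Co').

    This covers the BMP Private Use Area (U+E000..U+F8FF) as well as the
    private-use ranges in planes 15 and 16.
    """
    return unicodedata.category(ch) == "Co"


def find_private_use(text: str) -> list[str]:
    """List the private-use code points in text, in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text if is_private_use(ch)]


# U+E63E is a hypothetical choice, used here only because it lies in the PUA.
sample = "See you soon \ue63e :)"
print(find_private_use(sample))
```

Anything a check like this flags has no standardized meaning; what it displays as depends entirely on the fonts and conventions that the sender and receiver happen to share.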
I'm of two minds; I can't decide whether this is cheering evidence of human creative bubbliness, or a vile standards-busting lock-in attempt. Maybe both.
On Inventing Characters ·
U+00DE LATIN CAPITAL LETTER THORN
It seems unfair that nobody gets to invent new characters. Well, mathematicians do; they label their abstractions not just with Greek and even Hebrew letters but also with particular combinations of fonts (such as the Old German Fraktur) and diacritics, in a way that seems not only promiscuous but perverse. For example, it really hardly seems necessary to take a perfectly straightforward concept like countable-infinity and represent it with a typographical orgasm consisting of a large Hebrew letter Alef (U+05D0) with a subscript zero, pronounced Aleph-Null. Mind you, it looks kind of cool. Maybe that's the point.
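For the curious, a small Python sketch of how Aleph-Null can be spelled in Unicode. Besides the Hebrew letter, Unicode carries a separate letterlike ALEF SYMBOL (U+2135) and a SUBSCRIPT ZERO (U+2080); pairing those two is one plain-text convention among several, offered here as an assumption rather than the canonical encoding (serious typesetting would use markup for the subscript).

```python
import unicodedata

# One plain-text spelling of aleph-null: ALEF SYMBOL followed by SUBSCRIPT ZERO.
aleph_null = "\u2135\u2080"

for ch in aleph_null:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Compatibility normalization folds the math symbol back onto the Hebrew
# letter and the subscript onto an ordinary digit.
folded = unicodedata.normalize("NFKC", aleph_null)
print([f"U+{ord(ch):04X}" for ch in folded])
```

The folding step shows that, as far as Unicode is concerned, the mathematician's symbol is a compatibility variant of the ordinary letter.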
U+021C LATIN CAPITAL LETTER YOGH
Anyhow, Unicode has a generously-supplied selection of characters just for mathies. But that's not good enough; publishers of serious mathematics often have to wander outside the capacious bounds of Unicode.
Aleph-null and the emoji are similar in that they are useful and have some semantics, but no sound. So in one sense, they are second-class characters, poor little mute things. A prediction: the English character set is dead. We have all the sounds we need, so we won't get any more first-class speaking characters.
Dr. Seuss · Theodor Seuss Geisel (“Dr. Seuss”, 1904-1991) was among those who couldn't resist the lure of making characters. In On Beyond Zebra, he invented a panoply of wonderful new letters, each ostensibly required to stand for a wonderful imaginary animal. My 3½-year-old isn't ready for this one yet, but will be soon, and I look forward immensely to reading it to him.