The OED means a whole lot to me; professionally, I owe it everything. My work on it was 26 (!) years ago, but then this spring I got an invitation to their Symposium, which happened last week, and there was only one possible answer. I’m profoundly grateful they asked, and would do it again in a flash. This entry, like the OED, is extreme in length and prone to rambling; but, I hope, also like the dictionary in that it might provide pleasure to people who like words for their own sake.
Backgrounder: The Dictionary · Much has been written about it; it’s by far the largest dictionary of English and its inclusiveness is truly heroic. Here are some things about the OED which are worth reading:
Caught in the Web of Words: James Murray and the Oxford English Dictionary by K.M. Elisabeth Murray
Empire of Words by John Willinsky.
Preface to the Third Edition of the OED by John Simpson, Chief Editor.
Also, the Wikipedia entry is quite good.
Backgrounder: Building an Entry · Suppose you’re an OED editor and your job today is to write an entry for a new word, say “fauxhawk” or “slutbag”. How do you do that?
It turns out that since the late 1800s, the OED project has been running a Reading Programme (which you can join), in which volunteers all over the world, well, read stuff. And when they do, they take note of unusual words, where “unusual” means either they’ve not seen the word before, or they’ve not seen it used in that way. And “take note of” used to mean “write it down on a 4x6 index card” but now means “capture it online”. (Footnote: I wrote the first-ever program to capture citations, sometime in the Eighties. In C.)
Over the years, the Reading Programme has built up a database of many millions of citations. The editor facing the new-entry problem looks through the citations, picks the ones that do a good job of illustrating the word, and in the (common) case where it has more than one sense, sorts them into piles by meaning. Once that’s done, you fill in the definitions and etymology and pronunciation and you’ve got a new entry. Yes, the quotations, cited by author, date, and title, go right there in the entry; there are roughly 2½ million of them in the Second Edition. That’s why it’s so big.
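For the programmers in the audience, here’s a minimal sketch of that sorting-into-piles step in Python. To be clear, this is my illustration and not how the OED’s actual systems work: the `Citation` record, its field names, the `sort_into_piles` helper, and the sample data are all invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One reader-submitted quotation. Every field name here is hypothetical."""
    word: str
    author: str
    title: str
    year: int
    quote: str
    sense: str  # the "pile" an editor decides this quotation belongs in


def sort_into_piles(citations, word):
    """Group the citations for one word by sense, earliest quotation first."""
    piles = defaultdict(list)
    for citation in citations:
        if citation.word == word:
            piles[citation.sense].append(citation)
    for pile in piles.values():
        pile.sort(key=lambda c: c.year)
    return dict(piles)


# Invented sample data, purely for illustration.
SAMPLE = [
    Citation("widget", "A. Reader", "Made-Up Novel", 1952, "...", "gadget"),
    Citation("widget", "B. Reader", "Made-Up Manual", 1988, "...", "UI element"),
    Citation("widget", "C. Reader", "Made-Up Play", 1931, "...", "gadget"),
    Citation("gizmo", "D. Reader", "Made-Up Essay", 1944, "...", "gadget"),
]

piles = sort_into_piles(SAMPLE, "widget")
for sense, pile in piles.items():
    print(sense, [c.year for c in pile])
```

Each pile is one candidate sense, with its quotations in date order, ready for a human to write the definition. The judgment call of deciding which pile a quotation belongs in is the part the code takes as given.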
The take-aways about the dictionary are:
It’s scholarly, in the formal sense: Nothing goes in unless there are citations for it, including author, work, and date.
It’s descriptive, documenting how the language actually is, rather than prescriptive, asserting how it should be.
It’s crowdsourced, obviously.
Backgrounder: The Business · Dictionaries used to be an insanely profitable business, because more or less every household needs a couple, and you don’t have to pay the editors those annoying royalties. Oxford University Press has raked in a lot of dough over the decades by flogging dictionaries, with the OED itself being a useful brand leader, helping sell millions of Concises and Shorters and, maybe most of all, the Oxford Advanced Learner’s Dictionary, aimed at the billion or two people who at any moment in time are engaged in learning English; at one point in history the OALD was the second-best-selling book in the world, trailing only the Bible.
But times are tougher now; the world’s most influential dictionary for some years has been the one built into Microsoft Office.
“The Press”, as they call it, has always in my experience been a hard-ass unsentimental business. Since I consider this best existing effort to document the English language nearly sacramental, this has troubled me at times; but it mostly seems not to have got in the way of doing the right thing.
Backgrounder: Me and the OED · In 1987, five years into my career, I was building T1-multiplexer software and one day I saw an ad: the University of Waterloo’s “Centre for the New OED” was looking for a software development team lead, a research-staff job. The Centre resulted from a 3-way collaboration between The Press, the University, and IBM to computerize the construction of the Second Edition of the OED. There was a bunch of government money in the pot, with an unusual condition: the project had to produce real software along with the research publications. I got the job, I think because I was the only applicant who actually read books.
By the way, the text of the dictionary, 572M worth of it in a period when a good computer had like 16M of RAM, was marked up in what we’d now call XML.
Anyhow, we wrote some software and the Second Edition launched on schedule in 1989; it paid for itself and then some. The software was good enough to launch a company, which then found itself ahead of the game in full-text search and then Web content management, at least partly because we’d cut our teeth on what then seemed like big data, tagged in somewhat-Web-flavored style. It’s called Open Text and is still there; the biggest software company in Canada, I hear.
Since then, my life has been interesting, a lucky stumble from Open Text to XML to blogging to dynamic languages to concurrency to Android to where I sit now at Google. The best luck you can have is to find yourself at the right place at the right time, and I’ve had more of that than any five average people. But I’ll never forget that the first right-place-to-be for me was the OED project.
Oh, and check out Lustre-Lustrous for some nifty OED-related photography, including a curvy fashion shoot.
The Symposium · I hadn’t seen the OED folk for twenty years-ish, but when I got a note out of the blue this spring from editor-in-chief John Simpson, inviting me to Oxford for the Symposium, I thought about it for like 12 microseconds and booked the vacation days. This involved going a third of the way around the world and back in four days, which pretty well sucks; but I have no regrets.
What they did was gather a collection of 70 or so lexicographers and educators and linguists and computer programmers and totally-cool authors James Gleick and Philip Pullman, and give us a chance to talk to each other about what the ultimate English dictionary should be.
The structure was formal and I think they would have done better with an Unconference format, but it still worked pretty well. Here’s a picture from my favorite session, What should the limits of OED’s coverage be?, where everyone on the panel argued that the dictionary needed to be BIGGER. Speaking is Jonathon Green, self-professed “slang lexicographer”, who pointed out that in Green’s Dictionary of Slang there are over 125,000 terms, while the OED has only 7,700. Needs fixing!
Then Danica Salazar (on stage, long black hair) argued that the coverage of the exploding world Englishes needed work; her specific examples were words of Philippine derivation: “boondock”, “balikbayan”, and more. Not only are many just missing, but those that are there are not adequately tagged in a way that would let you straightforwardly pull out vocabulary of Southeast-Asian extraction. Which turns out to be hard, because that English has substantial overlap with the Indian and Spanish-creole flavors.
Finally, Bryan Garner (to Danica’s left), who writes language books with a focus on Law, asserted that if the dictionary did a proper job with simple two-word combinations, it’d be three times its current size and much better than it currently is.
I don’t know if the world has the time, money and expertise to triple the OED in size; but what a magnificent dream.
My own contribution was mostly as an emissary, from the Internet in general and Wikipedia in particular. I think the OED’s biggest problem is that most people can’t get at it, and for the ones who can, it’s too much work. And that, if it’s going to grow the way it should, it’s going to have to crowdsource more than just the data-gathering.
I’m not going to deep-dive on the rest of the Symposium; among other things, OUP paid for it and they’re entitled to the first fruits.
But I have to emphasize the childish glee that happens when you get a bunch of linguists and lexicographers talking about neologisms and dialects and the outer fringes of our shared English heritage.
Saying Goodbye · The closing dinner was at the Ashmolean Museum, which I recommend to anyone who can get there; I posted a few pictures. The company was good; so were the food and drinks. After we’d eaten, John Simpson, who’s retiring and handing on the Editor-in-Chief job, made an elegant little speech full of thank-yous and memories.
He closed by talking about the phrase, familiar to every parent: “Are we there yet?” Because dictionaries have historically shipped late. The Internet future is incremental though, so you never get there; but, John said: “We’ll enjoy the journey.” Oh yeah.