The OED means a whole lot to me; professionally, I owe it everything. My work on it was 26 (!) years ago, but then this spring I got an invitation to their Symposium, which happened last week, and there was only one possible answer. I’m profoundly grateful they asked, and would do it again in a flash. This entry, like the OED, is extreme in length and prone to rambling; but, I hope, also like the dictionary in that it might provide pleasure to people who like words for their own sake.

The Symposium was at the Randolph Hotel in Oxford; about as old-school inside as out.

Backgrounder: The Dictionary · Much has been written about it; it’s by far the largest dictionary of English and its inclusiveness is truly heroic. Here are some things about the OED which are worth reading:

Also, the Wikipedia entry is quite good.

Backgrounder: Building an Entry · Suppose you’re an OED editor and your job today is to write an entry for a new word, say “fauxhawk” or “slutbag”. How do you do that?

It turns out that since the late 1800s, the OED project has been running a Reading Programme (which you can join), in which volunteers all over the world, well, read stuff. And when they do, they take note of unusual words; where “unusual” means either they’ve not seen it before, or they’ve not seen it used in that way. And “take note of” used to mean “write it down on a 4x6 index card” but now means “capture it online”. (Footnote: I wrote the first-ever program to capture citations, sometime in the Eighties. In C.)

Over the years, the Reading Programme has built up a database of many millions of citations. The editor facing the new-entry problem looks through the citations, picks the ones that do a good job of illustrating the word, and in the (common) case where it has more than one sense, sorts them into piles by meaning. Once that’s done, you fill in the definitions and etymology and pronunciation and you’ve got a new entry. Yes, the quotations, cited by author, date, and title, go right there in the entry; there are roughly 2½ million of them in the Second Edition. That’s why it’s so big.
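For the programmers in the audience, here is a minimal sketch of that sort-into-piles step. It is not how the OED's own tooling works; the `Citation` fields, the `guess_sense` stand-in for an editor's judgment, and the sample quotations are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One reader-submitted quotation (field names are hypothetical)."""
    word: str
    author: str
    year: int
    title: str
    quote: str


def sort_into_piles(citations, sense_of):
    """Group citations by sense, oldest first within each pile.

    `sense_of` stands in for the editor's judgment: a function that maps
    a citation to a short sense label.
    """
    piles = defaultdict(list)
    for citation in citations:
        piles[sense_of(citation)].append(citation)
    return {
        sense: sorted(pile, key=lambda c: c.year)
        for sense, pile in piles.items()
    }


def guess_sense(citation):
    """Toy rule for the sample data only; a real editor reads the quote."""
    return "verb" if "fauxhawked" in citation.quote else "noun"


# Invented placeholder data, not real OED citations.
SAMPLE = [
    Citation("fauxhawk", "A. Writer", 2004, "Invented Magazine",
             "He turned up sporting a fauxhawk."),
    Citation("fauxhawk", "B. Blogger", 2007, "Invented Blog",
             "She fauxhawked her hair for the gig."),
    Citation("fauxhawk", "C. Columnist", 2001, "Invented Weekly",
             "The fauxhawk is everywhere this summer."),
]

if __name__ == "__main__":
    for sense, pile in sort_into_piles(SAMPLE, guess_sense).items():
        print(sense)
        for c in pile:
            print(f"  {c.year} {c.author}, {c.title}: {c.quote}")
```

Sorting each pile by year puts the earliest known use at the top, which is the order the quotations take in a printed entry.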

The take-aways about the dictionary are:

  1. It’s scholarly, in the formal sense: Nothing goes in unless there are citations for it, including author, work, and date.

  2. It’s descriptive, documenting how the language actually is, rather than prescriptive, asserting how it should be.

  3. It’s crowdsourced, obviously.

Backgrounder: The Business · Dictionaries used to be an insanely profitable business, because more or less every household needs a couple, and you don’t have to pay the editors those annoying royalties. Oxford University Press has raked in a lot of dough over the decades by flogging dictionaries, with the OED itself being a useful brand leader, helping sell millions of Concises and Shorters and, maybe most of all, of the Oxford Advanced Learner’s Dictionary, aimed at the billion or two people who at any moment in time are engaged in learning English; at one point in history the OALD was the second-best-selling book in the world, trailing only the Bible.

But times are tougher now; the world’s most influential dictionary for some years has been the one built into Microsoft Office.

“The Press”, as they call it, has always in my experience been a hard-ass unsentimental business. Since I consider this best existing effort to document the English language nearly sacramental, this has troubled me at times; but it mostly seems not to have got in the way of doing the right thing.

Backgrounder: Me and the OED · In 1987, five years into my career, I was building T1-multiplexer software and one day I saw an ad: the University of Waterloo’s “Centre for the New OED” was looking for a software development team lead, a research-staff job. The Centre resulted from a 3-way collaboration between The Press, the University, and IBM to computerize the construction of the Second Edition of the OED. There was a bunch of government money in the pot, and with an unusual condition: the project had to produce real software along with the research publications. I got the job, I think because I was the only applicant who actually read books.

By the way, the text of the dictionary, 572M worth of it in a period when a good computer had like 16M of RAM, was marked up in what we’d now call XML.

Anyhow, we wrote some software and the Second Edition launched on schedule in 1989; it paid for itself and then some. The software was good enough to launch a company, which then found itself ahead of the game in full-text search and then Web content management, at least partly because we’d cut our teeth on what then seemed like big data, tagged in somewhat-Web-flavored style. It’s called Open Text and is still there; the biggest software company in Canada, I hear.

Since then, my life has been interesting, a lucky stumble from Open Text to XML to blogging to dynamic languages to concurrency to Android to where I sit now at Google. The best luck you can have is to find yourself at the right place at the right time, and I’ve had more of that than any five average people. But I’ll never forget that the first right-place-to-be for me was the OED project.

Oh, and check out Lustre-Lustrous for some nifty OED-related photography, including a curvy fashion shoot.

The Symposium · I hadn’t seen the OED folk for twenty years-ish, but when I got a note out of the blue this spring from editor-in-chief John Simpson, inviting me to Oxford for the Symposium, I thought about it for like 12 microseconds and booked the vacation days. This involved going a third of the way around the world and back in four days, which pretty well sucks; but I have no regrets.

What they did was gather a collection of 70 or so lexicographers and educators and linguists and computer programmers and totally-cool authors James Gleick and Philip Pullman, and give us a chance to talk to each other about what the ultimate English dictionary should be.

The structure was formal and I think they would have done better with an Unconference format, but it still worked pretty well. Here’s a picture from my favorite session, What should the limits of OED’s coverage be?, where everyone on the panel argued that the dictionary needed to be BIGGER. Speaking is Jonathon Green, self-professed “slang lexicographer”, who pointed out that in Green’s Dictionary of Slang there are over 125,000 terms, while the OED has only 7,700. Needs fixing!

Discussing the limits of the OED coverage at the 2013 OED symposium

Then Danica Salazar (on stage, long black hair) argued that the coverage of the exploding world Englishes needed work; her specific examples were words of Philippine derivation: “boondock”, “balikbayan”, and more; not only are many just missing, but those that are there are not adequately tagged in a way that would let you straightforwardly pull out vocabulary of Southeast-Asian extraction. Which turns out to be hard, because that English has substantial overlap with the Indian and Spanish-creole flavors.

Finally, Bryan Garner (to Danica’s left), who writes language books with a focus on Law, asserted that if the dictionary did a proper job with simple two-word combinations, it’d be three times its current size and much better than it currently is.

I don’t know if the world has the time, money and expertise to triple the OED in size; but what a magnificent dream.

My own contribution was mostly as an emissary, from the Internet in general and Wikipedia in particular. I think the OED’s biggest problem is that most people can’t get at it, and for the ones who can, it’s too much work. And that, if it’s going to grow the way it should, it’s going to have to crowdsource more than just the data-gathering.

I’m not going to deep-dive on the rest of the Symposium; among other things, OUP paid for it and they’re entitled to the first fruits.

But I have to emphasize the childish glee that happens when you get a bunch of linguists and lexicographers talking about neologisms and dialects and the outer fringes of our shared English heritage.

Saying Goodbye · The closing dinner was at the Ashmolean Museum, which I recommend to anyone who can get there; I posted a few pictures. The company was good, and so were the food and drinks. After we’d eaten, John Simpson, who’s retiring and handing on the Editor-in-Chief job, made an elegant little speech full of thank-yous and memories.

John Simpson closes the 2013 OED Symposium

John Simpson sends us off.

He closed by talking about the phrase, familiar to every parent: “Are we there yet?” Because dictionaries have historically shipped late. The Internet future is incremental though, so you never get there; but, John said: “We’ll enjoy the journey.” Oh yeah.



Contributions

From: Dave Walker (Aug 04 2013, at 14:58)

Cool article - and a very cool connection to have :-). Interesting that electronic OED mark-up "became" XML - it would be enlightening to see an article describing some of its features (or a quick follow-up pointer to pre-existing info would do, if there's adequate info out there).

I read and enjoyed Simon Winchester's "The Surgeon of Crowthorne" a few years ago; it transpires that this is the UK title of "The Professor and the Madman". I second the recommendation.

Also, I have a 1970 edition full OED - the 2-volume, photo-reduced 4-pages-up version which comes with a very nice Bausch & Lomb hand magnifier. It's an astonishing piece of work - and worth noting that the "full" OED differs from the "shorter" version most people are familiar with, primarily by providing citation of first known usage for each word, where possible. A trip through a full OED is therefore also a trip through world literature - at least, the inventive kind.

I bought my OED at a little secondhand bookshop in Cambridge - G. David. If you're into antiquarian books, stiffen your willpower or leave your wallet at home before visiting, or expect to part with serious money if you're into esoterica - last time I was there, it wasn't easy to leave without buying another dictionary, but this was a 2nd Ed. Johnson...

From: Mark (Aug 05 2013, at 08:22)

I, too, own the four-to-a-page version of the OED. Not sure of the year. It used to belong to my parents, so it's entirely possible that it is also the 1970 edition. I lost the magnifying glass in one of my many moves over the years. My OED's sole function from 2005-2012 was holding up a broken slat in my bed frame that otherwise caused the bed to sag in ways that are difficult to describe. My OED is split into two volumes, one of which I can not currently locate. (Yes, I looked under my bed.) I was considering donating the remaining volume to Goodwill, but who in their right mind would need half a dictionary? It's like buying half a condom.

Anyway, I like to think that I love language and respect words, but you'd never know it by the way I treat my OED. It certainly deserves better. I spent many hours when I was growing up looking through that dictionary, learning new words, "impressing" my parents with my newfound knowledge, and squinting a lot. My kids don't do that. If they don't know a word, they look it up on Google. Then they go back to playing Angry Birds. This makes me sad, but not sad enough to do anything about it.

From: Tim (but not THE Tim) (Aug 06 2013, at 15:05)

Was this during the time that LEXX (http://en.wikipedia.org/wiki/LEXX_%28text_editor%29) was used as the editor?

From: Jonathon Green (Aug 17 2013, at 03:55)

Tim,

Thinking of slang, you might enjoy looking at these which are timelines I've been putting together with timeglider.com software:

Penis: http://bit.ly/14hM1V4

Vagina: http://bit.ly/1257HXD

Drunk: http://bit.ly/1dfdrjm

Alcohol: http://bit.ly/1b4ANMW

Pubs and Bars: http://bit.ly/17C9Vwo

Especially the first two, which are picking up a lot of views (30K+ between them so far - 8/17 - in a few days).

From: tom jones (Aug 19 2013, at 09:17)

"Also, the Wikipedia entry is quite good."

i found this quite amusing.. ;)

August 02, 2013