I was struck by Norm Walsh’s essay Goodbye DTDs, in which he talks of going to an all-RelaxNG environment, no more DTDs. Within seconds of seeing it I IM’d him asking “What about special characters?” and he pointed out that there would still be some entity declarations around. ongoing has a DTD too, but I’d rather it didn’t, so I decided to see if I could wrestle Emacs to the ground so I wouldn’t need one. Of possible interest only to the eleven people in the world who edit XML in Emacs and know what “i18n” stands for. [Updated; skip to the end for a neato char-insertion function.]
The problem is, not only do I regularly want to use non-ASCII characters like ½ and ä, I want “smart quotes” like those you see around the first instance of "smart quotes", not dumb quotes like those around the second. Also real apostrophes (’ not ').
I think the Mac has input methods that let you type these things in, but they don’t seem to play nice with Emacs.
Of course, you can always enter ä as
&#xe4;, but
that kind of sucks, so most bound-to-ASCII people like to use something like
&auml; which is helpfully built-in to HTML, except that if
you’re in XML-land it’s not built-in so you have to have somewhere
to declare it, which is a DTD.
This is why there’s a file in the production system here called
ongoing.dtd that gets stuck on the front of each of these notes
as it’s rendered for the Web.
As a side-effect of going through the XML machinery, what gets published has
none of this, just the naked UTF-8 characters.
Anyhow, dammit, Emacs is modern software and ought to give me some way to type in and view UTF-8 characters directly; few fonts are so primitive as to be missing smart quotes and so on. I got it working eventually; here’s how:
Language Environment, Huh? · Emacs has its own idea of how to store characters, which I haven’t really figured out yet but fortunately you don’t have to, because you can force it to use UTF-8 when it saves, like so:
I just put that in my
.emacs, but you could put it in a
special XML-editing ghetto if you wanted.
Now it seems to me that “language environment” is not really a category into which I’d sort the string “UTF-8” but whatever.
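Going by the “language environment” remark, the setting in question is presumably Emacs’s `set-language-environment`. Here is a minimal sketch of both options mentioned above, the global one and the XML-only one. The function name `my-xml-save-as-utf-8` is made up for illustration, and hooking into `nxml-mode-hook` assumes you edit XML in nxml-mode; substitute the hook of whatever XML mode you actually use.

```elisp
;; Global version, for ~/.emacs: make UTF-8 the default language environment.
(set-language-environment "UTF-8")
;; Optional: also prefer UTF-8 whenever Emacs has to pick a coding system.
(prefer-coding-system 'utf-8)

;; XML-only version. `my-xml-save-as-utf-8' is a hypothetical name, and
;; `nxml-mode-hook' is an assumption about which XML mode is in use.
(defun my-xml-save-as-utf-8 ()
  "Make the current buffer save as UTF-8."
  (setq buffer-file-coding-system 'utf-8))

(add-hook 'nxml-mode-hook #'my-xml-save-as-utf-8)
```

The hook version only touches buffers that enter the XML mode, so everything else keeps whatever encoding it had.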
Font Pain · Emacs has this arcane structure of fonts and faces that, once again, I’ve never really figured out. Fortunately, once again, you don’t have to; you can just use a stone axe. All you have to do is figure out the X11-i-fied name of a font with some basic Unicode moxie. This is what works under OS X:
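One way to go hunting for such a name, as a rough sketch rather than the setup used here: ask Emacs for the X11-style names of fonts whose registry is `iso10646-1` (the Unicode one), then hand the one you like to `set-frame-font`. The variable name `unicode-fonts` is just for illustration, and picking the first match is an assumption made to keep the example short.

```elisp
;; Only meaningful on a graphical frame; does nothing in a terminal or batch run.
(when (display-graphic-p)
  ;; Ask for X11-style (XLFD) font names whose registry/encoding is iso10646-1.
  (let ((unicode-fonts (x-list-fonts "-*-*-*-*-*-*-*-*-*-*-*-*-iso10646-1")))
    (message "Found %d candidate fonts" (length unicode-fonts))
    ;; Taking the first match is purely for illustration; in practice you
    ;; would look through the list and paste the chosen name into ~/.emacs.
    (when unicode-fonts
      (set-frame-font (car unicode-fonts)))))
```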
Jamming In the Bytes ·
The magic function here is
ucs-insert, which of course works
quite differently in interactive and background modes.
The real hackers can stop here, because they’ll all now have ideas how to
ucs-insert into their own lifestyles.
I have hardwired keybindings for smart quotes, for the rest of them I offer
the following solution; my elisp is rusty and far from idiomatic but it
should give the idea.
The elisp function
x-popup-menu, by the way, generally sucks.

(defvar ongoing-char-choice
  '("Special characters"
    (""
     ("ccedil" #xe7) ("copyright" #xa9) ("degree" #xb0) ("dot" #xb7)
     ("eacute" #xe9) ("half" "&#xbd;") ("omacr" "&#x14d;")
     ("auml" #xe4) ("uuml" #xfc) ("euro" #x20ac) ("cents" #xa2)
     ("egrave" #xe8) ("lsquo" #x2018) ("rsquo" #x2019)
     ("ldquo" #x201c) ("rdquo" #x201d) ("mdash" #x2014))))

(defun ong-special-chars-menu ()
  "Insert a special character from a menu"
  (interactive)
  (let ((value (car (x-popup-menu (list '(10 10) (selected-window))
                                  ongoing-char-choice))))
    (cond ((integerp value) (ucs-insert value)) ; code point: insert the character
          ((stringp value) (insert value))      ; string: insert the NCR text as-is
          ;; so you can hit escape and make the menu go away
          (t nil))))

Bind
ong-special-chars-menu to some handy key and you’re
cooking. I could have put the unicode characters themselves in the
left-hand column but (on OS X at least) whatever displays menus is stuck
firmly in 8859-land, thus the names.
The half and omacr characters are given as NCRs
not literals because
they’re not in the font I edit in, but that’s OK, they don’t
show up that often.
Now the source-code of ongoing that I edit is ever so much prettier.
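For the hardwired smart-quote keybindings mentioned above, something along these lines should do; treat it as a sketch, not the actual bindings. The function names and key choices are invented for illustration. It uses plain `insert` with a character code instead of `ucs-insert`, because newer Emacs releases treat `ucs-insert` as an obsolete alias of `insert-char`, and `insert` sidesteps the interactive-versus-Lisp differences altogether.

```elisp
;; Hypothetical helper names. `insert' accepts a character code directly,
;; so this does not depend on `ucs-insert' being available.
(defun my-insert-ldquo ()
  "Insert a left double quotation mark (U+201C)."
  (interactive)
  (insert #x201c))

(defun my-insert-rdquo ()
  "Insert a right double quotation mark (U+201D)."
  (interactive)
  (insert #x201d))

;; The key choices below are arbitrary placeholders.
(global-set-key (kbd "C-c q") #'my-insert-ldquo)
(global-set-key (kbd "C-c w") #'my-insert-rdquo)

;; The menu function from this post, assuming it has already been defined.
(global-set-key (kbd "<f5>") 'ong-special-chars-menu)
```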
Easy Insertion of Commonly-Used Special Characters · I was worrying about how to make the job of inserting the special characters that I use all the time easier. These are most commonly the smart quotes and apostrophes, occasionally an em-dash and so on. It dawned on me that I now never need to type an old-fashioned apostrophe any more, so I bound that key to the following function. So you type apostrophe twice to get a smart apostrophe, you type apostrophe-S to open or close single quotes, apostrophe-D for double quotes, apostrophe-dash for mdash, plus a couple of other handy little things:
(defun one-quote ()
  "Insert a plain ASCII apostrophe."
  (interactive)
  (insert ?\'))

(defvar sq-state nil "In single-quotes?")
(defvar dq-state nil "In double quotes?")

(defun ong-insert-special (c)
  "Insert special characters, like so:
 s => open/close single quotes
 d => open/close double quotes
 ' => apostrophe
 a => <a href=
 i => <img src=
 & => &amp;
 < => &lt;
 - => mdash
 . => center-dot"
  (interactive "c'") ; read one character, prompting with an apostrophe
  (cond
   ((= c ?s)
    (if sq-state
        (progn (ucs-insert #x2019) (setq sq-state nil))
      (ucs-insert #x2018)
      (setq sq-state t)))
   ((= c ?d)
    (if dq-state
        (progn (ucs-insert #x201d) (setq dq-state nil))
      (ucs-insert #x201c)
      (setq dq-state t)))
   ((= c ?\') (ucs-insert #x2019))
   ((= c ?a)
    (if (> (current-column) 0) (newline-and-indent))
    (insert "<a href=\"\">")
    (backward-char 2))   ; leave point between the href quotes
   ((= c ?i)
    (if (> (current-column) 0) (newline-and-indent))
    (insert "<img src=\"\" alt=\"\" />")
    (backward-char 11))  ; leave point between the src quotes
   ((= c ?&) (insert "&amp;"))
   ((= c ?<) (insert "&lt;"))
   ((= c ?-) (ucs-insert #x2014))
   ((= c ?.) (ucs-insert #xb7))))
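Wiring this up might look like the following. It is a sketch under a couple of assumptions: that a global binding is acceptable (a mode-specific keymap would be the more cautious choice), and that `C-c '` is free to serve as the escape hatch for a plain ASCII apostrophe via `one-quote`.

```elisp
;; Assumes `ong-insert-special' and `one-quote' from this post are defined.
;; Apostrophe becomes a prefix of sorts: typing it twice gives a smart
;; apostrophe, apostrophe then s toggles single quotes, and so on.
(global-set-key (kbd "'") 'ong-insert-special)

;; Escape hatch for a literal ASCII apostrophe; the key is an arbitrary choice.
(global-set-key (kbd "C-c '") 'one-quote)
```

Binding globally means the apostrophe changes behavior in every buffer, including code, so restricting it to the keymap of the mode you write prose in is probably the saner long-term setup.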