I was struck by Norm Walsh’s essay Goodbye DTDs, in which he talks of going to an all-RelaxNG environment, no more DTDs. Within seconds of seeing it I IM’d him asking “What about special characters?” and he pointed out that there would still be some entity declarations around. ongoing has a DTD too, but I’d rather it didn’t, so I decided to see if I could wrestle Emacs to the ground so I wouldn’t need one. Of possible interest only to the eleven people in the world who edit XML in Emacs and know what “i18n” stands for. [Updated; skip to the end for a neato char-insertion function.]

The problem is, not only do I regularly want to use non-ASCII characters like ½ and ä, I want “smart quotes” like those you see around the first instance of "smart quotes", not dumb quotes like those around the second. Also real apostrophes (’ not ').

I think the Mac has input methods that let you type these things in, but they don’t seem to play nice with Emacs.

Of course, you can always enter ä as &#xe4;, but that kind of sucks, so most bound-to-ASCII people like to use something like &auml; which is helpfully built-in to HTML, except that in XML-land it’s not built-in, so you have to have somewhere to declare it, which is a DTD. This is why there’s a file in the production system here called ongoing.dtd that gets stuck on the front of each of these notes as it’s rendered for the Web. As a side-effect of going through the XML machinery, what gets published has none of this, just the naked UTF-8 characters.

Anyhow, dammit, Emacs is modern software and ought to give me some way to type in and view UTF-8 characters directly; few fonts are so primitive as to be missing smart quotes and so on. I got it working eventually. Here’s how:

Language Environment, Huh? · Emacs has its own idea of how to store characters, which I haven’t really figured out yet but fortunately you don’t have to, because you can force it to use UTF-8 when it saves, like so:

(set-language-environment "UTF-8")

I just put that in my .emacs, but you could put it in a special XML-editing ghetto if you wanted.

Now it seems to me that “language environment” is not really a category into which I’d sort the string “UTF-8” but whatever.
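If you did want to confine this to XML editing rather than set it globally, a minimal sketch might look like the following. It assumes you edit XML in nxml-mode (swap in whatever hook your mode provides), and my-xml-save-as-utf-8 is a made-up name for illustration.

```elisp
;; Sketch only: `my-xml-save-as-utf-8' is a hypothetical helper, and
;; `nxml-mode-hook' is an assumption about which XML mode is in use.
(defun my-xml-save-as-utf-8 ()
  "Arrange for the current buffer to be written out as UTF-8."
  ;; FORCE is nil; NOMODIFY is t so that merely opening a file
  ;; does not mark the buffer as modified.
  (set-buffer-file-coding-system 'utf-8 nil t))

;; Run the helper whenever the (assumed) XML mode starts up.
(add-hook 'nxml-mode-hook #'my-xml-save-as-utf-8)
```

This only changes how the buffer is saved; it does nothing about fonts or input.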

Font Pain · Emacs has this arcane structure of fonts and faces that, once again, I’ve never really figured out, but fortunately, once again, you don’t have to. You can just use a stone axe: all you have to do is figure out the X11-i-fied name of a font with some basic Unicode moxie. This is what works under OS X:

(set-default-font "-etl-fixed-medium-r-normal-*-16-*-*-*-*-*-fontset-mac")
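Before committing to a font, it can help to ask Emacs which of the characters you care about it can actually show. Here is a rough sketch using the built-in char-displayable-p; the function name my-undisplayable-chars is invented for this example, and the answer depends on the frame and font in effect when you run it.

```elisp
;; Sketch only: `my-undisplayable-chars' is a hypothetical helper name.
(defun my-undisplayable-chars (codepoints)
  "Return the members of CODEPOINTS that the current display cannot show."
  (let (missing)
    (dolist (cp codepoints (nreverse missing))
      ;; `char-displayable-p' returns nil when no font (or terminal
      ;; coding system) is available for the character.
      (unless (char-displayable-p cp)
        (push cp missing)))))
```

Evaluating something like (my-undisplayable-chars '(#x2018 #x2019 #x201c #x201d #x2014 #xbd #x14d)) after setting the font gives a list of the code points that would still need to be written as NCRs.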

Jamming In the Bytes · The magic function here is ucs-insert, which of course works quite differently in interactive and background modes. The real hackers can stop here, because they’ll all now have ideas how to weave ucs-insert into their own lifestyles. I have hardwired keybindings for smart quotes; for the rest of them I offer the following solution. My elisp is rusty and far from idiomatic, but it should give the idea. The elisp function x-popup-menu, by the way, generally sucks, but whatever.
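For the curious, a hardwired binding can be as small as the sketch below. The names my-insert-codepoint and my-insert-ldquo and the C-c [ key are illustrative choices, not taken from the setup described here. The fallback to insert-char is there because, as far as I know, newer Emacs releases treat ucs-insert as an obsolete alias for it.

```elisp
;; Sketch only: both function names and the key choice are hypothetical.
(defun my-insert-codepoint (cp)
  "Insert the character with Unicode code point CP at point."
  (if (fboundp 'ucs-insert)
      (ucs-insert cp)          ; called from Lisp, it just takes the code point
    (insert-char cp 1)))       ; fallback if `ucs-insert' is not defined

(defun my-insert-ldquo ()
  "Insert a left double quotation mark (U+201C)."
  (interactive)
  (my-insert-codepoint #x201c))

;; Any free key will do; C-c followed by a punctuation key is just an example.
(global-set-key (kbd "C-c [") #'my-insert-ldquo)
```

The matching right-quote command is the same thing with #x201d.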

(defvar ongoing-char-choice
 '("Special characters"
   (""
    ("ccedil"    #xe7)
    ("copyright" #xa9)
    ("degree"    #xb0)
    ("dot"       #xb7)
    ("eacute"    #xe9)
    ("half"      "&#xbd;")
    ("omacr"     "&#x14d;")
    ("auml"      #xe4)
    ("uuml"      #xfc)
    ("euro"      #x20ac)
    ("cents"     #xa2)
    ("egrave"    #xe8)
    ("lsquo"     #x2018)
    ("rsquo"     #x2019)
    ("ldquo"     #x201c)
    ("rdquo"     #x201d)
    ("mdash"     #x2014))))

(defun ong-special-chars-menu ()
  "Insert a special character from a menu."
  (interactive)
  (let ((value
         (car (x-popup-menu
               (list '(10 10) (selected-window))
               ongoing-char-choice))))
    (cond
     ((integerp value) (ucs-insert value))
     ((stringp  value) (insert value))
     (t nil)))) ;; so you can hit escape and make the menu go away

Bind ong-special-chars-menu to some handy key and you’re cooking. I could have put the unicode characters themselves in the left-hand column but (on OS X at least) whatever displays menus is stuck firmly in 8859-land, thus the names. The half and omacr characters are given as NCRs not literals because they’re not in the font I edit in—but that’s OK, they don’t show up that often.
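If the pop-up menu gets on your nerves, one possible alternative is to pick the name in the minibuffer instead. This is a sketch under assumptions: my-char-table is a cut-down, hypothetical stand-in for the list above, my-insert-named-char is an invented name, and C-c 8 is an arbitrary key.

```elisp
;; Sketch only: a hypothetical, shortened name-to-character table.
;; Integers are code points; strings are inserted literally (NCRs).
(defvar my-char-table
  '(("eacute" . #xe9)
    ("mdash"  . #x2014)
    ("rsquo"  . #x2019)
    ("half"   . "&#xbd;"))
  "Alist mapping short names to code points or literal strings.")

(defun my-insert-named-char (name)
  "Insert the character or string registered under NAME in `my-char-table'."
  ;; Interactively, prompt with completion over the names in the table.
  (interactive (list (completing-read "Character: " my-char-table nil t)))
  (let ((value (cdr (assoc name my-char-table))))
    (cond
     ((integerp value) (insert-char value 1))
     ((stringp value) (insert value)))))

(global-set-key (kbd "C-c 8") #'my-insert-named-char)
```

Unknown names fall through the cond and insert nothing, which mirrors the way the menu version lets you escape out.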

Now the source-code of ongoing that I edit is ever so much prettier.

Easy Insertion of Commonly-Used Special Characters · I was worrying about how to make it easier to insert the special characters that I use all the time. These are most commonly the smart quotes and apostrophes, occasionally an em-dash and so on. It dawned on me that I now never need to type an old-fashioned apostrophe any more, so I bound that key to the following function. You type apostrophe twice to get a smart apostrophe, apostrophe-S to open or close single quotes, apostrophe-D for double quotes, and apostrophe-dash for mdash, plus a couple of other handy little things:

(defun one-quote () "" (interactive) (insert ?'))
(defvar sq-state 'nil "In single-quotes?")
(defvar dq-state 'nil "In double quotes?")
(defun ong-insert-special (c) "Insert special characters, like so:
 s => open/close single quotes
 d => open/close double quotes
 ' => apostrophe
 a => <a href=
 i => <img src=
 & => &amp;
 < => &lt;
 - => mdash
 . => center-dot"
  (interactive "c")
  (cond
   ((= c ?s)
    (if sq-state
	(progn
	  (ucs-insert #x2019)
	  (setq sq-state 'nil))
      (ucs-insert #x2018)
      (setq sq-state 't)))
   ((= c ?d)
    (if dq-state
	(progn
	  (ucs-insert #x201d)
	  (setq dq-state 'nil))
      (ucs-insert #x201c)
      (setq dq-state 't)))
   ((= c ?') (ucs-insert #x2019))
   ((= c ?a) 
    (progn
      (if (> (current-column) 0) (newline-and-indent))
      (insert "<a href=\"\">")
      (backward-char 2)
      ))
   ((= c ?i) 
    (progn
      (if (> (current-column) 0) (newline-and-indent))
      (insert "<img src=\"\" alt=\"\" />")
      (backward-char 11)
      ))
   ((= c ?&) (insert "&amp;"))
   ((= c ?<) (insert "&lt;"))
   ((= c ?-) (ucs-insert #x2014))
   ((= c ?.) (ucs-insert #xb7))))
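One wrinkle in the function above: sq-state and dq-state are global, so a quote left open in one buffer affects the next buffer you type in. A possible variation keeps the state per buffer. This is a sketch that assumes your Emacs has defvar-local (24.3 or later, as far as I know); the my- names are invented here.

```elisp
;; Sketch only: hypothetical names, showing per-buffer quote state.
(defvar-local my-dq-open nil
  "Non-nil while a double quote is open in the current buffer.")

(defun my-toggle-double-quote ()
  "Insert an opening or closing double quote, alternating per buffer."
  (interactive)
  ;; Closing quote if one is already open here, opening quote otherwise.
  (insert-char (if my-dq-open #x201d #x201c) 1)
  (setq my-dq-open (not my-dq-open)))
```

The same pattern would work for single quotes with #x2018 and #x2019.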

September 27, 2003