Identifiers

Mark Pilgrim has a good piece today on how to choose permanent identifiers. I’m going to pay him the compliment of disagreeing at length.

First, a reference issue: Mark points out that Atom IDs have to be URIs, as specified in RFC2396. In the unlikely event that you’re going to follow that pointer to read the RFC, don’t; instead, go read the almost-complete revision, which is much easier to read, understand, and implement.

The Problem With “tag:” · Mark recommends using tag: URIs as IDs, and his reasoning is sound, but there’s one flaw: that URI scheme hasn’t been registered, and in fact may not be; the IETF URI scheme registration process is widely viewed as dysfunctional. The use of unregistered schemes is officially frowned on; the reasons may not concern you, but before you take this leap, you might want to check them out.

There is an alternative that’s worth looking at, namely NewsML URNs (RFC3085), which give anyone who owns a domain name the right to mint URNs. Using this, tag:diveintomark.org,2004-05-27:/archives/2004/05/27/howto-atom-linkblog could become urn:newsml:diveintomark.org:20040527:howto-atom-linkblog:1. Nice and official, and just as unreadable too!
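To make that rewrite concrete, here is a minimal Python sketch of the mapping. The function name tag_to_newsml_urn is made up for illustration, and the sketch makes two assumptions that come from neither RFC: the tag: URI carries a full YYYY-MM-DD date, and the last path segment of the tag is a good enough item identifier. The revision number defaults to 1, as in the example above.

```python
def tag_to_newsml_urn(tag_uri: str, revision: int = 1) -> str:
    """Rewrite a tag: URI as a NewsML-style URN (illustrative sketch only)."""
    scheme, _, rest = tag_uri.partition(":")
    if scheme != "tag" or not rest:
        raise ValueError(f"not a tag: URI: {tag_uri!r}")

    # tag:<authority>,<date>:<specific>
    authority_and_date, sep, specific = rest.partition(":")
    authority, comma, date = authority_and_date.partition(",")
    if not sep or not comma or not authority:
        raise ValueError(f"malformed tag: URI: {tag_uri!r}")

    # Assumption: a full YYYY-MM-DD date, collapsed to YYYYMMDD.
    date_id = date.replace("-", "")
    if len(date_id) != 8 or not date_id.isdigit():
        raise ValueError(f"expected a YYYY-MM-DD date, got {date!r}")

    # Assumption: the last path segment serves as the item identifier.
    item_id = specific.rstrip("/").rsplit("/", 1)[-1]
    if not item_id:
        raise ValueError(f"no usable item identifier in {tag_uri!r}")

    return f"urn:newsml:{authority}:{date_id}:{item_id}:{revision}"


if __name__ == "__main__":
    print(tag_to_newsml_urn(
        "tag:diveintomark.org,2004-05-27:/archives/2004/05/27/howto-atom-linkblog"
    ))
```

Dropping the rest of the path is a choice, not a requirement; if two entries could share a final segment, you would want to keep more of the path in the identifier.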

The problem is that RFC3085 says that NewsML URNs are supposed to be used to identify NewsItems, which are quite different from Atom feeds. This is unlikely to cause any problems in practice, but it is still worrying.

So in fact there doesn’t seem to be a real good simple play-by-the-rules way to create unique identifiers starting with domain names. Except, of course, for good old http:-class URIs.

Why Not Permalinks? · My real issue here is with the contention that you shouldn’t use the URI of your article as a unique ID for it. I think that in fact you should. While the technology doesn’t care in the slightest whether they are the same, choosing a different ID encourages negative, Web-unfriendly behavior.

Mark offers a few reasons for having a different ID, but the one I want to focus in on is “An entry ID should never change, even if the permalink changes.” And since I think changing permalinks is profoundly bad behavior, engineering your system to allow for this feels like designing an IP-spoofer into an email system in case you feel like spamming.

I’m not the only one who thinks that changing permalinks is a bad idea; it breaks caches and bookmarks and PageRanks and generally sucks. And to be fair, it’s clear that Mark isn’t recommending gratuitous address changing; he’s just prepared to tolerate it as a regular occurrence.

And at some level he’s right; if you think that there’s a good chance your URIs will change, you shouldn’t use them for IDs. But, if you think that, you should also bloody well be looking for better software or hosting or whatever.

I’m perfectly conscious that many software providers apparently don’t care about arbitrarily changing URIs, but dammit they’re wrong and we shouldn’t be encouraging their bad behavior. If you think that what you’re publishing has any weight or import or lasting value—and why would you go to the work if you didn’t?—you ought to take the trouble to put it at a domain you control, and to choose software that isn’t going to move things around.

For example, Mark should stop automatically generating URIs based on his article titles, because titles change; I change titles here at ongoing all the time, but I’ve never changed a URI and I never will.

He also argues against permalinks-as-IDs on two further grounds: that they can create confusion, which doesn’t seem like a big deal to me, though that’s a matter of opinion; and that a separate ID lets the “same” entry show up on different sites. Sorry, I think that if it’s the same entry you ought to do the Web the courtesy of telling it that, thus making search engines and so on work better, and point to it from multiple places with the same pointer.

So, if you’ve got software that isn’t broken, and you have some commitment to making the Web work better, go ahead and use your URIs as IDs. Otherwise, follow Mark’s advice.

May 28, 2004

By Tim Bray.

The opinions expressed here
are my own, and no other party
necessarily agrees with them.

A full disclosure of my
professional interests is
on the author page.
