Mark Pilgrim has a good piece today on how to choose permanent identifiers. I’m going to pay him the compliment of disagreeing at length.
First, a reference issue: Mark points out that Atom IDs have to be URIs, as specified in RFC2396. In the unlikely event that you’re going to follow that pointer to read the RFC, don’t; instead, go read the almost-complete revision, which is much easier to read, understand, and implement.
The Problem With “tag:” · Mark recommends using tag: URIs as IDs, and his reasoning is sound, but there’s one flaw: that URI scheme isn’t yet registered, and in fact may not be; the IETF URI scheme registration process is widely viewed as dysfunctional. The use of unregistered schemes is frowned on; the reasons may not concern you, but before you take this leap, you might want to check them out.
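For anyone who hasn’t seen one, a tag: URI is laid out as tag:authority,date:specific, where the authority is a domain name (or email address) you controlled on that date. Here is a minimal Python sketch of minting one under that assumption; the function name `mint_tag_uri`, the domain example.org, and the path are all made up for illustration and come from neither Mark’s piece nor the spec.

```python
from datetime import date


def mint_tag_uri(authority: str, minted_on: date, specific: str) -> str:
    """Build a tag: URI shaped like tag:<authority>,<YYYY-MM-DD>:<specific>.

    Hypothetical helper: it only assembles the string and does a light
    sanity check; it does not validate the full grammar.
    """
    if not authority or "," in authority or ":" in authority:
        raise ValueError("authority should be a plain domain name or email address")
    return f"tag:{authority},{minted_on.isoformat()}:{specific}"


# example.org and the path are placeholders, not real identifiers.
entry_id = mint_tag_uri("example.org", date(2004, 5, 28), "/2004/05/28/ids")
print(entry_id)
```

The date is what keeps the ID unique if the domain later changes hands, which is the part of Mark’s reasoning that holds up.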
There is an alternative that’s worth looking at, namely NewsML URNs (RFC3085), which give anyone who owns a domain name the right to mint URNs. Nice and official, and just as unreadable too!
The problem is that RFC3085 says that NewsML URNs are supposed to be used to identify NewsItems, which are quite different from Atom feeds. This is unlikely to cause any problems in practice, but is worrying.
So in fact there doesn’t seem to be a real good, simple way to create unique identifiers starting with domain names.
Except, of course, for good old
Why Not Permalinks? · My real issue here is with the contention that you shouldn’t use the URI of your article as a unique ID for it. I think that in fact you should. While the technology doesn’t care in the slightest whether they are the same, choosing a different ID encourages negative, Web-unfriendly behavior.
Mark offers a few reasons for having a different ID, but the one I want to focus in on is “An entry ID should never change, even if the permalink changes.” And since I think changing permalinks is profoundly bad behavior, engineering your system to allow for this feels like designing an IP-spoofer into an email system in case you feel like spamming.
I’m not the only one who thinks that changing permalinks is a bad idea; it breaks caches and bookmarks and PageRanks and generally sucks. And to be fair, it’s clear that Mark isn’t recommending gratuitous address changing; he’s just prepared to tolerate it as a regular occurrence.
And at some level he’s right; if you think that there’s a good chance your URIs will change, you shouldn’t use them for IDs. But, if you think that, you should also bloody well be looking for better software or hosting or whatever.
I’m perfectly conscious that many software providers apparently don’t care about arbitrarily changing URIs, but dammit they’re wrong and we shouldn’t be encouraging their bad behavior. If you think that what you’re publishing has any weight or import or lasting value—and why would you go to the work if you didn’t?—you ought to take the trouble to put it at a domain you control, and to choose software that isn’t going to move things around.
For example, Mark should stop automatically generating URIs based on his article titles, because titles change; I change titles here at ongoing all the time, but I’ve never changed a URI and I never will.
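To make the difference concrete, here is a small Python sketch, under my own assumptions, contrasting a URI derived from the title with a path that is assigned once and stored. The `Entry` class, `slug_from_title`, and the example.org base are hypothetical and are not how Mark’s software or mine actually works.

```python
import re


def slug_from_title(title: str) -> str:
    """Derive a URI path segment from the title (the fragile approach)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


class Entry:
    """Hypothetical entry whose path is chosen at creation and never recomputed."""

    def __init__(self, title: str, path: str):
        self.title = title
        self.path = path  # stored, not derived from the title

    def permalink(self, base: str = "https://example.org") -> str:
        return f"{base}/{self.path}"

    def atom_id(self) -> str:
        # Because the permalink never moves, it can double as the entry ID.
        return self.permalink()


entry = Entry("Choosing IDs", "2004/05/28/ids")
slug_before = slug_from_title(entry.title)
id_before = entry.atom_id()

entry.title = "Choosing Permanent IDs"  # retitle the piece

slug_after = slug_from_title(entry.title)
id_after = entry.atom_id()

print(slug_before != slug_after)  # the title-derived address moved
print(id_before == id_after)      # the stored address did not
```

The retitled entry keeps its address and its ID only in the second scheme; in the first, the rename silently produces a new URI.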
He also argues against permalinks-as-IDs on two other grounds: that they can create confusion, which doesn’t seem like a big deal to me but that’s a matter of opinion, and that a separate ID supports the “same” entry showing up on different sites. Sorry, I think that if it’s the same entry you ought to do the Web the courtesy of telling it so, and point to it from multiple places with the same pointer, thus making search engines and so on work better.
So, if you’ve got software that isn’t broken, and you have some commitment to making the Web work better, go ahead and use your URIs as IDs. Otherwise, follow Mark’s advice.