Recently, in discussion of a design for a comments system, I noted that I wasn’t planning to use a database, and I even allowed my self a little fun sneering at the idea. I got several reasonable-sounding emails from reasonable-sounding people saying “Why on earth wouldn’t you?” Here’s why.

Current Setup · ongoing actually uses a database right now, MySQL to be precise. But not at run-time; it only gets involved when I hit the Publish button. And I don’t really need it; it just lets me push some work down into the SQL engine that I’d otherwise have to write a few lines of code for, as in the fragments on the front page:
SELECT uri FROM Essays ORDER BY updated DESC LIMIT $maxOnHomePage;
Also in randomizing the picture:
SELECT essay, dir, src, alt, UNIX_TIMESTAMP(potd) FROM Pix ORDER BY RAND();

But at run-time, all the data you’re seeing comes out of static files, pulled in either with img or link, and a bit via XMLHttpRequest. This is good. It means that the little Athlon that runs ongoing (and some other sites) can serve 130G/month, 200K transactions/day, and can stand up to a Slashdotting without breathing hard.

Comment Storage · I may yet use a database for comments; but not at the runtime end. Which is to say, the comments themselves will live in flat files, and the per-article comment counts will too, the latter being pulled in by XMLHttpRequest in the obvious way.

I suspect that it’ll be handy to have a database to look things up by IP address as part of the spam-control effort. But really, I won’t do that unless it saves work. You could perfectly easily do per-IP-address data lookup in a little directory tree where the data for 12.110.110.204 lives in a file named ip-info/12/110/110/204. Yeah, it might not run as fast as a MySQL lookup, and I’d bet it definitely wouldn’t be as fast as a BerkeleyDB lookup, but it’ll be pretty damn fast, and it won’t interact toxically with anything else the database might be doing. On the other hand, it’ll probably be less work to just make a database table, and “less work” trumps “runs fast” except when it doesn’t.

Why? · I like the semantics of the Unix filesystem, and I also really like the fact that whether you’re talking ufs, ext3, zfs, or whatever, this is some of the world’s most thoroughly-debugged and battle-hardened code. Also, most modern operating systems are really quite clever at noticing when part of the filesystem is getting hammered and caching the whole thing in memory, so you may never go near a disk.

I’m not religious—I’ve deployed runtime databases where I’ve had to. But there is a psychology out there in our profession,which says: if you have data that you want to store and retrieve, that means you need a database. But sometimes you don’t. And sometimes you come out ahead on one or both of the less-work and runs-fast metrics by not having one.


author · Dad
colophon · rights
picture of the day
July 17, 2006
· Technology (90 fragments)
· · Storage (29 more)
· · Web (396 more)

By .

The opinions expressed here
are my own, and no other party
necessarily agrees with them.

A full disclosure of my
professional interests is
on the author page.

I’m on Mastodon!