In preparation for my
presentation at ApacheCon next week, I’ve been running some performance tests
with the help of a gaggle of client machines rustled up by some good people in
the Sun engineering-lab group. The first numbers are trickling in, and I’m a
bit at a loss both on what to measure and how to evaluate the results.
Is 180 POSTs/second on a T2000 good?
[Update: Make that 320/second; have some better data presented in a table.]
What mod_atom Does · When you POST an Atom Entry, mod_atom parses it with Expat, finds a home for it in the filesystem, munges the XML slightly to make sure the dates and ID are right, persists it out with Genx, repeats the exercise to make an HTML entry, touches a couple of empty flag files, and sends the entry down the pipe to the client. It doesn’t regenerate the collection files or public-facing feeds, that’s what the flag files are for. When you GET a collection or feed, mod_atom checks to see if there’s a new flag file and regenerates it on demand.
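The regenerate-on-demand scheme can be sketched roughly like this. This is a simplified Ruby illustration, not the actual C module; the directory layout, file names, and `Pub` class are all invented for the sketch:

```ruby
require "fileutils"

# Hypothetical layout: each publication is a directory holding per-entry
# files plus a "dirty" flag file that POST touches.
class Pub
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # POST: persist the entry and its HTML rendition and touch the flag,
  # but do NOT rebuild the feed -- that happens lazily on GET.
  def post(id, atom_xml, html)
    File.write(File.join(@dir, "#{id}.atom"), atom_xml)
    File.write(File.join(@dir, "#{id}.html"), html)
    FileUtils.touch(File.join(@dir, "feed.dirty"))
  end

  # GET feed: regenerate only if the flag file says something changed.
  def feed
    flag = File.join(@dir, "feed.dirty")
    if File.exist?(flag) || !File.exist?(feed_path)
      regenerate
      File.delete(flag) if File.exist?(flag)
    end
    File.read(feed_path)
  end

  private

  def feed_path
    File.join(@dir, "feed.atom")
  end

  def regenerate
    entries = Dir[File.join(@dir, "*.atom")].reject { |f| f == feed_path }.sort
    File.write(feed_path,
               "<feed>\n" + entries.map { |f| File.read(f) }.join + "</feed>\n")
  end
end
```

The point of the design is visible here: the write path does only constant work per entry, and the cost of feed regeneration is paid once per burst of writes, on the first subsequent read.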
mod_atom does essentially no mutexing, with the single exception of the scenario where you’re doing a PUT on an existing resource and you have to lock down to compute the ETag and check the If-Match header. All the concurrency is pushed down into the filesystem.
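That one locked path amounts to something like the following. Again a Ruby sketch, not the C implementation; the SHA-1 ETag computation and the single global mutex are stand-ins for whatever mod_atom actually does:

```ruby
require "digest"

# Stand-in for a per-resource lock; mod_atom's real locking is finer-grained.
ETAG_LOCK = Mutex.new

# Hypothetical ETag: a digest of the resource's current bytes.
def etag_for(path)
  Digest::SHA1.hexdigest(File.read(path))
end

# Conditional PUT: under the lock, compare the current ETag to the
# client's If-Match value, and only write if they agree.
# Returns an HTTP-ish status: 412 Precondition Failed or 200 OK.
def conditional_put(path, if_match, new_body)
  ETAG_LOCK.synchronize do
    return 412 unless etag_for(path) == if_match
    File.write(path, new_body)
    200
  end
end
```

Without the lock, two clients could both pass the If-Match check and then clobber each other's writes; this is the lost-update problem that ETags exist to prevent, and it's the one spot where pushing concurrency down into the filesystem isn't enough.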
mod_atom is a module that gets loaded into Apache
2.2.<some-recent-build>, with no particular SPARC or Solaris optimizations
aside from whatever ./configure gets me.
Initial Tests ·
I have some shell and Ruby scripts that use the
Ape codebase to synthesize entries
full of random selections from
/usr/dict/words and shoot them at
an alleged AtomPub server as fast as possible. Well, as fast as the scripts can manage, anyhow.
The testing framework runs on N machines; on each, it fires off ten
subprocesses, each of which creates a new publication and shoots a hundred
entries into it; for a total of a thousand posts to ten different pubs.
HTML-creation is enabled, so each POST requires the creation of three files:
the Atom entry itself, its .html rendition, and a book-keeping timestamp file.
The Atom Entries being posted average a little over 600 bytes in length.
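The shape of the load generator is roughly this. A hedged Ruby sketch only: the real scripts use the Ape codebase to build proper Atom entries, and the word count, fallback word list, and `post_entry` helper here are invented:

```ruby
require "net/http"

# Random-word source; fall back to a tiny list so the sketch runs on
# systems without /usr/dict/words.
WORDS = File.exist?("/usr/dict/words") ?
          File.readlines("/usr/dict/words", chomp: true) :
          %w[alpha beta gamma delta epsilon]

# Build a small Atom entry stuffed with random dictionary words.
def random_entry(word_count = 80)
  body = Array.new(word_count) { WORDS.sample }.join(" ")
  <<~XML
    <entry xmlns="http://www.w3.org/2005/Atom">
      <title>#{WORDS.sample}</title>
      <content type="text">#{body}</content>
    </entry>
  XML
end

# POST one entry to an AtomPub collection URI.
def post_entry(collection_uri, entry)
  Net::HTTP.post(URI(collection_uri), entry,
                 "Content-Type" => "application/atom+xml;type=entry")
end
```

Running N of these in parallel subprocesses, each against its own freshly created publication, gives the per-machine load described above.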
Server · It’s the same box I set up to run the Wide Finder 2 work; an 8-core 32-thread T2000 with 32G of RAM and an unexotic disk setup. Remember, each individual thread is pretty slow, but there are lots and lots of ’em.
Early Results · [Update: This whole section replaced after I did some re-engineering brought on by bug-fixing brought on by interoperating with MarsEdit.]
In the following table, the seconds are elapsed wall-clock time.
This is interesting. It’s very pleasing that as you add parallel client loads, the server pretty well just soaks it up with massively sublinear slowdown. Given that I haven’t really profiled this thing yet nor have I maxed the server out, the eventual throughput on this class of box for this particular benchmark mode is probably way north of the 320 you see here.
However, it’s not quite as good as it looks. Judging by the output of
vmstat and friends, in that final 5-clients run, the idle time
was up because something in the system was beginning to thrash a little. Mind
you, it didn’t seem to hurt the throughput.
Considerations · The nature of the load a mod_atom server might see is horribly complex to describe. I can think of the following ways to characterize it, all more or less orthogonal to each other:
The total arrival rate of the transactions.
The arrival rate on a per-publication basis; i.e. twenty transactions a second in total run against one publication, or a dozen, or a hundred.
The relative proportion of GET, POST, PUT, and DELETE requests.
The proportion of GETs that are for collections or feeds as opposed to entries, and the proportion of those that are parametrized to support feed paging.
The average size of the incoming Atom Entries.
The proportion of XHTML, HTML, and plain-text in incoming Atom Entries.
The average size of media objects that are POSTed.
The relative proportion of Atom Entries and media objects that are posted.
The proportion of errors (malformed transactions, AtomPub protocol errors) in the input stream.
The set of configuration parameters applied to the underlying Apache server. (People who’ve been there are shaking their heads in sympathy at this point.)
What Next? · Well, I started with POSTs because my intuition about the code told me that POSTing an Atom Entry was the most complex among the code paths that you might reasonably expect to experience high request volume. I’m a little disappointed with 180 requests/second, but I’m also nonplussed at the server spending nearly half its time in kernel mode. Clearly some profiling is called for, and I’d sort of expect to find some low-hanging fruit out there.
I guess the next step is to start building out a matrix containing a reasonable range of values for each of the items in the list above, and characterizing the performance at as many sane points as I can.
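Sketching the matrix itself is the easy part; the dimensions and candidate values below are illustrative placeholders, not anything I've measured:

```ruby
# Hypothetical value ranges for a few of the load dimensions listed above.
DIMENSIONS = {
  clients:      [1, 2, 5, 10],           # parallel client machines
  pubs:         [1, 12, 100],            # publications the load is spread over
  entry_bytes:  [600, 5_000, 50_000],    # average incoming entry size
  get_fraction: [0.5, 0.9, 0.99]         # GETs as a share of all requests
}

# Enumerate every combination as a hash of settings for one benchmark run.
def matrix(dims = DIMENSIONS)
  keys = dims.keys
  dims.values
      .inject([[]]) { |acc, vals| acc.product(vals).map { |a, v| a + [v] } }
      .map { |combo| keys.zip(combo).to_h }
end
```

Even this toy version yields 108 runs, which is why "as many sane points as I can" is the operative phrase.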
Questions? · If anybody’s made it this far, I’d welcome input on which dimensions of the performance space would be most interesting to exercise.
More broadly, I’m interested in what a “good” number would be. 180 new POSTs per second just doesn’t seem that great to me, especially considering that this is filesystem-backed and there’s no database getting in the way. Of course, it’s perfectly possible that my intuition about the relative performance of filesystems and databases is completely wrong. I’m starting to think of Drizzle and CouchDB... of course, no profiling yet.
Oh, and by the way, I am as a side-effect building a framework for performance testing of AtomPub server implementations.