I particularly liked the comments from MenTaLguY, Preston Bannister, and Erik Engbrecht. (Unfortunately, given the test data we have, I don’t see us being able to follow Erik’s “non-ASCII” suggestion.)
Of the suggestions on offer at the benchmark page, I prefer the last couple. Session statistics are a little trickier, and I suspect a bit more resistant to a brute-force Map/Reduce approach. And the “normal HTML report” idea is very much in the spirit of Wide Finder 1, only with a bit more computation involved, perhaps enough to keep it from being a pure parallel-I/O benchmark.
Your thoughts? I suppose I’m signed up to build a reference implementation for verifying output.