Here’s real news: Alex Osborne of the National Library of Australia, also known as @atosborne, whom I first met on #clojure, took the Wide Finder bit between his teeth and has posted a remarkable implementation story: Widefinder 2 with Clojure. Um, 8m4.663s! If you care about any aspect of this stuff you really ought to go read it now. Grab yourself a coffee or whatever first; it’s not short.
[This is part of the Concur.next series.]
I’ll be giving it some more thought, but here are a few early reactions.
I’d always known in an abstract way that lazy-evaluation and (virtually) endless lists were useful things; this is the first time I can really feel in my gut how they can make real-world code easier to write.
Lots of the performance wins come from dipping into Java-land (LinkedBlockingQueue), which is perfectly OK, but a Clojure purist would probably see those occasions as maybe highlighting gaps in that language’s coverage.
Alex figured out how to use BufferedReader to push the line-breaking down into the Java implementation. I wonder whether replicating that trick of dealing out InputReaders to threads, but using my trick of seeking to block boundaries and patching together line fragments rather than finding line-breaks to seek to, might be a performance win.
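To make the block-boundary idea concrete, here is a minimal sketch of cutting input at fixed offsets and patching the straddling fragments back together. It is not Alex's code or my original Wide Finder code; the class name `BlockSplitter`, the in-memory `byte[]` input, and the UTF-8 decoding are all assumptions made for illustration, and the blocks are walked sequentially here where a real version would hand them to threads.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cut the input at fixed block boundaries, ignoring
// where lines end, then patch the fragments that straddle each cut.
final class BlockSplitter {

    static List<String> linesFromBlocks(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<String> lines = new ArrayList<>();
        // Bytes of a line whose start has been seen but whose end has not.
        ByteArrayOutputStream carry = new ByteArrayOutputStream();

        for (int start = 0; start < data.length; start += blockSize) {
            int end = Math.min(start + blockSize, data.length);
            int lineStart = start;
            for (int i = start; i < end; i++) {
                if (data[i] == '\n') {
                    // This newline completes whatever was carried over
                    // from earlier blocks plus the bytes seen in this one.
                    carry.write(data, lineStart, i - lineStart);
                    lines.add(decode(carry));
                    carry.reset();
                    lineStart = i + 1;
                }
            }
            // Tail fragment: keep it until a later block supplies the rest.
            carry.write(data, lineStart, end - lineStart);
        }
        if (carry.size() > 0) {
            lines.add(decode(carry)); // final line without a trailing newline
        }
        return lines;
    }

    // Decoding happens only after stitching, so a multi-byte character
    // split across a block boundary is reassembled before it is decoded.
    // UTF-8 is an assumption of this sketch.
    private static String decode(ByteArrayOutputStream buf) {
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(linesFromBlocks(data, 4));
    }
}
```

The point of the design is that choosing block offsets needs no scan for line-breaks at all; the only extra work is the small carry buffer at each seam.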
Assuming ASCII apparently works in this particular case, but is really skating close to the edge; we’d like the lessons from the benchmark to be portable to a lot of general cases. This affects the InputStreamReader and the line-gobbler.
I don’t think it’s that big a deal. There will be a cost to generalizing this so it doesn’t fall over in the face of Shift-JIS or some such, but with some cleverness, not too horrible.
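For a sense of what the generalization involves at its simplest, here is a hedged sketch in which the charset is a parameter of the line reader rather than a baked-in assumption. The class name `CharsetLines` and the sample log line are made up for illustration; the only real APIs used are the standard `InputStreamReader(InputStream, Charset)` constructor and `BufferedReader.readLine()`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the caller names the encoding; nothing assumes ASCII.
final class CharsetLines {

    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // A made-up request line containing two non-ASCII characters.
        String original = "GET /ongoing/\u00e9t\u00e9 200";
        byte[] utf8 = (original + "\n").getBytes(StandardCharsets.UTF_8);

        String asUtf8 = readLines(new ByteArrayInputStream(utf8),
                                  StandardCharsets.UTF_8).get(0);
        String asAscii = readLines(new ByteArrayInputStream(utf8),
                                   StandardCharsets.US_ASCII).get(0);

        System.out.println("UTF-8 decode matches:    " + asUtf8.equals(original));
        System.out.println("US-ASCII decode matches: " + asAscii.equals(original));
    }
}
```

Decoding the same bytes as US-ASCII does not fail loudly; the reader silently substitutes replacement characters, which is exactly the kind of quiet breakage the ASCII assumption invites.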
My version was cleanly bisected into a file-reader to which you could pass an arbitrary user function, and the actual stats computation. Something like this would be necessary to any attempt at generalization.
My initial take is that Alex’s can in fact be refactored to do this, but there may be some level of coupling lurking where I haven’t seen it yet. Let’s push this one on the stack.
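The split I have in mind looks roughly like the following sketch: one method that knows only how to feed lines to a caller-supplied function, and a separate stats computation that knows nothing about I/O. This is an illustration in Java, not my original code and not a refactoring of Alex's; the names `LogScanner`, `eachLine`, and `hitsByUri` are hypothetical, and it assumes Apache-style log lines where the request URI is the seventh space-separated field.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

final class LogScanner {

    // Infrastructure half: reads lines and hands each one to an arbitrary
    // user function. It knows nothing about what the lines mean.
    static void eachLine(Reader source, Consumer<String> handler) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Application half: the stats computation, expressed purely as the
    // function passed to eachLine. Assumes field 6 (zero-based) is the URI.
    static Map<String, Long> hitsByUri(Reader source) {
        Map<String, Long> hits = new HashMap<>();
        eachLine(source, line -> {
            String[] fields = line.split(" ");
            if (fields.length > 6) {
                hits.merge(fields[6], 1L, Long::sum);
            }
        });
        return hits;
    }

    public static void main(String[] args) {
        String log =
            "10.0.0.1 - - [01/Jan/2009:00:00:01 -0800] \"GET /ongoing/a HTTP/1.1\" 200 512\n"
          + "10.0.0.2 - - [01/Jan/2009:00:00:02 -0800] \"GET /ongoing/a HTTP/1.1\" 200 512\n"
          + "10.0.0.3 - - [01/Jan/2009:00:00:03 -0800] \"GET /ongoing/b HTTP/1.1\" 404 0\n";
        System.out.println(hitsByUri(new StringReader(log)));
    }
}
```

Any sleazy tricks for speed can then live inside `eachLine` without the stats code ever noticing.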
I’d have no heartburn about doing all sorts of sleazy Java/JVM tricks in the line-reader infrastructure, but it does give me a bit of heartburn to see AtomicLong in what amounts to application code.
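One way the heartburn might be relieved is to push the atomics down a layer, so application code only ever says "count this". The sketch below is an assumption about how that could look, not something taken from Alex's implementation; `Tally` and its methods are hypothetical names, built on the standard `ConcurrentHashMap` and `AtomicLong` classes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a thread-safe counter table that keeps AtomicLong
// out of sight of the code doing the actual statistics.
final class Tally {
    private final ConcurrentHashMap<String, AtomicLong> counts =
        new ConcurrentHashMap<>();

    // Safe to call from any number of threads at once.
    void increment(String key) {
        counts.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    long get(String key) {
        AtomicLong count = counts.get(key);
        return count == null ? 0 : count.get();
    }

    // Small demonstration: several threads bump the same key concurrently
    // and the total is read back once they have all finished.
    static long demo(int threads, int perThread) {
        Tally tally = new Tally();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    tally.increment("/ongoing/");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return tally.get("/ongoing/");
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000));
    }
}
```

The application-level code calls `increment` and never mentions a JVM concurrency primitive, which is about where I'd want that line drawn.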
In his coda, Alex reveals that he’s only been using Clojure for a couple of months. I’m awed and humbled; this guy’s got some chops. I think that this chunk of code and annotation can be monumentally useful as a teaching tool. Well, and being able to process big logfiles fast is always a good thing.