Here’s real news: Alex Osborne of the National Library of Australia, also known as @atosborne, whom I first met on #clojure, took the Wide Finder bit between his teeth and has posted a remarkable implementation story: Widefinder 2 with Clojure. Um, 8m4.663s! If you care about any aspect of this stuff, you really ought to go read it now. Grab yourself a coffee or whatever first; it’s not short.

[This is part of the series.]

I’ll be giving it some more thought, but here are a few early reactions.

  • I’d always known in an abstract way that lazy evaluation and (virtually) endless lists were useful things; this is the first time I’ve really felt in my gut how they can make real-world code easier to write.

  • Many of the performance wins come from dipping into Java-land (AtomicLong, LinkedBlockingQueue), which is perfectly OK, but a Clojure purist might see each of those excursions as highlighting a gap in the language’s coverage.

  • Alex figured out how to use BufferedReader to push the line-breaking down into the Java implementation. I wonder whether it would be a performance win to replicate his trick of dealing out Readers to threads, but combine it with my trick of seeking to block boundaries and patching together line fragments, rather than hunting for line breaks to seek to.

  • Assuming ASCII apparently works for this particular data, but it’s skating really close to the edge; we’d like the lessons from the benchmark to carry over to the general case. The assumption affects both the InputStreamReader and the line-gobbler.

    I don’t think it’s that big a deal. There will be a cost to generalizing this so that it doesn’t fall over in the face of Shift-JIS or some such, but with some cleverness, it shouldn’t be too horrible.

  • My version was cleanly bisected into a file-reader, to which you could pass an arbitrary user function, and the actual stats computation. Some separation like this is necessary for any attempt at generalization.

    My initial take is that Alex’s code can in fact be refactored this way, but there may be some level of coupling lurking where I haven’t seen it yet. Let’s push this one on the stack.

    I’d have no heartburn about doing all sorts of sleazy Java/JVM tricks in the line-reader infrastructure, but it does give me a bit of heartburn to see AtomicLong in what amounts to application code.
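To make the block-boundary idea and the pass-a-user-function split a little more concrete, here is a minimal Java sketch. It is not Alex’s code or mine. It assumes the whole file is already in a byte array (a real version would seek or memory-map the file), that the text is UTF-8, and that the user function is thread-safe. `BlockLineReader` and `forEachLine` are hypothetical names. Blocks are cut at fixed sizes regardless of line breaks and scanned in parallel. Each block reports its leading and trailing fragments, and these are stitched together in file order afterwards. Fragments stay as bytes until they’re joined, so a multi-byte character split across a boundary survives. Splitting on the '\n' byte is safe in UTF-8 because that byte never occurs inside a multi-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class BlockLineReader {

    // Cuts data into nBlocks (>= 1) fixed-size blocks, ignoring line boundaries,
    // scans them in parallel, and hands every complete line to userFn.
    // userFn may be called from several threads at once, in no particular order.
    public static void forEachLine(byte[] data, int nBlocks, Consumer<String> userFn)
            throws InterruptedException, ExecutionException {
        int blockSize = Math.max(1, (data.length + nBlocks - 1) / nBlocks);
        ExecutorService pool = Executors.newFixedThreadPool(nBlocks);
        try {
            List<Future<byte[][]>> fragments = new ArrayList<>();
            for (int start = 0; start < data.length; start += blockSize) {
                final int s = start;
                final int e = Math.min(data.length, start + blockSize);
                Callable<byte[][]> task = () -> scanBlock(data, s, e, userFn);
                fragments.add(pool.submit(task));
            }
            // Patch fragments in file order: the tail of one block plus the head of
            // the next (possibly across several newline-free blocks) make one line.
            ByteArrayOutputStream carry = new ByteArrayOutputStream();
            for (Future<byte[][]> f : fragments) {
                byte[][] frag = f.get();
                carry.write(frag[0], 0, frag[0].length);
                if (frag[1] != null) { // this block contained at least one newline
                    userFn.accept(new String(carry.toByteArray(), StandardCharsets.UTF_8));
                    carry.reset();
                    carry.write(frag[1], 0, frag[1].length);
                }
            }
            if (carry.size() > 0) { // final line had no trailing newline
                userFn.accept(new String(carry.toByteArray(), StandardCharsets.UTF_8));
            }
        } finally {
            pool.shutdown();
        }
    }

    // Emits the lines lying wholly inside [s, e) and returns {head, tail}:
    // head is the bytes before the first newline, tail the bytes after the last.
    // If the block has no newline, the whole block is returned as head and tail is null.
    private static byte[][] scanBlock(byte[] data, int s, int e, Consumer<String> userFn) {
        int first = -1, last = -1;
        for (int i = s; i < e; i++) {
            if (data[i] == '\n') {
                if (first < 0) {
                    first = i;
                } else {
                    userFn.accept(new String(data, last + 1, i - last - 1, StandardCharsets.UTF_8));
                }
                last = i;
            }
        }
        if (first < 0) {
            return new byte[][] {Arrays.copyOfRange(data, s, e), null};
        }
        return new byte[][] {Arrays.copyOfRange(data, s, first), Arrays.copyOfRange(data, last + 1, e)};
    }
}
```

Counting lines would then be a matter of passing something like `line -> counter.incrementAndGet()` with a shared AtomicLong. That is exactly the kind of Java-land counter discussed above, kept here on the application side of the split.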

In his coda, Alex reveals that he’s only been using Clojure for a couple of months. I’m awed and humbled; this guy’s got some chops. I think that this chunk of code and annotation can be monumentally useful as a teaching tool. Well, and being able to process big logfiles fast is always a good thing.



From: Alex Osborne (Dec 16 2009, at 03:09)

I see being able to reuse Java's types as a benefit of Clojure, not something lacking in it. I could have used some other scheme, but when there's a tool that fits perfectly, why not use it? An atomic counter is what I needed, and that's exactly what an AtomicLong provides. I guess you could wrap it in a couple of functions so it behaves more like a Clojure type, but aside from the ugly type hints it's really quite easy to use as is, so why bother? Fine-grained atoms can do the same job, but they are a more general-purpose tool and hence incur a little overhead in exchange for their flexibility. :-)

After all, what is a Clojure atom? Naught but a wrapper around an AtomicReference with a convenience function (swap!) for doing compare and set and a few Clojure additions like metadata and watchers.
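Alex’s point can be illustrated with a minimal Java sketch of an atom-like wrapper. This is not Clojure’s actual clojure.lang.Atom source, and `MiniAtom` is a hypothetical name. Its `swap` reads the current value, applies a function, and installs the result with `compareAndSet`, retrying if another thread got there first. Because of the retry, the function may run more than once and should be free of side effects. An AtomicLong does the same job for a plain counter with a single `incrementAndGet` call, and without creating a boxed Long for each new value. That is one concrete source of the "little overhead" he mentions.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class MiniAtom<T> {
    private final AtomicReference<T> ref;

    public MiniAtom(T initial) {
        ref = new AtomicReference<>(initial);
    }

    public T deref() {
        return ref.get();
    }

    // swap!-style update: compute a new value from the current one and try to
    // install it; if another thread changed the value in between, retry.
    public T swap(UnaryOperator<T> f) {
        while (true) {
            T oldVal = ref.get();
            T newVal = f.apply(oldVal);
            if (ref.compareAndSet(oldVal, newVal)) { // identity comparison
                return newVal;
            }
        }
    }
}
```

With several threads each calling `atom.swap(x -> x + 1)` on a `MiniAtom<Long>`, no increments are lost. An AtomicLong shared across the same threads reaches the same total through `incrementAndGet()`.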

I think the lessons learned from the exercise are much more important than my particular code. Thanks for suggesting the problem, Tim -- I really learned a lot from this and encourage everyone to have their own go at it with their own tools of choice. If the tool you pick happens to be Clojure, take what Tim and I have discovered and try to do even better. I'm sure there's a lot of room to improve both in performance and in elegance of the solution.


December 15, 2009
