This is the seventh progress report from the Wide Finder Project, in which I report on the work of others and turn this into kind of a contest. I’ll be back later with one more whack at Erlang, and then probably move on to other Wide Finder options. [Update: If you’re working on this but I missed you, sorry; holler at me and I’ll make sure you get covered.]
Pi Calculus · Since I started this thing out with a little Ruby program, let’s start this report with work from Assaf Arkin. In Pi-cture this: Pi-calculus, Ruby and WideFinder, he introduces the π-calculus and a Ruby library that implements it.
He follows up with WideFinder, Ruby Pic, and Scaling Up, Out and Away, running the code and actually measuring the results.
Lisp · Well, it was only a matter of time. See wide finder in lisp by Irate Nate. I note that the Lisp code, while not as compact as the Ruby, is a lot more compact than the Erlang.
C++ · The Machine Elf writes Wide finder, parallelism and languages; his C++ implementation is actually shorter than Steve Vinoski’s first cut in Erlang, which I find amusing.
He runs his test on ancient creaking SPARC boxes, and brings MPI into play, which seems reasonable enough. He struggles with the fact that the Wide Finder can be (but need not be, I argue) I/O-bound. Still, I recommend reading this piece, in particular the closing Conclusion paragraph, which includes some first-rate lateral thinking about the specifics of this problem.
Parrot · At Linux DevCenter, Kevin Farnham writes Open Source Thoughts: Parrot and Multicore, which doesn’t actually reference Wide Finder at all, but is clearly thinking about the same space. He proposes that the Parrot VM (specifically aimed at being the substrate for Perl 6, but with loftier dreams) might be a good basis for progress in the multi-core world.
Python · Santiago Gala contributes Python, Erlang, Map/Reduce. His Python code is one line shorter than my Ruby version and, he claims, more elegant. Check it out and make up your own mind. He has no performance data yet.
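For anyone who hasn’t seen the shape of the problem, here is a minimal Python sketch of the map/reduce approach: count article fetches per chunk of log lines, then merge the partial counts. This is not Santiago’s code. The regular expression, the chunk size, and every function name below are assumptions made for illustration, and the map step runs sequentially here; in principle it could be handed to a pool of workers.

```python
import re
from collections import Counter
from functools import reduce

# Assumed log pattern (hypothetical): a GET for an article path, ending at a space.
# Paths containing a dot (images and the like) deliberately fail to match.
HIT = re.compile(r"GET /ongoing/When/\d\d\dx/(\d{4}/\d\d/\d\d/[^ .]+) ")


def chunks(lines, size):
    """Split a list of lines into consecutive slices of at most `size` lines."""
    for i in range(0, len(lines), size):
        yield lines[i:i + size]


def count_chunk(lines):
    """Map step: count article hits within one chunk."""
    counts = Counter()
    for line in lines:
        m = HIT.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def merge(total, partial):
    """Reduce step: fold one chunk's counts into the running total."""
    total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def top_articles(lines, n=10, chunk_size=1000):
    """Return the n most-fetched articles as (path, count) pairs."""
    partials = map(count_chunk, chunks(lines, chunk_size))
    total = reduce(merge, partials, Counter())
    return total.most_common(n)
```

The chunks are independent, which is what makes the counting step a candidate for parallelism; whether that pays off depends on whether the run is I/O-bound.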
Erlang · Over at Caoyuan’s blog, there’s Tim Bray's Erlang Exercise on Large Dataset Processing, which provides a huge glob of Erlang code and better numbers than I’ve got so far. He also points to a contribution from Per Gustafsson with even better numbers.
But for my money, the most interesting Wide Finder Erlang work is being done by Steve Vinoski. His second report from the coal-face is More File Processing with Erlang. It’s long and dense and meaty and worth reading.
The Contest · They tell me I’ll soon be able to get my hands on a pre-release T2-based server. I will make a serious effort to run as many of these Wide Finders as practicable on that machine with lots of gigabytes of real data, and will report on results. Bear in mind that the T2 has a top cycle speed of only 1.4GHz, plus remarkable I/O and memory subsystems, plus 8 cores with 64 hardware threads. My expectation is that the Wide Finder won’t be I/O limited on this box, and all these parallelism stratagems people have been grinding away at will bear fruit.
But there’s only one way to find out.
And it occurs to me that anyone who can make this puppy fly on the T2 will have done Sun a real favor, and deserves a big thank-you; I’ll see what I can arrange.