[This is part of the Wide Finder 2 series.] This should be the final entry, after a couple of years of silence. The results can be read here, for as long as that address keeps working. I’m glad I launched that project, and there is follow-on news, taking effect today in fact.
Conclusions · This was a lot of work to demonstrate two simple findings that most of us already believed:
It is possible to achieve remarkable throughput on highly parallel hardware, even for boring old-style I/O-heavy batch-processing problems.
It remains unacceptably hard to achieve such performance. Whether you measure it by the number of lines of code, the obscurity of the languages and libraries you have to learn, or the number of bugs you have to fight, it’s still too difficult to write concurrent application code.
Smiling · If you see Wide Finder 2 as a horse race, there’s a clear winner: Dmitry Vyukov. More than one software-savvy geezer, when I described the kind of throughput he was squeezing out of that very modest SPARC box, looked incredulous. His code pulled some useful statistics out of the test data in less elapsed time than a lot of people thought it would take that processor to read it off those disks.
Mind you, he also has won a couple of parallel-programming contests put on by Intel, and he’s running a lovely series of writeups on the issues over at 1024Cores, including (ahem) Wide Finder 2, from which I quote: “The program contains 17 times more LOC than the initial Ruby program, and is 479 times faster.”
When Dmitry wrote me about that post, he also mentioned that he was interviewing with Google. I immediately tracked down the recruiter and told her not to let this one go. I get the impression that they’d already noticed, and Dmitry is starting work at Google today.
I’m pretty sure he’s not going to be able to talk much about what he works on, so I hope he keeps 1024Cores going; he’s got plenty to tell the world about how to keep getting the most out of Moore’s astonishing law.