This is the fourteenth progress report from the Wide Finder Project. I still have a bunch of Wide Finders to run, and for the time being I’ll keep trying to run what people send me; we can’t have too much data about this problem. But some conclusions are starting to look unavoidable to me.
First off, you should stop reading this and visit Eigenclass to read Mauricio Fernandez’s Conclusions about Wide Finder, C++, OCaml, JoCaml, Erlang and friends. While I don’t buy his use of the word “interpreted”, I find it hard to argue with his conclusions about languages and implementations (but visit his comments). My conclusions are (I think) fairly language-independent, and are about technical strategies; what works and what doesn’t and cost-benefit trade-offs.
Bypassing the Filesystem ·
This is not a new finding. When
I/O is on your critical path and you’re on a Unix-like system, you quite likely
need to be doing
mmap() rather than
read() if you
can. In many circumstances, the path through the OS’ VM-paging subsystem is
than through most filesystem implementations. Multics, an influence on most
modern operating systems, had only memory-mapped I/O.
The downside to this is that files are a very natural lens to look through
at persistent data. Every language has
gets() or equivalent for a
Also note that the advantages of memory-mapping are overwhelming when you’re doing input, and your program semantics don’t require random update. It would be dangerous to extrapolate too much past that scenario.
I think it’s OK for programmers to want to tell the system “Run this code on each line of this data.” If it’s more efficient to do that in a memory-mapped way, the infrastructure ought to just make it happen.
Parallel the I/O · While Wide Finder is not strictly “I/O-bound”, it’s certainly got a big I/O component. If you want good performance on parallel hardware, you need your I/O concurrent.
Combining memory-mapping and concurrency yielded fairly mind-boggling results. Check those Bonnie numbers; one core on this particular box maxes out around 160M/second of block input, so just straightforwardly block-reading the data would burn over 5.5 seconds; yet the JoCaml version ran in less than two.
So while this is obviously worth doing, the same issue arises; programmers like being able to say “Run this code on each line of this data” and that’s reasonable. So what we want is a really cheap way for them to add "... and I don’t care what order you run ’em in, or if a bunch run at once”.
Bypassing Regular Expressions · The Wide Finder programmers did all sorts of pattern-matching voodoo to dodge the overhead of regular expression processing. I don’t have a good quantitative handle on how much of the global speedup this bought, but I’m not convinced that it’s a good idea.
Regular Expressions are incredibly compact and expressive and in my opinion a good tool that programmers should use to drive the logic of processing text. The articles in Beautiful Code (not just mine) make it obvious that thoughtful developers obsess about regexes, and there’s a reason for that. They also make the case, which I’ve heard repeatedly, that there is plenty of scope for optimization in currently-popular regex engines.
I suspect we may have been pushed over the edge of this slippery slope by Erlang’s lousy default regex engine. And it’ll come down to a trade-off: If that fast JoCaml code could be simplified by going regex at a cost, say, of a factor of two or three in performance, that might be a good trade-off.
Computation · The Wide Finder does require you to accumulate results per unique article; doing this concurrently in parallel is going to require some thought. It’s difficult in that there’s no real distinction between readers and updaters; the data structures are write-only for most of the program’s run.
While this actually didn’t seem like a big deal to the Wide Finder implementors, the computation being done here is about as simple as it gets. So I think this is the tough end of the problem: What kind of a general-purpose computation framework do you give developers doing this kind of work? And to be fair, it’s not one that Wide Finder does much to explore.
Up till a couple of weeks ago, I would probably have argued that the Erlang model was the best I’d seen for cracking that nut. Now I think that that join calculus, as exemplified in JoCaml, deserves a serious look.
Conclusion · I still think that the ideal-world solution looks something like this. But I’ve sure learned a lot about how we might get there from here.