Flying over the Atlantic, I read all eight parts of Chris Rijk’s Thread Level Parallelism Design Decisions, and I wish a few more software geeks would go and read it. Herewith a few notes on software design in the era of Thread Level Parallelism.
The Basic Idea · For any software-heads who haven’t been following this whole multi-core thing, the idea is pretty simple. Moore’s law may still be in effect, but it’s getting harder and harder to make the silicon go faster and faster as the clock gets into the 4GHz range. So instead of trying to build a 10GHz chip, we build a 1GHz chip with ten processors (they usually say “cores”) on it.
Sun’s Niagara processor is likely to be the first chip with seriously-cranked TLP to ship, and it’s running in the labs today. (That 10×1GHz figure isn’t about Niagara; I don’t know its numbers.) I can’t imagine that Intel, AMD, and IBM will be that far behind. Thus, it behooves us on the software side to do some getting ready. There are a few approaches we could take:
Let the OS Do It · That is to say, application code ignores the problem, just runs the way it’s always run, and the OS takes care of getting mileage out of the TLP. Consider a big server of the future: it probably has multiple CPU chips, each of which has multiple cores, each of which can run multiple threads. If you have four threads running, in principle one core might be able to run all four, but things will go better if they each get their own. Also, if threads are sharing memory, that’s going to go better the closer they are together. And speaking of memory, the problem of managing the memory-to-CPU traffic is devilishly complex in this kind of a box. It’s obvious that part of the problem space belongs to the OS, not application code.
Modern Unixes, in particular Solaris, already have some basic competence here, since boxes with dozens or hundreds of processors are fairly commonplace in Fortune 500 server rooms; but the new chips are going to require lots more OS engineering. Linux is not as far along the technology curve, so that community has even more work in front of it; if I were a twenty-something kernel hacker I’d be itching to get my hands on one of these things. While I’m not a Linux kernel hacker, I’ve spent some time in there and my intuition is that some fairly serious refactoring may be in order.
Anyhow, for a lot of business applications, ignoring the TLP is entirely appropriate; these applications spend almost all of their time waiting for the database or an HTTP message or the user to hit a keystroke or something; it will be straightforward for the OS to take a bunch of these applications running in parallel and deal out the threads and probably get decent results even with a fairly simple approach.
Infrastructure Software · The previous paragraph mentioned databases and webservers. That class of software is already reasonably thread-savvy, and I think it’s going to have to get even more so. I’m speculating when I say that about databases, since I don’t know what’s down in the bowels of Oracle, MySQL, or any of the other big-name databases. But I do know that database vendors tend to be fanatically obsessive about performance (my kinda guys) and I’m sure they’ve worked hard at extracting mileage from threads; and they’ve got lots more work in front of them.
I do know, pretty well, what’s in the bowels of Apache, and anyone who’s done engineering in that space knows that webservers see about as thread-intensive a workload as you’ll ever encounter. Apache 2.0 has been engineered, very cleverly, to run in a bunch of processes with a few threads per process, or in a few processes with a bunch of threads per process. Code that runs close to the server (application servers and so on) can choose to be aware of this or not. Increasingly, I suspect that high-performance code is going to have to be aware of this at all times. And I suspect that Apache itself is going to need some adaptation after we’ve gained some hands-on experience with these high-TLP chips under heavy loads. Once again, this should be fun (I suppose there are those who would not share my perception of what is and isn’t “fun”).
At the Application Level · Some applications are going to want to be TLP-aware. For example, this 1.25GHz PowerBook I’m typing on is ridiculously fast for running text editors and Perl scripts and NetBeans and that kind of stuff, but irritatingly slow at heavy Photoshop lifting and video rendering and so on. Almost by definition, the slow media-centric apps should be able to deploy parallelism to good effect. But this deployment work is not for the faint of heart, because thread-aware programming is, well, hard.
Of the languages I’ve worked with, Java is overwhelmingly the best-suited for developing parallel multi-threaded code. The thread primitives are built-in, quite cleverly, and a lot of the concurrency problems just got more tractable with the new code in the 1.5 release.
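To give a flavor of what the 1.5 additions buy you, here is a minimal sketch of the media-style case: carve an array of pixel values into stripes and hand each stripe to a pool thread from `java.util.concurrent`. The class name `StripeBrightener`, the `brighten` method, and the "add a delta and clamp" workload are all made up for illustration; real image code would be a good deal hairier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; the name and the workload are illustrative only.
class StripeBrightener {

    // Adds delta to every pixel, clamped to 0..255, using one task per stripe.
    // Stripes do not overlap, so the tasks never touch the same array element.
    static void brighten(final int[] pixels, final int delta, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Void>> tasks = new ArrayList<Callable<Void>>();
            int stripe = (pixels.length + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(pixels.length, w * stripe);
                final int to = Math.min(pixels.length, from + stripe);
                tasks.add(new Callable<Void>() {
                    public Void call() {
                        for (int i = from; i < to; i++) {
                            pixels[i] = Math.max(0, Math.min(255, pixels[i] + delta));
                        }
                        return null;
                    }
                });
            }
            // invokeAll blocks until every stripe has finished;
            // get() surfaces any exception thrown inside a task.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] image = new int[1000000];
        // Size the pool to however many hardware threads the JVM can see.
        int cores = Runtime.getRuntime().availableProcessors();
        brighten(image, 40, cores);
        System.out.println("first pixel after brighten: " + image[0]);
    }
}
```

This is the easy case, since the stripes share nothing. The pain starts when the tasks need to read or write each other’s data.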
This doesn’t mean it’s easy. My Zeppelin project has client-side and server-side code, and both sides are highly parallel; it has been a complete fucking nightmare to debug, and I claim to be good at debugging. If we’re going to empower application programmers to get the most out of the high-TLP chips, we need big advances in development and debugging technology.
I think the key thing isn’t so much better debugging technology as better testing technology. Given JUnit or equivalent, I’m pretty confident that I can pull together a good set of unit tests for just about any conventional single-threaded application. But when it gets parallel, there’s a problem in that I don’t have a general mental framework for how to build a test suite. Once we figure out some of the design patterns, there are grounds for hope that we can do some tooling around it for testing and debugging.
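This is not the general framework I’m wishing for, but one pattern that seems like a reasonable starting point is the “hammer” test: park a bunch of threads on a latch, release them all at once against the code under test, and check an invariant afterward. Here is a rough sketch in plain Java with no JUnit dependency; `ContentionTest`, `HitCounter`, and `hammer` are names invented for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness; names are illustrative, not from any testing library.
class ContentionTest {

    // The code under test: a counter that must not lose updates.
    static class HitCounter {
        private final AtomicInteger hits = new AtomicInteger();
        void hit() { hits.incrementAndGet(); }
        int total() { return hits.get(); }
    }

    // Starts `threads` workers, releases them together, and has each one
    // call hit() `perThread` times. Returns the counter's final total.
    static int hammer(final HitCounter counter, int threads, final int perThread) {
        final CountDownLatch start = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        start.await(); // wait at the gate to maximize contention
                        for (int i = 0; i < perThread; i++) {
                            counter.hit();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                }
            }).start();
        }
        start.countDown(); // open the gate for every worker at once
        try {
            done.await();  // wait for all workers to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.total();
    }

    public static void main(String[] args) {
        int total = hammer(new HitCounter(), 8, 10000);
        // Invariant: no update may be lost, so the total must be threads * perThread.
        if (total != 8 * 10000) {
            throw new AssertionError("lost updates: " + total);
        }
        System.out.println("no lost updates");
    }
}
```

The catch is that a test like this can only prove a bug is there, never that it isn’t: a broken counter may pass a thousand runs and fail on the next one, which is exactly why this area needs better tooling.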
But the bottom line is, for application programmers, don’t get into this space unless you’re pretty sure you know what you’re doing and you’re willing to wrestle with some seriously difficult debugging.
That’s the bad news. The good news is that computers are going to feel a lot faster, given some support from the low-level software.