Last week I attended a Sun “CMT Summit”, where CMT stands for “Chip Multi-Threading”; a roomful of really senior Sun people talking about the next wave of CPUs and what they mean. While much of the content was stuff I can’t talk about, I was left with a powerful feeling that there are some real important issues that the whole IT community needs to start thinking about now. I’ve written about this before, and of the many others who have too, I’m particularly impressed by Chris Rijk’s work. But I think it’s worthwhile to pull all this together into one place and do some calls to action, so here goes. [Ed. Note: Too long and too geeky for most.] [Update: This got slashdotted and I got some really smart feedback, thus this follow-up.]
Where We Are Now · It’s no secret at all that we’re shipping Niagara before too long (pictures here). Niagara has eight cores each with hardware support for four threads; and bear in mind that we’re talking about Niagara 1. I am totally not privy to clock-rate numbers, but I see that Paul Murphy is claiming over on ZDNet that it runs at 1.4GHz.
Whatever the clock rate, multiply it by eight and it’s pretty obvious that this puppy is going to be able to pump through a whole lot of instructions in aggregate.
It’s not just us. Both AMD and Intel are sorta kinda shipping dual-core parts, and just this week AMD was making quad-core noises. I thought the key line in that story was the AMD people talking about “throughput per watt per dollar”; a lot of really smart people all over the industry are deciding that throughput per watt is getting more important by the day, and as for throughput per dollar, that’s not new news.
IBM’s much-touted “Cell” work is highly parallel; having said that, they seem to still be bearing down harder than anyone else on cranking up the clock. I wonder if anyone will hit 5GHz in the foreseeable future, and if so I wonder what kind of cooling-system rocket science will be required to keep the sucker from doubling as a small local nuclear-fusion powerplant?
So, while Sun will probably be the first player slapping big money down on the multithreading horse in the high-stakes CPU race, you still need to pay attention even if you’re not a Sun customer. Because a few years from now, you’re going to need a lot more CPU cycles than you do now, and unless you’re willing to bet on that 5GHz fusion reactor, multithreading is how you’re probably going to get them.
What Scales and What Doesn’t? · At one point during the CMT summit, I stuck my hand up and asked: is there anything that in principle doesn’t scale with multithreading? Not much leapt to mind, except for compiler code. (Bear in mind that while an individual compile doesn’t parallelize that well, the work that make and Ant do can be parallelized, and has been.)
Now of course, the room was full of Sun infrastructure weenies, so if there’s something terribly obvious in records management or airline reservations or payroll processing that doesn’t parallelize, we might not know about it. Having said that, it’s fair to conclude that multithreading will help with a pretty fair proportion of the things that computers do.
And of course there are lots of workloads where multithreading is already known to work beautifully, and that includes a whole lot of Web workloads and other server-side apps.
The Programmer Community · The conversation at the summit spent quite a bit of time on what we need to do to help the developer community get the most out of these weird chips; the days of putting everything in a big loop and counting on the clock rate to make your program run faster every year are so, so, over.
First off, we agreed pretty quickly that there isn’t one “developer community”; there are infrastructure developers who totally have to think about threading and concurrency, and there are application developers who would totally rather not.
Which led to a real interesting discussion: are mainstream enterprise programmers, who today live somewhere in the spectrum from Visual Basic to J2EE, ever going to do concurrent programming, consciously? The conventional wisdom is that they’re just not up to it, but as Graham Hamilton pointed out, there was a long period when the bleeding edge knew all about Object-Orientation and garbage collection and so on, but despaired of the mainstream developer ever getting there. But then the mainstream moved, and pretty quickly too. So I don’t think any of us are comfortable with asserting that “The mainstream will never grok concurrency.” Maybe it’s just the tools that are missing?
At this point the Erlang community will jump up and down and shout “We have the answer!” Maybe, but I’m dubious: if I understand Erlang correctly, it abjures the use of global data, which simplifies the problems immensely. I’ve done a lot of concurrent work, and my biggest programming wins have been all about a bunch of threads running around a big shared data structure.
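To make “a bunch of threads running around a big shared data structure” concrete, here is a minimal sketch in Java. It is an illustration only, not code from any project mentioned here: the class name SharedTally, the word-counting task, and the 30-second timeout are all made up for the example, and it uses lambdas and List.of, which need a much newer Java than what was shipping when this was written.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example: several worker threads update one shared map.
class SharedTally {
    // Counts word occurrences across all chunks, one pool task per chunk.
    static Map<String, Integer> tally(List<List<String>> chunks, int threads) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<String> chunk : chunks) {
            pool.submit(() -> {
                for (String word : chunk) {
                    // merge() is atomic per key on ConcurrentHashMap,
                    // so no explicit lock is needed here.
                    counts.merge(word, 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            // Arbitrary timeout chosen for the sketch.
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> chunks = List.of(
            List.of("a", "b", "a"),
            List.of("b", "c"),
            List.of("a", "c", "c"));
        System.out.println(tally(chunks, 4));
    }
}
```

The point of the sketch is that the shared structure does the synchronization, so the worker code stays short. Anything more complicated than a per-key update (say, an invariant spanning two keys) would need real locking, and that is where it stops being easy.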
Java Basics · Java and CMT are a really good fit for each other. For people who have to actually deal with threads and locks and that kind of stuff, Java provides the best programming infrastructure I’ve ever used. This doesn’t mean it’s easy, but it is tractable. (Can I assume that .NET, since it came after Java, learned the lessons and is also decent in this respect?)
At a higher level, people in J2EE-land live in a world of containers of various kinds, and these things are all thread-savvy because they’ve been carefully built that way by experts. So a lot of the big enterprise apps should be CMT-turbocharged just fine, for free.
Where are the Problems? · The most useful part of the summit was identifying the places where the industry as a whole and Sun in particular need to get to work to get ready for ubiquitous CMT. Because we’re facing some real problems.
Problem: Legacy Apps · You’d be surprised how many cycles the world’s Sun boxes spend running decades-old FORTRAN, COBOL, C, and C++ code in monster legacy apps that work just fine and aren’t getting thrown away any time soon. There aren’t enough people and time in the world to re-write these suckers, plus it took person-centuries in the first place to make them correct.
Obviously it’s not just Sun; I bet every kind of computer you can think of carries its share of this kind of good old code. I guarantee that whoever wrote that code wasn’t thinking about threads or concurrency or lock-free algorithms or any of that stuff. So if we’re going to get some real CMT juice out of these things, it’s going to have to be done automatically down in the infrastructure. I’d think the legacy-language compiler teams have lots of opportunities for innovation in an area where you might not have expected it.
Problem: Observability · One of the lessons of Solaris 10 is that being able to tell what your system is doing is a really big deal. Of all the Solaris goodies, DTrace has been the big attention-getter. If we drop a big complicated app onto a CMT box and it runs real fast, that’s good; but what if it doesn’t? We’re going to need DTrace or equivalent right up through all the levels of the application stack, and building that’s going to be a big job.
Problem: Java Mutexes · The standard APIs that came with the first few versions of Java were thread-safe; some might say fanatically, obsessively, thread-safe. Stories abound of I/O calls that plunge down through six layers of stack, with each layer posting a mutex on the way; and venerable standbys like StringBuffer and Vector are mutexed-to-the-max. That means if your app is running on next year’s hot chip with a couple of dozen threads, and you’ve got a routine that’s doing a lot of string-appending or vector-loading on a shared object, only one thread is gonna be in there at a time.
One thing the Java people need to do is put big loud blinking messages in all that Javadoc saying “Using this class may impair performance in multi-threaded environments!” You can drop in ArrayList for Vector and StringBuilder for StringBuffer. Hey, I just noted that the StringBuffer Javadoc does have such a warning; good stuff, but we need to be doing more evangelism on this front.
On the other hand, those mutexes were there for a reason. Nobody’s saying “Ignore thread-safety” but rather “Thread-safety is expensive, don’t do it unless you need to.”
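Here is a small sketch of when the unsynchronized drop-ins are safe: each thread builds its own StringBuilder or ArrayList, and the object never escapes to another thread until it is finished. The class and method names (LocalBuffers, join, squares) are invented for illustration, and the code assumes a recent Java for lambdas and List.of.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of thread-confined buffers.
class LocalBuffers {
    // The StringBuilder lives and dies inside one call on one thread,
    // so paying for StringBuffer's locking would buy nothing.
    static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p).append(',');
        }
        return sb.toString();
    }

    // Same idea with ArrayList standing in for Vector.
    static List<Integer> squares(int n) {
        List<Integer> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(i * i);
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        String[] results = new String[4];
        for (int t = 0; t < workers.length; t++) {
            final int id = t;
            // Each worker writes only its own slot of the results array.
            workers[t] = new Thread(() -> results[id] = join(List.of("w" + id, "x", "y")));
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join(); // join() makes each worker's write visible here
        }
        for (String r : results) {
            System.out.println(r);
        }
    }
}
```

If several threads really do append to the same buffer or list, this swap is not safe on its own; that is the case the old mutexes were there for.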
Problem: LAMP · An increasing proportion of enterprise computing is being done not in J2EE nor WebSphere nor .NET, but in PHP and Python and MySQL and this or that Apache module. While I’m a big fan of dynamic languages, when it comes to parallelism they’re pretty primitive compared to Java. So that community is going to need to put some cycles into becoming CMT-friendlier, and they’re starting from behind.
The exception is Apache, which has been thread-savvy and sensibly concurrent for a long time. I’m pretty intimate with the guts of Apache, and based only on what we’ve said publicly about Niagara, here’s a fearless prediction: a workload that is Apache-dominated is going to run like a bat out of hell on that kind of box. Wait and see. Having said that, a lot of the Apache world is still running 1.3, and Apache 2 changed the process/threading model quite a lot, so I suspect there’s some useful work to be done (probably at the APR level) in making sure it takes advantage of what the hardware can do.
Problem: Testing and Debugging · I am right now, in the Zeppelin context, grinding away on a highly concurrent multi-threaded application. Debugging it is a complete mindfuck, and I’m spending too much time debugging it because I have no idea how to write the unit tests. Consider a method that gets a network request for more resources, discovers which other computers in the cluster are advertising cycles to spare, pings them to see if they’re really there, asks them to handle the request, and reports back to the requester; how do you unit-test that? I have no idea.
This is hard low-level Computer Science and we in the industry trenches could sure use some help from the researchers; are the researchers looking in this direction?
Problem: How Many Is Enough? · Right at the moment, CMT is the low-hanging fruit in CPU performance; it’s a lot cheaper and more tractable (and power-efficient) to double the number of threads than to double the clock rate. But this trend can only go on so long; my intuition is that most modern server workloads will have no trouble using 32 hardware threads. How about 64? How about a thousand? At some point the return on investment gets lousy, and we’re going to have to go back to grinding away at the clock rate, or whatever the next trick is that we haven’t thought of yet.
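One standard way to put numbers on “the return on investment gets lousy” is Amdahl’s law, which is not something that came up above; I bring it in only as a rough illustration. It says that if a fraction p of the work parallelizes perfectly and the rest is serial, the speedup on n threads is 1 / ((1 - p) + p / n). The sketch below assumes p = 0.95, a number picked purely for the example and not measured on any real workload.

```java
// Hypothetical illustration of diminishing returns from more threads.
class AmdahlSketch {
    // Amdahl's law: speedup on n threads when a fraction p of the
    // work parallelizes perfectly and (1 - p) stays serial.
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        double p = 0.95; // assumed parallel fraction, not a measurement
        for (int n : new int[] {8, 32, 64, 1000}) {
            System.out.printf("%4d threads: %.1fx%n", n, speedup(p, n));
        }
    }
}
```

Under that assumption the speedup can never exceed 1 / (1 - p), which is 20x, no matter how many hardware threads you throw at it. Real workloads are messier than this model, so treat it as intuition rather than prediction.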
Conclusion · I’d like to end on a positive note, because actually I’m pretty optimistic: I think that we’re going to get a few years’ good mileage out of cranking up the parallelism, and enough benefits will fall out of the architecture for free to make it worthwhile. I also agree with Paul Murphy (see the link above) that these chips are going to be well-suited for laptops. I’m currently sitting in front of a 1.25GHz PowerPC; it’s got an AltiVec for graphics, but everything else is single-threaded. Pretty well everything runs pretty well fast enough, except maybe Photoshop and video processing. And those things are already parallelized.
So, given that CMT chips use fewer watts per unit of computing, why aren’t they being designed into the next generation of laptops? If I were a hot young EE looking for an opportunity, I’d be thinking startup.