Over the last couple of years, I’ve written a lot here about concurrency and the problems it poses for developers. The proximate cause is that my employer is pushing the many-core CPU envelope harder than anyone. This last week there’s been an outburst of discussion: from Nat Torkington, David Heinemeier Hansson, and Phillip Toland. They’re all worth reading, and the problem isn’t going away. But I’m worrying about it less all the time.

Here’s the thing: I’m looking at the class of problems that are best addressed by real actual shared-address-space threads, and it’s looking smaller and smaller and smaller.

That’s the world you have to live in, within the operating system or the Web server or the database kernel... but for applications? These days, you’re way better off scaling out, not up. Shared memory? Hell, shared nothing if you can get away with it.

It isn’t free. You have to pay a per-process memory tax, and worry about saturating the network and (de)serializing the data and all sorts of other stuff. But for now, it seems you still win.

Computers are still going to look more and more like those many-core SPARCs we’re shipping, but at the application level, that should be a red herring.

Which is why there’s a new Erlang book and strange words like “Haskell” and “Hadoop” echo in geeky back-rooms. In an ideal world you have a message-passing layer that figures out how to move ’em around using whatever combination of threads and processes and shared memory and bits-on-wires gets the job done.

(This also reinforces my opinion that message-queuing systems will inevitably end up at the center of everything serious; but I digress.)

It will doubtless be pointed out that on the client, threads will remain in application programmers’ faces for the foreseeable future. Which I see as another argument for doing everything you can through the Web.



From: Pierre Phaneuf (Jun 08 2007, at 04:22)

That's far from being a new development. See question #6 of http://slashdot.org/interviews/00/07/20/1440204.shtml for example.

One new thing with multicore processors compared to SMP is that they share cache between the cores, so some of the inter-processor cache-invalidation issues aren't as bad. Having the cache shared has both good and bad sides: there's less of it per core than in a classic SMP system, but an execution context can move from one core to the other without suffering from a cold cache. It's one more layer in the hierarchy (NUMA node -> processor -> core -> hyperthreading).

Of note, Ruby's heavy reliance on block syntax would probably make it easy to make an Erlang-style message-passing framework for it, which should look natural enough (pass a block to a "wait for a message" function). Perl's closures would work just as well, but they're not as familiar a sight.
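In Python, the same shape falls out of passing a plain function where Ruby would pass a block; a minimal sketch of the pattern Pierre describes (all names hypothetical):

```python
import threading
from queue import Queue

def spawn(handler):
    """Start an actor: a thread that waits for messages and hands each
    one to `handler` -- the Python stand-in for Ruby's block."""
    mailbox = Queue()

    def receive_loop():
        while True:
            msg = mailbox.get()      # "wait for a message"
            if msg is None:          # sentinel: shut the actor down
                break
            handler(msg)

    worker = threading.Thread(target=receive_loop)
    worker.start()
    return mailbox, worker

results = []
mailbox, worker = spawn(lambda msg: results.append(msg.upper()))
mailbox.put("hello")
mailbox.put("world")
mailbox.put(None)
worker.join()
print(results)   # ['HELLO', 'WORLD']
```

The caller only ever touches the mailbox; all the thread plumbing is hidden behind the `spawn`/handler pair, which is the "natural enough" syntax the comment is after.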

But inside a normal application, shared-VM threads are quite silly, and rarely truly necessary.


From: Stelios Sfakianakis (Jun 08 2007, at 07:49)

"It will doubtless be pointed out that on the client, threads will remain in application programmers’ faces for the foreseeable future"

Why is that? If 'share-nothing' and 'message passing' paradigms are good for overloaded servers and also provide an easier way of building software, I don't see any reason not to use them on the desktop as well.



From: Jim McCoy (Jun 08 2007, at 09:24)


I would suggest that there is a lot more to Erlang's message-passing system than can be handled with simple blocks in Ruby. One of the big advantages that Erlang offers in this area is immutable variables; there is no serialization/marshalling cost for passing these messages around (and even for passing them over the wire). Ruby, Python, et al. will all need to deal with this issue, and it is going to have a serious performance cost. My interest is more from the Python side than Ruby, but I have been watching people sketch out various "erlang-style" message-passing schemes in both languages and everyone seems to run into this particular performance wall. When the code is running in a single interpreter it is easy to add message-passing syntactical sugar, but when you start crossing CPU/system boundaries things seem to get ugly very quickly...
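The wall Jim describes is easy to see directly in Python: a message crossing an interpreter or process boundary has to be serialized and deep-copied on every hop, the cost his comment says Erlang's immutability lets the runtime avoid. A small illustration using the standard pickle module (the framework is hypothetical; the copying is not):

```python
import pickle

# A message about to cross a process boundary in a hypothetical
# "Erlang-style" Python framework. CPython has no shared immutable
# heap, so every hop is serialize + copy + deserialize.
message = {"op": "update", "rows": list(range(100_000))}

wire = pickle.dumps(message)    # marshalling cost on send
copy = pickle.loads(wire)       # and again on receive

assert copy == message and copy is not message   # a full deep copy
print(f"{len(wire)} bytes copied per hop")
```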


From: Pierre Phaneuf (Jun 08 2007, at 18:59)

Of course, Jim is surely right; this was more of a "could be done quickly in Perl, but would be even nicer in Ruby (like many things!)", and I'm no Erlang expert. The devil's in the details and all...

But basically, one of the big things with Erlang, from my point of view, is that there's no blocking operation that isn't done through a message, so in the end, the state machine HAS to be written; there's no getting around it with cheap cop-outs. And then, of course, they put in all the other things that make it easy to write those state machines.
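That "everything blocks on a message" discipline is what forces the state machine into the open. A toy Python version, with the message receive as the only blocking call (names hypothetical):

```python
import threading
from queue import Queue

def door_actor(mailbox, events):
    """An explicit state machine: the only blocking call is the
    message receive, so every transition has to be spelled out."""
    state = "locked"
    while True:
        msg = mailbox.get()              # the one and only blocking point
        if msg == "stop":
            break
        if state == "locked" and msg == "unlock":
            state = "open"
        elif state == "open" and msg == "lock":
            state = "locked"
        events.append((msg, state))

mailbox, events = Queue(), []
actor = threading.Thread(target=door_actor, args=(mailbox, events))
actor.start()
for msg in ("unlock", "lock", "unlock", "stop"):
    mailbox.put(msg)
actor.join()
print(events)   # [('unlock', 'open'), ('lock', 'locked'), ('unlock', 'open')]
```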

It's like how you can write object-oriented code in C; it's just more of a pain in the ass, having to call destructors manually, implement your virtual method tables by hand, etc. You can also do the lightweight process thing in a "less favourable" language. In fact, the lighttpd people, for example, do so, in C.

But at some point, you write one too many virtual method tables in C, and realize you could just use C++ and be over with it, just like people were pushing data and their instruction pointer onto a stack by hand before jumping to some other code, so it could return, and then realized they could just use a language with subroutines/functions. The same might very well happen with message-passing, and people will switch over to some language like Erlang where they don't have to do the drudge work themselves (saving themselves the extra bugs while they're at it).


From: Kirit Sælensminde (Jun 09 2007, at 02:05)

"Of note, Ruby's heavy reliance on block syntax would probably make it easy to make an Erlang-style message-passing framework for it, which should look natural enough (pass a block to a "wait for a message" function)."

The problem with basing off closures is that you need to alter the runtime. If you use message passing directly then you can avoid language changes and implement the threading as part of an external library.

I've implemented Mahlee <http://www.kirit.com/Introducing%20Mahlee%E2%84%A2>, which does this for JavaScript. It is a host extension that allows JavaScript objects to be created in other threads. Using these objects is syntactically the same as using any other JavaScript object, with the restriction that parameters will be marshalled via JSON. The return value is a future, whose result() method is the only blocking call.

Clearly the marshalling is a performance hit, but on multi-core machines the ability to use all the cores more than makes up for it.

The same model should work for other languages. Those with good introspection (like JavaScript) will be easier to deal with without requiring changes to the core language implementation.
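The model Kirit describes can be sketched in a few lines of Python (names hypothetical, not Mahlee's actual API): calls through a proxy run on the target object's own thread, arguments and results cross the boundary as JSON, and the caller gets a future whose result() is the only blocking call.

```python
import json
from concurrent.futures import ThreadPoolExecutor

class RemoteObject:
    """Proxy for an object living on another thread; arguments and
    results are marshalled through JSON, as in the Mahlee model."""
    def __init__(self, target):
        self._target = target
        self._pool = ThreadPoolExecutor(max_workers=1)  # the object's thread

    def call(self, method, *args):
        wire = json.dumps(args)                    # marshal on the way in
        def run():
            result = getattr(self._target, method)(*json.loads(wire))
            return json.loads(json.dumps(result))  # and on the way out
        return self._pool.submit(run)              # a future; .result() blocks

class Stats:
    def mean(self, xs):
        return sum(xs) / len(xs)

proxy = RemoteObject(Stats())
future = proxy.call("mean", [1, 2, 3, 4])   # returns immediately
print(future.result())                      # the only blocking call -> 2.5
```

Anything that won't survive the JSON round-trip (functions, sockets, cyclic structures) is rejected at the boundary, which is exactly the restriction the comment describes.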


From: Laurent Szyster (Jun 09 2007, at 14:33)

Erlang's built-in concurrency features were a requirement of Ericsson's telecom applications: distributing application state across a network of computer systems as fast as possible.

This was a very narrow field of application.

Is it wider now?

The vast majority of network applications have been the zillions of public and private web services developed with LAMPs (Linux, Apache, MySQL and Perl, PHP or Python) that leverage tons of synchronous APIs implemented in fast C libraries.

And those APIs are fundamentally at odds with Erlang's model.

So, are we stuck with threads on network peers?

For CPU-intensive processes and synchronous system APIs, yes, and for a while more. But not for network I/O. Actually, we never have been, and most high-performance network servers are state machines built on a few good asynchronous APIs.

The select or poll interfaces for non-blocking sockets are old and widely implemented POSIX standards; epoll and kevent are scaled-up implementations of the same principle. Since mixing asynchronous socket I/O and process signals is nothing new either (see http://cr.yp.to/docs/selfpipe.html), it has been practically possible to have the best of both worlds for a while now (on POSIX systems since 1990, according to DJB).
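Python's standard selectors module wraps exactly those interfaces (select/poll/epoll/kqueue, whichever the platform offers); a minimal readiness-driven echo, using a socketpair as a stand-in for real network connections:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
server_side, client_side = socket.socketpair()  # stand-in for a real connection
server_side.setblocking(False)
client_side.settimeout(1)

# Register interest: "tell me when this socket is readable".
sel.register(server_side, selectors.EVENT_READ)

client_side.sendall(b"ping")

# The event loop: select() blocks on readiness, never on any one peer,
# so one thread can drive thousands of connections as a state machine.
for key, _events in sel.select(timeout=1):
    data = key.fileobj.recv(1024)        # the socket is ready; this won't block
    key.fileobj.sendall(data.upper())    # echo back, transformed

reply = client_side.recv(1024)
print(reply)   # b'PING'
```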

In Firefox's web client or Caucho's Resin J2EE server, for instance, network I/O is handled free of contention and latency costs, while CPU-intensive work can still leverage synchronous APIs and the system's support for native threads/processes.

Will "practicality beats purity" win out once more?

I bet it does.


From: Adrian Cockcroft (Jun 12 2007, at 01:19)

A few years ago I noticed that the web services orchestration and choreography standards looked familiar to me because they were based on pi-calculus, which is based on CSP, and I spent much of the 1980s programming the same CSP-like constructs in Occam. When I came up with an idea that involved a complex Skype-based peer-to-peer web services protocol, I used the latest version of Occam-pi to build a very efficient simulator that can run something like a million threads in one process on my laptop. It worked quite well, I wrote a paper, and I just posted the slide deck on my blog. I still think Occam is a great language for learning concurrency, and I wish they still taught it...

Cheers Adrian



June 07, 2007
· Technology (90 fragments)
· · Concurrency (75 more)
