There are a lot of ingredients that might or might not go into the winning formula that brings concurrent programming to the mainstream. This is a very brief run-through of as many as I can think of.

[This is part of the Concur.next series. At the moment, I think the next few pieces are going to be discussions of some or maybe all of the items in the list. If you’ve published something particularly gripping about any of them, shoot me a link.]

I’ll try to update this piece lots; I’m sure people will write in to disagree with my characterizations and to argue for adding items to the list or removing them from it. Since this is an enumeration rather than an opinion piece, I have quite a bit of hope that it might come to represent community consensus.

In this discussion, I frequently refer to the HECS (Haskell, Erlang, Clojure, Scala) languages. I am not claiming that one of these is the winner or that there aren’t other worthwhile contenders. It’s just that they keep floating to the top and (I think) represent an instructive range of ways to aggregate the features in this laundry list.

Functional Programming · Hereinafter FP, please. The Wikipedia explanation is fairly impenetrable. Erlang is a decent gateway drug and Haskell is definitely the hard stuff.

The portion of a program that consists entirely of side-effect-free function calls should, by definition, be arbitrarily parallelizable. That’s the theory, anyhow. Essentially every modern programming system that claims to address concurrency provides some FP capabilities.
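To make that concrete, here’s a minimal Python sketch (my own illustration, not taken from any of the HECS languages). The hypothetical `collatz_steps` is a pure function, so handing its calls to a pool of workers can change only the timing, never the answer. One caveat: in CPython, threads don’t run Python code on multiple cores at once because of the global interpreter lock; swapping in `ProcessPoolExecutor` would, at the cost of shipping arguments and results between processes.

```python
from concurrent.futures import ThreadPoolExecutor


def collatz_steps(n):
    """Pure: the result depends only on n, and nothing outside is touched."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def parallel_map(fn, items, workers=4):
    """Hand each call to a worker; safe only because fn has no side effects.
    pool.map returns results in input order, whatever order they finish in."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


results = parallel_map(collatz_steps, range(1, 1001))
```

The result is identical to a plain sequential `[collatz_steps(n) for n in range(1, 1001)]`; that equivalence is the whole point.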

Immutable Data · If your data objects are immutable, you can operate upon them freely and concurrently without fear of corruption or the need for locking. Plus you can use them as interprocess messages without actually copying them, or ship them between physical machines and know that the version you left behind is still current, whatever they do over there. It seems pretty obvious that this is a powerful tool for use in addressing concurrency problems.

There are gradations of immutability. You can have immutable variables and do what feels like changing them if you auto-magically make a timestamped new version while preserving the validity of pointers to the old one. There are all sorts of data-structure tricks you can do, for example an “immutable” array where you append an element, logically producing an entirely new object but in fact re-using the pre-append part of the array.
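Here’s a toy version of that append trick, as a sketch: a hypothetical `PVec` type (really a linked list that runs backwards) where appending builds one new node that points at the old version instead of copying it. Every version stays valid and unchangeable. Real persistent vectors, Clojure’s for example, use shallow, wide trees so that indexing stays fast too; this one takes O(n) to index and is only meant to show the sharing.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class PVec:
    """Hypothetical persistent append-only sequence. Frozen: fields can't be reassigned."""
    prefix: Optional["PVec"]   # the entire previous version, shared, not copied
    last: Any                  # the element this version appended
    length: int


def append(v: Optional[PVec], x: Any) -> PVec:
    """O(1) 'append': returns a new version; v itself is untouched."""
    return PVec(v, x, (v.length if v is not None else 0) + 1)


def to_list(v: Optional[PVec]) -> list:
    """Walk back through the shared prefixes and return the elements in order."""
    out = []
    while v is not None:
        out.append(v.last)
        v = v.prefix
    out.reverse()
    return out


# Two different "edits" of the same base share all of its storage.
base = append(append(None, "a"), "b")
left = append(base, "c")
right = append(base, "z")
```

Because nothing can be changed in place, `left` and `right` can be handed to different threads, or used as messages, with no locking at all.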

This is a simple notion but a very deep subject.

Processes and Actors · It’s axiomatic that we’re not going to expose threads in the operating-system sense. But it seems that programmers ought to be allowed to see some notion of a sequence of operations that proceeds in parallel with other sequences of operations.

Erlang calls them processes, Scala calls them actors (there’s a big chunk of computer-science theory at work there), and Haskell doesn’t call them anything, but its documentation bandies the term “Thread” about.

The way in which this is exposed to the programmer seems to me like a crucial defining characteristic in building a strategy to attack the concurrency problem.

In the context of this series, when I say “process” I’m using it more or less in the Erlang sense, as a sequence of execution with its own (logical) call stack and heap and so on, without asserting that its implementation is based on an OS process or thread, or that it is or isn’t an Actor in the formal sense.
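Here’s roughly what that notion of a process looks like, sketched in Python with a thread and a queue standing in for Erlang’s lightweight processes and mailboxes. That substitution is an assumption for illustration: these are OS threads, far heavier than Erlang processes. The hypothetical `CounterProcess` owns its state, and the only way to touch that state is to send it a message.

```python
import queue
import threading


class CounterProcess:
    """Hypothetical 'process': its own thread, private state, and a mailbox."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0   # private: only _run() ever reads or writes it
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue and return immediately."""
        self._mailbox.put(message)

    def _run(self):
        # Handle one message at a time, so no locks are needed on _count.
        while True:
            message = self._mailbox.get()
            tag = message[0]
            if tag == "incr":
                self._count += message[1]
            elif tag == "get":
                message[1].put(self._count)   # reply to the caller's own queue
            elif tag == "stop":
                return


def ask(proc, timeout=5):
    """Synchronous request/reply built on top of asynchronous send."""
    reply = queue.Queue()
    proc.send(("get", reply))
    return reply.get(timeout=timeout)


counter = CounterProcess()
for _ in range(100):
    counter.send(("incr", 1))
total = ask(counter)   # the mailbox is FIFO, so all 100 increments land first
counter.send(("stop",))
```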

Message Passing · If you’re going to expose processes and avoid global data, you need a way for them to communicate. There’s a lot of semantic variation in the way different platforms do interprocess communication, but a superficial look suggests that some shared patterns are emerging; Scala, for example, consciously echoes Erlang in its approach.
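As a sketch of one common pattern, assuming nothing beyond Python’s standard library: a pipeline of stages, each running in its own thread and connected by queues, passing small immutable values downstream. The `STOP` sentinel and the `run_pipeline` helper are inventions for this example, not anybody’s real API. Each stage has a single reader and a FIFO inbox, so results come out in input order.

```python
import queue
import threading

STOP = object()   # sentinel meaning "no more messages"


def stage(inbox, outbox, fn):
    """One pipeline stage: receive, transform, pass along (no error handling)."""
    while True:
        message = inbox.get()
        if message is STOP:
            outbox.put(STOP)          # propagate shutdown downstream
            return
        outbox.put(fn(message))


def run_pipeline(items, *fns):
    """Wire len(fns) stages together with queues and feed items through them."""
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    workers = [
        threading.Thread(target=stage, args=(queues[i], queues[i + 1], fn))
        for i, fn in enumerate(fns)
    ]
    for w in workers:
        w.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)
    results = []
    while True:
        message = queues[-1].get()
        if message is STOP:
            break
        results.append(message)
    for w in workers:
        w.join()
    return results


# Stage 1 sends immutable (n, n*n) tuples; stage 2 sums each pair.
sums = run_pipeline(range(5), lambda n: (n, n * n), lambda pair: pair[0] + pair[1])
```

No stage ever reaches into another’s state; everything they share travels as a message.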

Typing · Here we have the single greatest religious divide among programmers, and it cuts right across this space. Thank goodness none of the candidates are weakly typed à la Perl. Erlang has unsurprising dynamic typing but no objects. Scala has inferred static typing applied to Java-flavored classes/objects/methods.

Haskell has stronger and more elaborate (static) typing than anything I’ve ever been near; in fact they push a lot of what would normally be considered an application programmer’s work down into the type system. They have things called classes (OK, typeclasses), but they’re a horse of a different color.

It’s not obvious to me that the choice of type system is that important in building the Java of concurrency, but I could easily be wrong on that.

Virtual Machine · Is it an advantage to have your own virtual machine, like Erlang, to run on someone else’s, like Clojure and Scala, or just to compile to native code, like Haskell? The JVM in particular brings along a mind-bogglingly huge amount of existing software and libraries you can call out to. Well, except that a large proportion of it either isn’t optimized for concurrency or just won’t work that way at all.

The right answer here isn’t obvious at all.

Transactional Memory · Since nobody’s ever shipped hardware transactional memory, we’re talking STM here; Clojure in particular makes a big deal of this.

The core idea is sort of like applying ACID database semantics to accessing program variables. The effect is that you can mutate things concurrently where you need to, in the context of a transaction; if a collision occurs, the whole transaction is rolled back (and typically retried), so inconsistency doesn’t arise.
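Here’s a deliberately tiny optimistic STM in Python, to show the shape of the idea; it’s a sketch, not how Clojure or Haskell actually do it, and real STMs are a good deal more sophisticated. Transactions buffer their writes and remember the version of every `Ref` they read; at commit time they check, under one short global lock, that none of those versions has moved. If one has, the writes are thrown away and `atomically` runs the function again. The names `Ref`, `Transaction`, and `atomically` are illustrative assumptions.

```python
import threading

_commit_lock = threading.Lock()   # one global lock, held only briefly


class Ref:
    """A mutable cell that should only be changed inside a transaction."""
    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.reads = {}    # Ref -> (version seen, value seen)
        self.writes = {}   # Ref -> new value, invisible to others until commit

    def get(self, ref):
        if ref in self.writes:
            return self.writes[ref]          # read your own writes
        if ref not in self.reads:
            with _commit_lock:               # take value and version as a pair
                self.reads[ref] = (ref.version, ref.value)
        return self.reads[ref][1]

    def set(self, ref, value):
        self.writes[ref] = value

    def commit(self):
        with _commit_lock:
            # Validate: everything we read is still at the version we saw.
            if any(ref.version != seen for ref, (seen, _) in self.reads.items()):
                return False                 # collision: discard our writes
            for ref, value in self.writes.items():
                ref.value = value
                ref.version += 1
            return True


def atomically(fn):
    """Run fn(tx) until it commits cleanly. fn may run more than once,
    so it must not have side effects outside the transaction."""
    while True:
        tx = Transaction()
        result = fn(tx)
        if tx.commit():
            return result


# Usage: several threads moving money between two accounts.
checking, savings = Ref(1000), Ref(0)

def transfer(tx, amount=1):
    tx.set(checking, tx.get(checking) - amount)
    tx.set(savings, tx.get(savings) + amount)

def worker(n):
    for _ in range(n):
        atomically(transfer)

threads = [threading.Thread(target=worker, args=(50,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the threads interleave, no transfer is lost and the two balances always add up to 1000.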

Tuple Space · The Wikipedia Tuple space article is perfectly OK. I can remember, like it was yesterday, reading about Linda back in the Eighties and thinking this had to be the future.

Maybe so, but it’s sure not the present; I’ve never actually had my hands on a deployed piece of software that relied on a tuple space. And at the moment, I don’t hear anyone claiming this is the way to build the Java of concurrency.
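For anyone who hasn’t run into Linda, here’s a toy tuple space in Python showing the core operations: `out` adds a tuple, `rd` blocks until a matching tuple exists and returns it without removing it, and `in_` does the same but removes it (the underscore is there because `in` is a Python keyword). Using `None` as the wildcard in patterns is my own simplification; real Linda templates use typed formal parameters, and this version can’t match a literal `None`.

```python
import threading


class TupleSpace:
    """Toy Linda-style tuple space; None in a pattern matches anything."""

    def __init__(self):
        self._tuples = []
        self._changed = threading.Condition()

    @staticmethod
    def _matches(pattern, tup):
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup))

    def _index(self, pattern):
        for i, tup in enumerate(self._tuples):
            if self._matches(pattern, tup):
                return i
        return -1

    def _wait_index(self, pattern, timeout):
        # Caller must already hold the condition's lock.
        if not self._changed.wait_for(lambda: self._index(pattern) >= 0, timeout):
            raise TimeoutError(f"no tuple matching {pattern!r}")
        return self._index(pattern)

    def out(self, tup):
        """Add a tuple and wake anyone waiting for one."""
        with self._changed:
            self._tuples.append(tuple(tup))
            self._changed.notify_all()

    def rd(self, pattern, timeout=None):
        """Block until a match exists; return it, leaving it in the space."""
        with self._changed:
            return self._tuples[self._wait_index(pattern, timeout)]

    def in_(self, pattern, timeout=None):
        """Block until a match exists; remove and return it atomically."""
        with self._changed:
            return self._tuples.pop(self._wait_index(pattern, timeout))


def squarer(space, n_tasks):
    """A worker: take any task tuple, put back a result tuple."""
    for _ in range(n_tasks):
        _, n = space.in_(("task", None))
        space.out(("result", n, n * n))
```

Note that the master and the workers never address each other; they only agree on the shape of the tuples, which is the decoupling that made Linda look so appealing.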

Dataflow · Like tuple spaces, dataflow is an idea that looks like it ought to be really useful in building concurrent systems. Concretely, you ought to be able to farm out the recalculation of a big, complex spreadsheet to a bunch of processors, right up until you get to the last-step sum computations.
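A minimal sketch of that spreadsheet case, in Python: each hypothetical cell is a function plus the names of the cells it reads, and `recalc` repeatedly fires every cell whose inputs are all available, running each such wave concurrently on a thread pool. Firing in waves is a simplification; a real dataflow engine would start each cell the moment its own inputs arrive, rather than waiting for the rest of the wave.

```python
from concurrent.futures import ThreadPoolExecutor


def recalc(cells, workers=4):
    """Evaluate a hypothetical spreadsheet given as {name: (fn, [input names])}.

    Every cell whose inputs all have values fires in the current wave, and
    the cells in a wave run concurrently on the pool.
    """
    values = {}
    pending = dict(cells)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in values for d in deps)]
            if not ready:
                raise ValueError(f"cycle or missing input among {sorted(pending)}")
            futures = {}
            for name in ready:
                fn, deps = pending.pop(name)
                futures[name] = pool.submit(fn, *(values[d] for d in deps))
            for name, future in futures.items():
                values[name] = future.result()
    return values


sheet = {
    "a1": (lambda: 2, []),
    "a2": (lambda: 3, []),
    "b1": (lambda x: x * 10, ["a1"]),          # depends only on a1
    "b2": (lambda x: x * 10, ["a2"]),          # independent of b1: same wave
    "total": (lambda x, y: x + y, ["b1", "b2"]),
}
values = recalc(sheet)
```

Here `a1` and `a2` run together, then `b1` and `b2`, and only the final `total` has to wait for everything.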

Like tuple spaces, I don’t see anyone trying to build the concurrent future with dataflow techniques.

Reliability · This is only here because of Erlang, which claims to have two design goals: First, to enable concurrent computation, and second, to make such computation reliable in the face of software and hardware errors. In order to accomplish this in a process-based paradigm, it wants to handle all errors simply by blowing up the process where they happened; then there’s a really slick system of local and remote process monitors that should enable you to keep your system on the air in the face of a whole lot of different classes of problems.

The thing is, once you’ve worked with Erlang a little bit, the notion of trying to deliver any concurrent system without those sorts of monitors and failovers and so on begins to be seriously worrying. My personal bet is that whatever we end up with is going to have to have a good story to tell in this space.
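To show the shape of the “let it crash” idea, here’s a minimal Python supervisor sketch. It’s loosely modeled on Erlang’s one-for-one restart strategy but is not Erlang’s API: the worker runs in its own thread, any exception simply ends that run, and the supervisor starts a fresh one, up to a restart limit. Python threads share memory, so this gives none of the isolation an Erlang process has; `supervise` and `max_restarts` are names I’ve made up.

```python
import threading


def supervise(worker, max_restarts=3):
    """Run worker() in a fresh thread; if it dies, start another, up to max_restarts."""
    failures = []
    for _ in range(max_restarts + 1):
        outcome = {}

        def run():
            # Closing over this iteration's outcome is safe: we join below.
            try:
                outcome["result"] = worker()
            except Exception as exc:     # the "process" just dies...
                outcome["error"] = exc   # ...and its monitor hears why

        t = threading.Thread(target=run)
        t.start()
        t.join()
        if "error" not in outcome:
            return outcome["result"]
        failures.append(outcome["error"])
    raise RuntimeError(f"giving up after {len(failures)} crashes") from failures[-1]


calls = {"n": 0}

def flaky():
    """Hypothetical worker that crashes twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated crash")
    return "ok"

status = supervise(flaky)
```

The worker contains no error-handling code at all; recovery policy lives entirely in the supervisor, which is the division of labor Erlang encourages.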

Language or Library? · This is a big one. Do we need a whole new platform? A new language on an existing VM? Or maybe even just a set of libraries you can use from existing languages?

You can do massively parallel computing right now today, from FORTRAN forsooth, using MPI. There are more modern approaches including MapReduce and Hadoop.

There are Actor libraries that I know of for Java, Ruby, and probably essentially every other modern language. And nobody’s forcing you to make your variables mutable or to share data between threads. Or even to use threads; in my own Wide Finder project (I, II), there was no real evidence that thread-level concurrency outperformed processes.
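For flavor, here’s what the library route can look like: a MapReduce-style hit counter in plain Python, using only the standard library. The log-line format and the `REQUEST` regex are assumptions for illustration. The map step counts each chunk independently; the reduce step just adds `Counter`s, which is associative, so the partial results can be combined in any grouping. Passing `executor_cls=ProcessPoolExecutor` switches from threads to processes without touching the logic (on platforms that spawn worker processes, the call has to sit under an `if __name__ == "__main__":` guard).

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Assumed log format: Apache-style lines containing "GET /some/path HTTP/1.1".
REQUEST = re.compile(r'"GET (\S+) ')


def map_chunk(lines):
    """Map: count requested paths in one chunk, independently of all others."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def map_reduce(lines, chunk_size=1000, executor_cls=ThreadPoolExecutor):
    """Split, map each chunk in parallel, then reduce by adding Counters."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with executor_cls() as pool:
        partials = pool.map(map_chunk, chunks)
        return reduce(lambda a, b: a + b, partials, Counter())
```

Nothing here needed a new language: the concurrency lives in the executor, and the discipline (no shared mutable state) is just a convention the code happens to follow.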

These days, given the choice, I prefer to code in either Ruby or Python. I’d love to be able to keep as much of that lightweight dynamic enabling goodness as possible and still drink from the fountain of concurrency.

Distribution · Which is to say, can your concurrent application code run across multiple physically separated computer systems, as opposed to just the cores on one system? If so, is that automatic or under the programmer’s control? Erlang makes this explicit, as a reliability mechanism, but for problems that get really large in scale, this could easily become a gating limit on performance.

What’s Missing? · From this laundry list, I mean. Or what’s really wrong and misleading. Or should any of these be cast into the outer darkness?



Contributions

From: Steve Vinoski (Oct 01 2009, at 19:19)

Distribution is missing. It's required at least for reliability, given that you need multiple systems for that, and these days it's required in general for many applications.

From: Ollie Kottke (Oct 01 2009, at 19:24)

Is Tail Call Optimization (TCO) needed? The functional language Clojure seems to work without it.

From: Greg (Oct 01 2009, at 20:01)

Haskell (at least the most popular compiler, GHC) doesn't have a "VM", it has a native-code compiler and an interpreter.

From: Mr. Wobbet (Oct 01 2009, at 20:19)

Have you looked at Apple's Grand Central Dispatch yet? The Snow Leopard review at Ars Technica by John (Jon?) Siracusa goes into what Apple has done (or proposes to do) with extending C/C++/Obj-C with blocks for passing to GCD (which they've also open-sourced).

From: JH (Oct 01 2009, at 20:25)

"It’s not obvious to me that the choice of type system is that important in building the Java of concurrency"

Simon Peyton-Jones talks with Joe Armstrong a bit in this interview (http://www.infoq.com/interviews/armstrong-peyton-jones-erlang-haskell) about how typing may apply to concurrent programming. The desire being, as far as I can tell, to have a way to specify constraints on the order of (possibly) concurrent operations, e.g. that an API definition can enforce at the type level that you must open a file before reading. Scala is also mentioned as having concurrency control via type support for some data types.

From: projectshave (Oct 01 2009, at 21:28)

Remove the section on tail calls. It has nothing to do with concurrency, and your description is completely wrong. In fact, most of these categories are irrelevant. Chapter 5 & 6 in this report might help: http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf

Rather than focus on languages, look at the underlying models. You've got message passing, direct control of threads, task-based models, data parallel models, etc. Then there's memory hierarchy issues, multicore vs. distributed computing (1 vs many machines).

From: Peter Keane (Oct 01 2009, at 21:29)

many thanks! I'm finding this series quite enlightening/interesting.

From: John Cowan (Oct 01 2009, at 22:03)

Ollie: General tail-calling doesn't work in Clojure because it can only be done on the JVM as a kludge. Nobody would be happier than Rich Hickey if one fine day the JVM provided tail-calls, believe me. (Some other people would be just as happy.)

But Tim, your view of tail-calling isn't a big fat lie at all, it's the big fat truth. The whole message of Scheme, the first FP language, is that "a special combination of GOTO statement with variables you pretend are immutable" just *is* a subroutine call, provided you realize that the proper time to push stack is not when you call a procedure, but when you evaluate a complex argument to the procedure (one that isn't a variable or constant).

For more on this, see Guy Steele's 1977 paper "Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO" <http://repository.readscheme.org/ftp/papers/ai-lab-pubs/AIM-443.pdf>.

And as for dataflow, we do dataflow every time we're in the shell and type "|". There is nothing new under the Sun.

From: carl forde (Oct 01 2009, at 22:37)

Just so you know that there really is an application that uses tuple-spaces, let me introduce you to Knowledge Forum: http://www.knowledgeforum.com/

From: Zak (Oct 01 2009, at 23:10)

Clojure uses the recur form for tail calls providing a syntactic difference between stack-consuming self-calls and tail calls. Mutually recursive functions aren't quite perfect, but it does have the advantage that the compiler won't let you think you have a tail call when you don't.

Clojure also has primitives for dataflow in the form of watch functions and watcher agents - essentially callbacks for changes to mutable references. Watcher agents, like all Clojure agents, get their own threads, making them run in parallel automatically.

From: Corion (Oct 02 2009, at 00:40)

Perl actually can be seen as quite strongly typed, but it has lots of implicit type conversions.

There is no way in Perl to treat a Scalar (a string, number or reference to something) as an Array or a Hash (to others known as Map or Dictionary).

The implicit conversions between numbers and strings are done by the operators, which is why Perl has distinct operators for numerical addition and string concatenation, and for numerical and string-wise comparison for example. But all of these happen only on Scalars, and every argument will be implicitly converted into a scalar if it isn't already.

From: JulesLt (Oct 02 2009, at 01:30)

On the VM subject - there's an obvious trend towards shared language VMs - the JVM & Microsoft's CLR being the obvious ones, and the various initiatives (Apple, Google, Adobe) around the LLVM being another - although the LLVM is really below a runtime.

There's also Etoile bringing Smalltalk as a 1st class language to the Gnustep runtime, as well as close to toll-free language bridges like RubyCocoa to consider.

Which has little to do with concurrency, but I think it will influence what is eventually successful.

From: Ingo Lütkebohle (Oct 02 2009, at 01:43)

Dataflow is widespread both in signal processing and with visual programming languages (I'm thinking of Simulink, LabView and the likes). Several approaches in the complex event processing crowd are also based on dataflow.

In my book, that's a big deal: if there is a community that has a concurrency success story from /way/ back, it's signal processing.

Furthermore, close similarity between FP and dataflow has been fairly well established (at least as far as I can tell) since at least the mid-nineties.

What I guess the traditional CS language theory community has a problem with is the somewhat pragmatic (some would say "messy") mixture of "traditional" imperative programming inside the dataflow nodes and the more-functional coordination languages that specify the interactions between the nodes. However, in my opinion, that is exactly what makes this approach fairly easy to grasp for novices and, additionally, easy to integrate with legacy approaches. Something that, unfortunately, cannot be said of Erlang or Haskell (but is a plus with Scala).

From: Chris (Oct 02 2009, at 03:02)

Agree with Steve Vinoski about distribution, except that I would have called it "place". Just think about the way that people naturally draw blobs and arcs on a whiteboard for different jobs. If you can identify places where computation happens, even if those places are virtual, then the computation can be parallelized. The bonus, to my mind, is that it allows you to think about programming on the Web in a unified fashion.

Occam used to require you to define where to place code, and that was really hard to use, so I don't recommend that approach.

From: Jacek (Oct 02 2009, at 04:28)

@John Cowan: saying that >>"a special combination of GOTO statement with variables you pretend are immutable" just *is* a subroutine call<< seems to go against the word "subroutine".

Subroutine, as I understand the word, is something you call *and get back from*. Pushing the stack is natural to this interpretation. I believe that tail-call optimization should be viewed as a feature, not as a paradigm. In other words, LISP/Scheme should have had some form of a for-loop, because it would be a natural syntax for expressing *some* kinds of loops.

I remember learning Scheme in college, and while TCO was very cool and eye-opening, some of its forced uses felt weird.

From: Mark Volkmann (Oct 02 2009, at 05:41)

I don't think persistent data structures are emphasized enough in discussions about FP. In the following, read "->" as "motivates the need for". concurrency -> FP -> immutability -> persistent data structures. The need to create new data structures that represent modifications to existing ones is huge. This is a big advantage that Clojure currently has over other FP languages.

From: Rich Hickey (Oct 02 2009, at 06:04)

I think there are a few (Erlang?) biases in your overview thus far:

>It’s axiomatic that we’re not going to expose threads in the operating-system sense.

I disagree. Why take threads off the table prematurely? We need to disentangle threads from the problems of locks and shared mutable memory. Thread pools etc may remain important concurrency tools that need not (necessarily) be abstracted away. Threads have a distinct advantage in that people know what they are getting when they have one.

As you've noted, we can already do multi-(OS)-process concurrency today, and have for a long time, with pipes and message queues, and now things like Hadoop, Terracotta etc. So, if there is a concurrency question, it is - in what way are those solutions insufficient?

You seem to be focusing on same-(OS)-process concurrency, which is fine (and the place where those IPC solutions may be insufficient). At that point, it becomes important that any 'process/actor' abstraction still deliver on multiple cores, else people will be disappointed if task-switching 'concurrency' systems utilize a single core only. 'Proceeding in parallel' is not enough.

>If you’re going to expose processes and avoid global data, you need a way for them to communicate.

I think it is imperative to completely stop using the terms 'global data' and 'shared memory' without qualifiers. Shared *immutable* data structures may in fact be critical to high-performance same-process concurrency. Message-passing actor abstractions have nice properties (transparent distribution), but not without tradeoffs for same-(OS)-process use (if truly no sharing of memory, then full copies, and, in all cases, serialized reading with I/O characteristics).

>In the context of this series, when I say “process” I’m using it more or less in the Erlang sense

I sure wish you wouldn't. See how many times above I had to qualify 'process'? Erlang's reuse of this term makes it difficult to discuss things in terms people already understand, with properties they already understand, like (OS) processes and threads, or the fact that an (OS) process can efficiently share memory among its threads in a way that separate (OS) processes cannot. Erlang has discarded this capability at the language level (but may leverage it as an implementation detail). I'm not sure that will be acceptable to many coming from the efficiencies of, say, Java concurrency, especially for compute-bound tasks. In any case, it's much easier to talk about concur.next if we let terms like process and thread mean what they always have and use new terms for new abstractions.

Note that I have nothing against Erlang or the actor model, but I think it will be a better exploration if one answer (actors) wasn't baked into the question. projectshave is right, it's about models, and then, given a model, how possible/idiomatic/practical/efficient/etc is it in/on a given language or platform.

From: Lucian Pintilie (Oct 02 2009, at 06:12)

Tim,

I came across tuple spaces a while ago and there are projects which provide implementations in Java. From what I've seen, around 2000-2001 the main names in this area were Sun, with Java Spaces (which was part of the Jini project: http://www.jini.org) and IBM, with T-Spaces (http://www.almaden.ibm.com/cs/tspaces/).

Since then Sun has donated the project to the Apache Foundation (http://incubator.apache.org/river/RIVER/index.html), and the news on the site is dated more than a year in the past. However, the concept seems to be alive in products (http://www.gigaspaces.com/).

In this context it is worth noting the efforts of Dan Creswell to promote Jini: http://www.dancres.org/blitz/

I still believe the concept has wings and can elegantly solve problems in distributed systems. However, there may be a very steep learning curve for many developers working on today's typical Java (web and/or enterprise) applications.

From: Drew (Oct 02 2009, at 06:47)

@Mr. Wobbet:

Multiple people have suggested Tim look at GCD, but I don't think he'll give it weight as even though it's now open-sourced it's Apple. Tim will use their laptops and their consumer software, but he seems to have little interest in their dev tools as evidenced by those awful "share cropper" jokes. Meanwhile, the only way to see an example of GCD today is to use Obj-C in XCode on Snow Leopard.

Having said all that, I watched Bertrand announce GCD and then I found myself experiencing deja-vu when Tim started writing about this subject wondering if anything helped developers neatly handle concurrency.

Oh btw, I agree Apple's mgmt of the AppStore and developer relations in general is crap. I just find the "share cropper" jokes in poor taste at best.

From: Zooko (Oct 02 2009, at 11:44)

Hello Tim Bray:

Thanks for your contributions to the world's software engineering knowledge base.

Haskell is (almost?) all compiled, rather than "coming with its own virtual machine".

Reading your laundry list makes me think that the E language designers were prescient: http://erights.org .

Some of the ideas from E have gotten into Twisted Python and Foolscap (a Python remote object library).

You might want to try curling up with Mark Miller's "Robust Composition" and tell us what you think of it: http://www.erights.org/talks/thesis .

Regards,

Zooko

From: Jeremy Bowers (Oct 02 2009, at 17:35)

Performance: As much as I love Python- and Ruby-type languages, if you're going concurrent but starting from a language with 1/10th or 1/20th the performance of C/Haskell/something else, you've got an awful lot of ground to cover before you make up for running the same program in a better language, and the communication complexity only goes up linearly in the best case.

I think the main point of tail calls is that it is how you get from "immutable" to "still usefully changeable". Summarizing a few slightly different definitions, "immutable" means that within one call stack, the values won't change. If you want "changed" values, you need a new call stack. TCO means you can have a new call stack without crashing the system. It took me a while to figure this out because I think a lot of functional people got so caught up in the cute-but-ultimately-uninteresting algorithmic tricks it sometimes enables that they forget this actually-fundamental reason for TCO. Immutability without TCO is a lot harder; you'll end up with some sort of faked TCO, or, worse, fake immutability.

From: Dan Sickles (Oct 02 2009, at 17:45)

Scala actors are implemented in a library. The flexible syntax makes them feel like a language feature.

From: dantakk (Oct 02 2009, at 19:18)

One approach that doesn't get much hype these days is that taken by Cilk. Wikipedia has more details, but the principle is for the programmer to identify the parallelism and let a scheduler distribute the workload. I think this has similarities with dataflow in terms of execution, but the programming side is quite different.

From: Michel S. (Oct 02 2009, at 22:59)

No mention of JoCaml? In the context of your blog, it's more known for its speed in the WideFinder contest, but the join-calculus underlying it lends itself really well to distributed computation.

From: JulesLt (Oct 02 2009, at 23:57)

Two more linked thoughts - one of the other advantages to languages sitting on existing VMs is that they become easier to deploy.

I've had a world of pain trying to get a Python app from internal development to deployment (on Solaris). Python itself may be relatively cross-platform, but many dependent libraries are not - and our infrastructure team have failed to recompile some dependencies on Solaris ('why are you using new things for the sake of it').

Even using a VirtualBox style deployment wouldn't help here, given the gap between development (x86) and deployment (Sparc) - unless we accept a 100% x86 future.

Of course using something like jRuby also means losing those problematic underlying C libraries.

And on the subject of GCD/libdispatch again - I can see why people are excited about it, and especially in combination with OpenCL and blocks - but in the context of this discussion it is probably way too low level, being a C language thing, rather than a new language.

(But it could form a good foundation for implementing concurrency features in higher level languages)

From: Pete Kirkham (Oct 04 2009, at 03:30)

Tuple spaces are alive and well in distributed use rather than concurrent use: I've worked on SCADA and C4I systems which had tuple spaces at their heart; the monitoring and control part of SkyNet and parts of COMBAT (military comms systems) use the tuple space pattern, as do many mesh networking applications.

Reliable distributed systems are a somewhat different issue from gaining performance via concurrency; Dan Creswell (http://www.dancres.org/blitzblog/) might be the guy to ask about using Java tuple spaces for distributing workloads rather than information.

From: Mauricio Arango (Oct 04 2009, at 20:18)

Tim,

I respectfully disagree with your assessment of Tuple Spaces. First, there are production systems based on Tuple Spaces implementations such as:

http://www.gigaspaces.com/xap

http://www.lindaspaces.com/products/linda.html

http://www.dancres.org/blitz/

A good example is documented in the following article by Julian Browne:

http://www.julianbrowne.com/article/viewer/space-based-architecture-example

Second, the basic principles first proposed in Tuple Spaces of space and time decoupling across tasks in a distributed memory parallel program are widely used and expanding. A tuple space is an indirect communication mechanism with implicit synchronization and this is realized with just four language extensions that can be added to any programming language: out(), read(), in() and notify(). Two examples of middleware widely used for distributed parallel programming with space and time decoupling are Message Queues and Publish/Subscribe systems, which functionally are subsets of Tuple Spaces. A message queue can be implemented with Tuple Space's out() and in(); Pub/Sub can be implemented with out() and notify().

Third, there is a rapidly growing interest in Tuple Spaces as signaled by projects and articles such as:

http://www.slideshare.net/luccastera/concurrent-programming-with-ruby-and-tuple-spaces

http://blog.8thlight.com/articles/2008/2/12/rinda-101

http://my.safaribooksonline.com/9780321669919/ch02

http://blog.messagepub.com/2009/06/29/project-spotlight-rinda/

http://www.infoq.com/news/2009/07/tuple-space-blackboard

http://www.semispace.org/semispace/

http://lime.sourceforge.net/Lime/index.html

I believe the approach with parallel programming shouldn't be to find the Java of parallel programming but how to extend existing and future programming languages with simple constructs that enable handling a large set of parallel programming patterns. Tuple Space is still the most attractive option with this approach.

Finally, two recommended books on parallel programming:

“How to Write Parallel Programs, a First Course,” by N. Carriero and D. Gelernter,

“Patterns for Parallel Programming,” by T. Mattson, B.A. Sanders, and B.L. Massingill

From: Greg Pfister (Oct 04 2009, at 23:02)

"You can do massively parallel computing right now today, from FORTRAN forsooth, using MPI. There are more modern approaches including MapReduce and Hadoop."

I presume you're aware that MapReduce/Hadoop really aren't a "more modern version" of MPI and cannot replace it. The kinds of things done using MPI are generally *nothing* like the kinds of things done with MR/H, and vice versa.

I like MR/H a lot, but wouldn't attempt a particle-in-cell code using them; it would be hideously inefficient compared with MPI. But I would run screaming from trying to use MPI on a "grep the web" kind of operation, while that kind of thing is MR/H's bread and butter.

Why not go back to the original premise here: What are the apps? Which need to go faster (or use less battery power)? You started there, but pretty quickly spun off into languages. Apps first, then worry about how you want to express the algorithms they need.

From: Anthony Williams (Oct 05 2009, at 00:32)

I currently think that given a sufficiently concurrency-aware language and compiler you can deal with just about everything at the library level.

This is exactly the approach we're taking for concurrency in C++ with C++0x: we've got a concurrency-aware memory model, with atomic operations, threads and locks as a base level, and we're building on that.

This time round we've got futures and promises and (probably) a simple asynchronous function call facility. We plan to extend this with further library facilities in the shape of thread pools and parallel algorithms at a later date.

In my view, the beauty of this is the flexibility it affords. You can have immutable data if you want, you can write functional-style programs if you want, and you can have mutable shared state with locks (or lock-free sync) if you want. You can choose the appropriate approach for your application.

With this approach you can readily implement a message-passing system, a data flow system or an actor system, as well as mutable shared state concurrency. You can even implement transactional memory with a bit of fiddling. Since it's library based, you can pick the level of abstraction exposed to the application writer.

I write about concurrency in C++ on my blog (http://www.justsoftwaresolutions.co.uk/blog/) and in my book (C++ Concurrency in Action --- http://www.manning.com/williams). I have also written an implementation of the C++0x thread library for Linux and Windows (http://www.stdthread.co.uk).

From: Shawn Garbett (Oct 06 2009, at 07:31)

You mention strong and weak typing in this article. There is no accepted formal definition of either of these; Wikipedia has the following statement: "writers who wish to write unambiguously about type systems often eschew the term 'strong typing' in favor of specific expressions such as 'static typing' or 'type safety.'" Also, with Haskell one can say that it has a formal type system, as there is a formal mathematical basis for its typing.

From: Austin King (Oct 06 2009, at 13:33)

Have you read Simon Peyton Jones' "Beautiful concurrency" essay in the book Beautiful Code (or http://www-personal.umich.edu/~jkglenn/beautiful.pdf)

The Haskell STM library brings back the possibility of doing high level composition of different components in a concurrent environment.

Based on my reading, it sounds like Haskell's Monads and Type System make this tractable to implement... STM in Java has many fewer features because the language provides fewer guarantees about the state of the world.

From: Jeff (Oct 06 2009, at 14:13)

You should look at data parallelism, especially nested data parallelism. NESL was the first language to do nested data parallelism, but Data Parallel Haskell seems to be progressing nicely. This does away with thinking about "threads" at all, and simply gives you a type of an object that is like an array, but evaluates each of its elements in parallel. You can also use list comprehensions for this type to get them to work in parallel. It is very cool, especially for running regular old programs in parallel.

Honestly, I don't really think the notion of writing programs that can take advantage of multiple processors for performance is all that related to the idea of a language that gives the programmer some abstraction of concurrently interacting "threads". The latter is simply one way to implement the former.

From: Jack Rusher (Oct 06 2009, at 15:46)

Ingo Lütkebohle — sorry we didn't meet at Trento — is right about the prevalence of the dataflow model in complex event processing. The Aleri platform <http://www.aleri.com/>, of which I was one of the architects, uses a dataflow graph over a thread pool. Streambase <http://www.streambase.com/> and several other major vendors in that space use similar techniques.

Also, Michel S. is right that the join calculus is pretty great. It would probably be worth your time to consider tinkering with JoCaml alongside Haskell, clojure, &c.

From: Dan Creswell (Oct 07 2009, at 04:06)

Hi Tim,

There's been loads of comments, I doubt you'll get to this one but:

"Like tuple spaces, I don’t see anyone trying to build the concurrent future with dataflow techniques."

I can confirm there are a number of people building tuplespace applications at least as I spend time coaching a bunch of them day to day. I'm not sure exactly what your inner-thinking was here, I expect you certainly weren't making a judgement on usefulness of an approach based solely on what you personally see.

Regardless I think we're still not quite tackling the real question/challenge here. IMHO the question we should be asking is:

What abstractions work best for processing load across many cores?

When we say many cores, do we mean all in one box or across a network of boxes?

Having got that sorted we can then ask ourselves:

What's the best way to provide those abstractions?

It might be language, it might be platform, it might be middleware or some combination.

September 30, 2009

By .

I am an employee of Amazon.com, but the opinions expressed here are my own, and no other party necessarily agrees with them.

A full disclosure of my professional interests is on the author page.