ongoing by Tim Bray · Concur.next — Eleven Theses on Clojure

I’ve been banging away on Clojure for a few days now, and while it would obviously take months of study and grinding through a big serious real-world software project to become authoritative, I think that what I’ve learned is useful enough to share.

[This is part of the Concur.next series.]

1. It’s the Best Lisp Ever · I don’t see how this can be a controversial statement. Issues of language-design aside, every other Lisp I’ve worked with has been hobbled by lacklustre libraries and poor integration with the rest of the IT infrastructure. Running on the Java platform makes those problems go away, poof!

Let’s assume hypothetically that there are other Lisps where certain design choices are found to be better than Clojure’s. Well, you can pile all those design choices up on top of each other and the pile will have to be very high before they come close to balancing the value of Java’s huge library repertoire and ease of integration with, well, just about anything.

2. Being a Lisp Is a Handicap · There are a large number of people who find Lisp code hard to read. I’m one of them. I’m fully prepared to admit that this is a shortcoming in me, not in Lisp, but I think the shortcoming is widely shared.

Perhaps if I’d learned Lisp before plunging into the procedural mainstream, I wouldn’t have this problem — but it’s not clear the results of MIT’s decades-long experiment in doing so would support that hypothesis.

I think it’s worse than that. In school, we all learn 3 + 4 = 7 and then sin(π/2) = 1, and then many of us speak languages with infix verbs. So Lisp is fighting uphill.
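For readers who haven't written any Lisp, here is how those two school-book expressions look in Clojure's prefix notation. The var names are arbitrary and only for illustration.

```clojure
;; 3 + 4 = 7 in prefix form: the operator comes first.
(def seven (+ 3 4))

;; sin(π/2) = 1: Clojure calls Java's Math/sin directly; Math/PI is a double.
(def one (Math/sin (/ Math/PI 2)))

;; Nesting replaces precedence rules: 2 × (3 + 4) becomes
(def fourteen (* 2 (+ 3 4)))
```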

It also may be the case that there’s something about some human minds that has trouble with thinking about data list-at-a-time rather than item-at-a-time and thus reacts poorly to constructs like

(apply merge-with +
  (pmap count-lines
    (partition-all *batch-size*
      (line-seq (reader filename)))))

[Update: Rich Hickey provides some alternative, and arguably more readable, formulations of this code.]
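Those alternatives aren't reproduced here. As one hedged illustration, which is not necessarily any of Rich Hickey's versions, the same pipeline can be written with Clojure's ->> threading macro so that it reads top to bottom. The sketch makes three assumptions. It uses clojure.java.io/reader (Clojure 1.2 and later) in place of the older reader function. It uses a hypothetical count-lines that simply tallies identical lines with frequencies, because the original count-lines isn't shown. And it picks an arbitrary value for *batch-size*.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical batch size; the original code doesn't give a value.
(def ^:dynamic *batch-size* 1000)

;; Hypothetical count-lines: tallies how often each line occurs in one batch.
(defn count-lines [batch]
  (frequencies batch))

;; The nested pipeline, threaded: batch the lines, count each batch
;; in parallel with pmap, then merge the per-batch maps by adding counts.
(defn tally-lines [lines]
  (->> lines
       (partition-all *batch-size*)
       (pmap count-lines)
       (apply merge-with +)))

;; merge-with is eager, so the whole file is consumed before with-open closes it.
(defn tally-file [filename]
  (with-open [rdr (io/reader filename)]
    (tally-lines (line-seq rdr))))
```

Whether this reads better than the nested form is a matter of taste. The data now flows in the same direction as the text.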

I think I really totally understand the value of being homoiconic, and the awesome power of macros, and the notion of the reader. I want to like Lisp; but I think readability is an insanely important characteristic in programming systems.

Practically speaking, this means that it’d be hard for me to go out there on Sun’s (or Oracle’s) behalf and tell developers that the best way to take advantage of modern many-core hardware is to start with S-expressions before breakfast.

3. Clojure’s Concurrency Features Are Awesome · They do what they say they’re going to do, they require amazingly little ceremony, and, near as I can tell, their design mostly frees you from having to worry about deadlocks and race conditions.

Rich Hickey has planted a flag on high ground, and from here on in I think anyone who wants to make any strong claims about doing concurrency had better explain clearly how their primitives are distinguished from, or better than, Clojure’s.

4. Agents Are Better Than Refs or Atoms · I’m using these terms in their Clojure-specific senses: agents, refs, and atoms are three of Clojure’s built-in reference types for managing mutable state.

Agents are neither actors nor processes, in either the operating-system or the Erlang sense. I’m not actually sure how big a difference that makes; my suspicion is that programmers think about using all three in much the same way, and that’s OK.

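Anyhow, agents solve concurrency problems in the simplest possible way: by removing concurrency. Send functions to an agent and they get executed one at a time. Each one receives the agent’s current value as its first argument, and whatever it returns becomes the agent’s new value. Functions sent from a single thread run in the order they were sent; those sent from different threads may interleave in any order.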
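Here is a minimal sketch of those semantics, using a hypothetical agent named events that holds a vector. Each sent function (here conj) is called with the current vector as its first argument, and its return value replaces the agent's state.

```clojure
;; Hypothetical agent holding a vector of events.
(def events (agent []))

;; send returns immediately; the action runs later on a pool thread
;; as (conj current-vector :first).
(send events conj :first)
;; Same sending thread, so this action runs after the one above.
(send events conj :second)

;; await blocks until every action this thread has sent so far has run.
(await events)
@events   ; => [:first :second]
```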

Here is an example. I have a map (i.e. hash table) called so-far in which the keys are strings and the values are integers counting how many times each string has been encountered. If I use refs to protect both the hash table and the counters, I get code like this:

 1 (defn new-counter [ so-far target ]
 2   (dosync
 3     (if-let [ c (@so-far target) ]
 4       c
 5       (let [counter (ref 0) ]
 6         (alter so-far assoc target counter)
 7         counter))))
 8
 9 (defn record [target so-far]
10   (if-let [ counter (@so-far target) ]
11     (dosync (alter counter inc))
12     (dosync (alter (new-counter so-far target) inc))))

Let’s start with the record function on Line 9. The if-let looks up the target in the hash table, dereferencing it with @ outside any transaction and so ignoring concurrency issues for the moment. If a counter is there, record bumps it with alter and inc in a short transaction of its own. If there isn’t one, it calls new-counter to make one and bumps that instead.

Lines 3 and 4, in new-counter, are where it gets interesting. Since everything’s running concurrently, we can’t just go ahead and bash a new counter into the so-far hash table. Somebody else might have come along and done that already, maybe even recorded a few values, so we’d risk throwing away data. So once we’re inside the dosync transaction, we check again whether the counter is there, and if so, we just return it. Otherwise we create the new counter, load it into the hash, and return it. (dosync doesn’t actually lock anything. If another thread commits a change to so-far first, our transaction is retried, so the check always sees a consistent snapshot.)

On the other hand, consider the agent-based approach; once again we have a hash table called so-far, but protected by an agent. If the code wants to increment the value for some target, it says
(send so-far add target)

This will eventually call the add function with the hash table (not a reference or anything, the actual table) as the first argument, and target as the second. Here’s add:

(defn add [so-far target]
  (if-let [count (so-far target)]
    (assoc so-far target (inc count))
    (assoc so-far target 1)))

Considerably simpler, and nothing (concurrency-wise) can go wrong.
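To see the agent version under real contention, here is a hypothetical driver. The hammer! helper, the eight workers, and the three target strings are all made up for illustration. It records 8,000 hits from concurrent futures. For comparison, it also runs the same pure function through an atom with swap!. An atom applies the function synchronously and retries on conflict instead of queueing it.

```clojure
;; Essentially the add function from above, repeated so this block stands alone.
(defn add [so-far target]
  (if-let [n (so-far target)]
    (assoc so-far target (inc n))
    (assoc so-far target 1)))

;; Hypothetical helper: 8 futures each record 1,000 hits spread across
;; three made-up target strings; returns once every future has finished.
(defn hammer! [record!]
  (let [targets ["alpha" "beta" "gamma"]
        workers (doall
                  (for [_ (range 8)]
                    (future
                      (dotimes [i 1000]
                        (record! (targets (mod i 3)))))))]
    (run! deref workers)))

;; Agent version: sends are queued and applied one at a time.
(def so-far (agent {}))
(hammer! #(send so-far add %))
(await so-far)   ; every send was queued before this call, so await waits for them all

;; Atom version: swap! applies add synchronously, retrying on conflict.
(def so-far-atom (atom {}))
(hammer! #(swap! so-far-atom add %))
```

Both approaches should end up with {"alpha" 2672, "beta" 2664, "gamma" 2664}, with no hits lost. Neither needs the check-then-create dance that the ref version does.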

I do have one nit with agents. Most of my code was infrastructure: a module that reads lines out of a file and passes them one at a time to a user-provided function. At one point I made some of the code that fixes up lines spanning I/O-block boundaries agent-based, because that was simpler. Unfortunately, that code also calls the user-provided function. When one of those functions also tried to send work off to an agent, everything blew up, because you can’t have a send inside a send.
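Current Clojure documents a slightly different rule, which may or may not match what happened in this 1.0-era code. A send issued from inside another agent's action is not rejected. Clojure holds it and dispatches it only after the outer action returns. What Clojure does forbid inside an agent action is await, which throws. Here is a sketch of both behaviors. The agents fixed-lines, tally, and probe, and the user-callback and fix-up functions, are hypothetical stand-ins for the setup described above.

```clojure
;; Hypothetical stand-ins: fixed-lines plays the line-fixing agent,
;; tally an agent that a user-provided callback sends to.
(def tally (agent 0))
(def fixed-lines (agent []))

(defn user-callback [line]
  ;; This send runs inside an action on fixed-lines. Clojure holds it
  ;; until that action has returned, then dispatches it.
  (send tally inc)
  line)

(defn fix-up [acc line]
  (conj acc (user-callback line)))

(send fixed-lines fix-up "first")
(send fixed-lines fix-up "second")
(await fixed-lines)   ; both outer actions done; the held sends are now queued on tally
(await tally)         ; and this waits for them to run

;; await, by contrast, throws when called from inside an agent action.
(def probe (agent nil))
(send probe (fn [_]
              (try (await tally) :no-error
                   (catch Exception e (.getMessage e)))))
(await probe)
@probe   ; => "Can't await in agent action"
```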

Actually, I think my nit is more general; in an ideal world, concurrency primitives would all be orthogonal and friction-free. But anyhow it’s a nit, not an architectural black hole, I think.

5. Clojure Concurrency Does Buy Real-World Performance · The Wide Finder runs I was using to test were processing 45GB of data in a way that turned out to be CPU-limited in Clojure ~~(I think due to inefficiencies in Java’s bytes-on-disk-to-String-objects pipeline, but I’m not sure).~~ So making this run fast on a high-core-count/low-clock-rate processor was actually a pretty useful benchmark.

[Update: Now I’m sure that the bytes-to-strings thing is not the problem; I’m getting much better times, it’s an interesting story and I’ll write it up.]

The single most important result: Clojure’s concurrency tools reduced the elapsed run-time by a factor of four on an eight-core system, with a very moderate amount of easy-to-read (for Lisp) code.

6. Performance is Wonky But It Doesn’t Matter · Some more results:

The amount of extra CPU burned to achieve the 4× speedup was remarkably high: the job’s total CPU time more than doubled.
The costs of concurrency, as functions of whether you use refs, or map/reduce, or agents, and also of block-size and thread-count and so on, are wildly variable and exhibit no obvious pattern.
Well, agents did seem to be quite a bit more expensive than refs. But refs were pretty cheap; a low-concurrency map/reduce approach was not dramatically slower than doing the Simplest Thing That Could Possibly Work with refs.
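Here is a back-of-the-envelope reading of the 4× and "more than doubled" figures, under some assumptions. It assumes they come from the same runs on the eight-core system, and that the single-threaded run was CPU-bound, so its CPU time C₁ roughly equals its elapsed time T₁. T₈ and C₈ are the elapsed and total CPU times of the parallel run, and p = 8 is the core count:

$$
S = \frac{T_1}{T_8} = 4, \qquad E = \frac{S}{p} = \frac{4}{8} = 0.5
$$

$$
\frac{C_8}{T_8} > \frac{2\,C_1}{T_1/4} \approx \frac{2\,T_1}{T_1/4} = 8
$$

So each core delivered about half its ideal share of speedup. Meanwhile, more than eight hardware threads' worth of CPU were busy on average, which is only possible if each core runs more than one hardware thread at a time.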

These results are irrelevant. Remember, this is Clojure 1.0 we’re working with. If we determine that the throughput of the agent handlers is unacceptable, or that the STM-based infrastructure is consuming excessive CPU overhead, I’m quite confident that can be fixed. For example, we could lock Rich Hickey in a basement and put him on a tofu-and-lettuce diet.

7. The Implementation Is Good · I pushed Clojure hard enough to have a couple of subtle bugs in my code blow out the whole JVM, which takes considerable blowing-out on a Sun T2000. But the bugs were mine, not Clojure’s. In the course of quite a few days pounding away at this thing with big data and tons of concurrency, I observed only one bug that I’m pretty sure is in Clojure, and then I couldn’t reproduce it.

Also, I never observed code in Clojure running significantly slower than the equivalent code in Java.

So if I’m wrong and there’s scope for a Lisp to take hold in the mainstream, Clojure would really be a good Lisp to bet on.

8. The Documentation Is OK · The current sources are Stuart Halloway’s Programming Clojure, Mark Volkmann’s Clojure - Functional Programming for the JVM, and of course the online API reference.

I used the book most, and while it’s well-written and accurate, it’s either missing some coverage or a little out of date, as I discovered whenever I published code and helpful commenters pointed out all the newer and better functions I could have used. I also found the apps that the tutorial examples are built around less than compelling.

Also, you can look through the source code, which is mostly in Clojure, and even for someone like me who finds Lisp hard to read, that’s super-helpful. But it’s clear that there’s good scope for a “Camel” or “Pickaxe” style book to come along and grab the high ground.

9. The Community Is Excellent · As I’ve already observed, the Clojure community is terrific; we’ll see how well that stands the test of time. I suspect I may linger around #clojure even when I’ve moved on to other things, just because the company’s good.

10. The Tools Aren’t Bad · I used Enclojure and I recommend it; having it set up and manage my REPL was super-convenient, and it never introduced any bugs or inconsistencies that I spotted. It’s also very early on in its life and there are rough spots, but really it’s good stuff.

I gather that rather more people use Emacs and some flavor of SLIME, and I’m sure I would have been just fine with that too.

11. Tail Optimization Is Still a Red Herring · I wrote admiringly in Tail Call Amputation about the virtues of Clojure’s recur and loop forms, as opposed to traditional tail-call optimization. This is clearly a religious issue, and there’s lots of preaching in the comments to that piece. I read them all and I followed pointers, and here’s what I think:

Clojure’s loop/recur delivers 80% of the value of TCO, with greater syntactic clarity. Clojure’s trampoline delivers 80% of the remaining 20%.
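Here is a short sketch of both forms, with hypothetical function names. loop/recur handles self-recursion. trampoline handles mutual recursion: each function returns a thunk (a zero-argument function) instead of calling the other directly, and trampoline keeps invoking thunks until it gets back a non-function value.

```clojure
;; loop/recur: an explicit self tail call that runs in constant stack space.
;; The compiler rejects recur anywhere but tail position.
(defn sum-to [n]
  (loop [i 1, acc 0]
    (if (> i n)
      acc
      (recur (inc i) (+ acc i)))))

;; trampoline: mutually recursive functions return thunks, not calls.
(declare my-odd?)

(defn my-even? [n]
  (if (zero? n) true #(my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n) false #(my-even? (dec n))))

(sum-to 1000000)               ; => 500000500000, no stack overflow
(trampoline my-even? 1000000)  ; => true, also without growing the stack
```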

Near as I can tell, that leaves state-machine implementation as the big outstanding case that you really need TCO for. I’ve done a ton of state-machine work in my career, and while I recognize that you could implement them with a bunch of trampolining tail-called routines, I’ve never understood why that’s better than expressing them in some sort of (usually sparse) array.
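To make that comparison concrete, here is one hypothetical toy machine written both ways. It accepts strings such as "42" or "-7": an optional minus sign followed by one or more digits. A sparse map stands in for the sparse array. This is only a sketch; real state machines usually have many more states and actions.

```clojure
;; Map each character to a small class so the tables stay small.
(defn char-class [c]
  (cond (Character/isDigit c) :digit
        (= c \-)              :minus
        :else                 :other))

;; Table-driven: a sparse map from [state class] to next state;
;; a missing entry (nil) means reject.
(def transitions
  {[:start :minus]  :sign
   [:start :digit]  :digits
   [:sign :digit]   :digits
   [:digits :digit] :digits})

(def accepting #{:digits})

(defn accepts-table? [s]
  (loop [state :start, cs (seq s)]
    (cond (nil? state) false
          (empty? cs)  (contains? accepting state)
          :else        (recur (transitions [state (char-class (first cs))])
                              (rest cs)))))

;; Trampolined: one function per state, each returning a thunk for the
;; next state or a boolean verdict, so the stack never grows.
(declare sign-state digits-state)

(defn start-state [cs]
  (if (empty? cs)
    false
    (case (char-class (first cs))
      :minus #(sign-state (rest cs))
      :digit #(digits-state (rest cs))
      false)))

(defn sign-state [cs]
  (if (and (seq cs) (= :digit (char-class (first cs))))
    #(digits-state (rest cs))
    false))

(defn digits-state [cs]
  (cond (empty? cs)                        true
        (= :digit (char-class (first cs))) #(digits-state (rest cs))
        :else                              false))

(defn accepts-tramp? [s]
  (trampoline start-state (seq s)))
```

In the table version, the whole machine is visible as data in one place. In the trampolined version, it is spread across functions.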

So, my opinion is that post-Clojure, this argument is over. I suspect that this will convince exactly zero of the TCO fans, probably including Rich Hickey, and that once again the comments will fill up with people explaining how the real conclusion is that I don’t actually understand TCO. Oh well.

Thanks! · To Rich and the community for welcoming me and helping. I stuffed my code fragments into the SVN repository at the Kenai Divide and Conquer project; they ain’t pretty. If anyone wants to have a whack at the big dataset, send me a hail and if I think you’re serious I’ll get you an account.

The quest for the Java of Concurrency continues.

Contributions


From: Phil (Dec 01 2009, at 16:59)

> Agents Are Better Than Refs or Atoms

This strikes me as kind of a funny thing to say.

If you need real-time consistency, refs are really the only way to go. I guess maybe what you're getting at here is that for most concurrent problems you might think you need the ACI properties that transactions provide, but (at least in batch-process contexts) you can get the job done more simply with agents. Is that the idea?