I’m starting to wind down my Clojure research, but I’m feeling a little guilty about having exposed people to my klunky Lisp-newbie code, perhaps giving a false impression of how the language feels. So I’d like to show you what it looks like when it’s created by someone who’s actually part of the tribe and thinks in it more natively than I probably ever will.

[This is part of the Concur.next series.]

Technomancy · That’s the online handle Phil Hagelberg goes by, and I like it too much to resist a chance to use it. He reacted to my first Wide-Finder-related Clojure fumblings with a post titled “in which things are mapped, but also reduced”, including code which may be perused here.

I think it’d be worth your time to pause for a minute and think about it.

John from Milo · That would be John Evans of Milo, which looks like an interesting site. His first reaction to Phil’s code was this:

(ns my-wide-finder
  "A basic map/reduce approach to the wide finder using agents.
  Optimized for being idiomatic and readable rather than speed.
  NOTE: Originally from:
  http://technomancy.us/130
  but updated to use pmap."
  (:use [clojure.contrib.duck-streams :only [reader]]))

(def re #"GET /(\d+) ")

(defn count-line
  "Increment the relevant entry in the counts map."
  [line]
  (if-let [[_ hit] (re-find re line)]
    {hit 1}
    {}))

(defn my-find-widely
  "Return a map of pages to hit counts in filename."
  [filename]
  (apply merge-with +
         (pmap count-line (line-seq (reader filename)))))

I grabbed that but for some reason couldn’t get it to run against the actual Wide Finder dataset. I pinged John and he provided me with this revised version:

(ns batch-pmap-wide-finder
  "A basic map/reduce approach to the wide finder using agents.
  Optimized for being idiomatic and readable rather than speed.
  Updated to deal with batches of lines instead of individual lines.
  "
  (:use [clojure.contrib.duck-streams :only [reader]]
        [clojure.contrib.seq-utils :only [partition-all]]))

(def *batch-size* 50)

(def re #"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) ")

(defn tally [line]
  (if-let [[_ hit] (re-find re line)]
    {hit 1}
    {}))

(defn count-lines
  [lines]
  (apply merge-with + (map tally lines)))

(defn find-widely
  "Return a map of pages to hit counts in filename."
  [filename]
  (apply merge-with +
         (pmap count-lines
               (partition-all *batch-size*
                              (line-seq (reader filename))))))

Processing the big dataset, it ran in 1h28m, while burning about 7h25m of CPU. On impulse, I changed his *batch-size* to 100 and this had no effect on the elapsed time but cranked the CPU to just over 8h. Concurrency is weird.
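
For anyone unsure what the batching buys you, here is a minimal sketch on invented data (ten integers standing in for log lines, and a batch size of four chosen only for illustration). It assumes a Clojure release where partition-all lives in clojure.core rather than in clojure.contrib. Each batch is summarized on its own, possibly in parallel, and the per-batch results are then combined, which is the same shape as find-widely above.

```clojure
;; Ten stand-in "lines", split into batches of at most four items.
;; partition-all keeps the short final batch instead of dropping it.
(def batches (partition-all 4 (range 10)))
;; batches => ((0 1 2 3) (4 5 6 7) (8 9))

;; Summarize each batch independently (pmap may run these in parallel),
;; then combine the per-batch results into one answer.
(def total (apply + (pmap #(apply + %) batches)))
;; total => 45
```

A bigger batch means fewer, larger units of work handed to pmap; the sketch says nothing about which size is faster on real data.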

Once again, if you’re not already a Lisper, take a minute to look at and think about this code.

I Look At This Code · And what do I see? First, these guys have internalized the APIs and libraries, and in particular the list- and sequence-processing functions, just as a seasoned Perlmonger or Java-head has internalized their language’s key APIs. And you’re not really a Clojure programmer until you’ve done that.

It’s remarkable the degree to which you can push all your boring arithmetic and book-keeping down into the guts of declarative/functional calls like partition-all and merge-with. In particular you have to admire the elegance of how John’s tally function flows into merge-with.
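
To make that flow concrete, here is a minimal sketch. The pattern sample-re and the four sample lines are invented for illustration and are much simpler than the real Wide Finder data; tally has the same shape as John's function. Each line becomes a tiny one-entry map (or an empty one), and merge-with + folds those maps together, adding the values wherever keys collide.

```clojure
;; Hypothetical, simplified pattern; not the one used on the real logs.
(def sample-re #"GET /(\S+) ")

(defn tally
  "Map one log line to {page 1}, or to {} when the line doesn't match."
  [line]
  (if-let [[_ hit] (re-find sample-re line)]
    {hit 1}
    {}))

;; Invented sample lines, for illustration only.
(def sample-lines
  ["1.2.3.4 - GET /a HTTP/1.1"
   "1.2.3.4 - GET /b HTTP/1.1"
   "5.6.7.8 - GET /a HTTP/1.1"
   "not a request line"])

;; merge-with + combines the per-line maps, summing counts for repeated keys.
(def counts (apply merge-with + (map tally sample-lines)))
;; counts => {"a" 2, "b" 1}
```

There is no explicit counter and no loop: the empty map from a non-matching line simply contributes nothing to the merge.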

The compactness of this code compared to, for example, mine, is remarkable.

Is It Expressive and Readable? · Which is to say, maintainable? Until there are some measurements in a controlled-experiment kind of setting, the answer to that has to be personal and anecdotal. So I’m not going to offer mine right now; I’d like to hear others’ opinions.



Contributions

From: Martin Probst (Dec 01 2009, at 03:14)

Maybe I'm just tainted by ALGOL-ish language exposure, but I personally like syntax in a language.

Lisp code has a very regular structure, but the different kinds of things you are doing are not easily recognizable from the structure/shape/appearance of the code. Which makes sense, as all you do is apply functions to lists, but to me it obscures control flow and behaviour.

From: Timothy Pratley (Dec 01 2009, at 05:18)

Hi Tim,

Just a couple of suggestions...

(1) do you have any sample input and output? It's not obvious to me what the collecting field is meant to be here.

(2) I suspect using merge-with would be really slow, and unnecessary... how does something like this fare?

(ns pmap-find-widely
  (:require [clojure.contrib.duck-streams :as ccds]))

(defn map-count [map key]
  (assoc map key (inc (get map key 0))))

(defn find-widely [f re]
  (reduce map-count (sorted-map)
          (pmap #(second (re-find re %))
                (ccds/read-lines f))))

(find-widely "foo.txt" #"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) ")

;;;; NB: I have no idea whether you really want the second character of a re-find that sounds totally wrong to me.

From: Martin McCallion (Dec 01 2009, at 05:58)

I'd have to say, it _almost_ looks like a programming language, but...

Nope, nearly thirty years of programming, and I can't make head nor tail of it.

From: Justin (Dec 01 2009, at 19:59)

Yeah, fellas, I'm a native English speaker of 30 years, but that *French* shit looks like gobbledy-gook to me, amirite?

But on a more sincere note, it's really not all that bad after you've looked at a small to medium amount of Lisp code--you more or less ignore the parens and take your structure cues from the indentation. Which also works out ok from a writing/editing perspective, because you'll be letting your editor handle your close-parentheses accounting (not to mention your indentation) for you.

Please don't let unfamiliarity keep you from learning some new and interesting stuff.

From: Martin McCallion (Dec 02 2009, at 05:49)

@Justin: To extend your rather offensive metaphor, consider the following.

I've programmed in various languages over the years: BASIC, Fortran, Cobol, RPG, Java. Even done a bit of C and C++. If someone posts a segment of Python or Ruby code, for example, I can puzzle my way through it and understand it.

Like an Italian speaker seeing Spanish, say.

But that stuff above? That looks more like Cyrillic, when all you've ever seen is the Roman alphabet.

I don't doubt that you can do great things with it, but sometimes life is just too short, you know?

Anyway, I was attempting to give my answer to Tim's question, "Is it expressive and readable?" Not to me.

From: Aristotle Pagaltzis (Dec 02 2009, at 07:39)

Martin,

> I was attempting to give my answer to Tim’s question, “Is it expressive and readable?” Not to me.

Justin’s metaphor still applies, I’m afraid. It’s as if someone posted a Thai poem and asked, “isn’t this just beautiful?” and you come in and say, “well I don’t speak Thai so to me it’s not”. The right answer in that position is not “no”; however much it may sting to admit, it is “I have no idea”.

> I don’t doubt that you can do great things with it, but sometimes life is just too short, you know?

Isn’t life’s shortness exactly the reason why you should know more, wildly different languages, rather than just 15 variations of the same flavour? To stick to the familiar language paradigm is to be a carpenter chopping down a tree with a hammer and hacksaw. Life is too short to learn to handle a chainsaw, you know?

Of course, if you want to spend the rest of your life proving Greenspun’s 10th rule over and over, that’s your prerogative. Plenty of people who write code for a living take that route.

From: Martin McCallion (Dec 02 2009, at 10:52)

Aristotle:

> The right answer in that position is not “no”; however much it may sting to admit, it is “I have no idea”.

That's a fair point. Though I might argue that, "Not to me," is another formulation of that. Though I accept that it's a less open formulation.

> Isn’t life’s shortness exactly the reason why you should know more, wildly different languages, rather than just 15 variations of the same flavour?

There's a case to be made there, and I certainly don't object to learning more languages. Though there's a case to be made for developing a deep understanding of one or two, for specialisation, too.

My "shortness of life" comment was to do with the time involved in actually learning the language. It all depends, really, on whether you have other things than programming and learning languages to do with your life, I suppose.

> To stick to the familiar language paradigm is to be a carpenter chopping down a tree with a hammer and hacksaw. Life is too short to learn to handle a chainsaw, you know?

I can't help but think that, if the difference in effect were that great, the Lisp family would have swept all others away by now.

> Of course, if you want to spend the rest of your life proving Greenspun’s 10th rule over and over, that’s your prerogative.

Looked it up. Discovered it's an old joke that I've heard before, and that only makes sense if you know Lisp. I don't see how me saying, "This stuff is hard to read" relates to it. How does my use of Java prove that Java "contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp"?

From: Justin (Dec 02 2009, at 12:04)

Martin, you're right--it was a puerile and offensive way to communicate, and I apologize for it.

You're right that Lisps aren't particularly any better than any other language, I think. But you're still, if I'm interpreting what you're saying right, insisting that Lisp isn't particularly worth learning.

I think it is, though, not because you'll acquire a tool, but because the learning process itself will be edifying and give you new perspectives on programming. Which is the reason anyone reads CS blogs in the first place, right?

I assure you, it's not anywhere as difficult as learning Cyrillic, no matter how daunting it seems at first.

From: Martin McCallion (Dec 02 2009, at 13:36)

> But you're still, if I'm interpreting what you're saying right, insisting that Lisp isn't particularly worth learning.

I wouldn't go that far. Though personally, if I had the time to learn another language at the moment, I'd choose Python or maybe Ruby, as they are the ones that are getting all the visibility at the moment.

> I think it is, though, not because you'll acquire a tool, but because the learning process itself will be edifying and give you new perspectives on programming.

Learning for its own sake is certainly a good thing.

> Which is the reason anyone reads CS blogs in the first place, right?

Yes, although here at Tim's blog there's a lot more than CS. I think I probably came for the XML originally, but stayed for the photography and everything else.

From: Peter Eddy (Dec 02 2009, at 14:35)

> How does my use of Java prove that Java "contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp"?

Martin, if you really want to know you can answer that for yourself by learning Lisp. I actually did learn lisp specifically because I wanted to understand this law. That may not sound convincing, but as a reformed Java/C developer and Lisp convert, I've come to understand and appreciate it very well. Unfortunately it can't be explained quickly, or at least I can't do it.

There are many factors to Lisp that in combination produce such a wonderful tool, for example the functional nature, list or in Clojure's case sequence orientation, and macros (which are not the same as macros in other languages, and which demand the particular Lisp language syntax). Of course Clojure adds great concurrency support to this. All I can say is the following: 1) Lisp is at least as reorienting as structured and OO programming were, and 2) It's a way of programming where the emphasis is on telling the computer what to do rather than how to do it.

Imagine trying to explain OO to someone, convincingly, in a paragraph and that might illustrate my inability to do the same for Lisp.

From: Aristotle Pagaltzis (Dec 03 2009, at 04:55)

Martin:

> It all depends, really, on whether you have other things than programming and learning languages to do with your life, I suppose.

It does. I meant the “your prerogative” comment honestly, not sarcastically. I know it doesn’t read that way, because I can’t escape my biases as a programmer, but I do appreciate that not everyone makes the same choices for their life as a whole as I do.

> I can’t help but think that, if the difference in effect were that great, the Lisp family would have swept all others away by now.

My opinion on the lacklustre success of Lisp is that while the benefit is real, you cannot acquire it in small incremental steps. You cannot avoid the initial productivity trough as you learn to work with it on its own terms (much like you don’t acquire fluency in a new natural language in small incremental steps, but have to start out being able to say almost nothing of any substance). That’s a big barrier, no matter how big the rewards.

> How does my use of Java prove that Java “contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp”?

It’s not Java that contains such a thing, but the programs you are writing in Java. Although the fact that Java has garbage collection built in means that the principle is far less true for code in Java than in C – thank goodness.

However.

I didn’t mean it literally, but more as a general principle. The general principle behind it is that if you work in an environment that offers low abstraction and only low-level primitives (whether that be missing support for closures or a lack of abstract concurrency primitives), then a lot of the code you end up writing as application logic is not application logic so much as partial versions of these higher-level primitives. So any sufficiently complicated program in a lower-level language contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of a higher-level language.

This is inevitable, because whether or not your language has eg. high-level concurrency primitives, if you write concurrent code you are going to run into the problems that any sort of concurrent code has, so you are going to write solutions for the same problems as everyone else does. And so the resulting code will necessarily be equivalent in the abstract with the code everyone else is writing.

And this is the general principle behind Greenspun’s 10th rule.

(Note that Lisp, or more generally functional programming, is not the only kind of paradigm where that applies. Declarative programming à la Prolog is, lamentably, neglected more than ever.)

And this brings me to the final point about learning profoundly different languages: doing so helps you recognise the shadow manifestations of these higher-level paradigms in code written using a different paradigm. And when you realise that you are trying to implement the concurrency primitives from our previous example, then you don’t even have to switch languages – you suddenly gain a powerful new way of thinking systematically about what it is you were trying to do even back in Java, say.

Note that the gains may not manifest as a productivity increase as such. They may instead lead you to be better able to debug problems in a certain domain, or let you write inherently less buggy code because you are thinking of the domain in well-informed terms. It can be very hard to quantify the benefits objectively, however real they are. It’s very much like the quality without a name.

From: Aristotle Pagaltzis (Dec 03 2009, at 04:58)

Justin:

> Martin, you’re right—it was a puerile and offensive way to communicate, and I apologize for it.

I was on my way to make the same point before I saw you had already made it. Only your delivery was objectionable; the argument was sound.

November 30, 2009