I’ve been working on, and writing about, running Clojure Wide Finder code. But I was never satisfied with the absolute performance numbers. This is a detailed write-up of how I made the code faster (and sometimes slower), including lessons that might be useful to those working with Clojure specifically or concurrency more generally, along with some interesting data on Java garbage collection and JDK7.

[Update: My relationship with the JVM is improving, based on good advice from the Net. This article has been partly re-written and is much more cheerful.]

[This is part of the Concur.next series and also the Wide Finder 2 series.]

In a comment on my last piece, Alexy Khrabrov noted “I saw WF2 has a pretty good Scala result, apparently better than Clojure's — and that's a year-old Scala.” Alexy is right; my best times were mostly over 30 minutes, while Ray Waldin’s Scala code ran in under 15. There’s no obvious reason I can see why Clojure should be significantly slower than Scala, and while I stand by my Eleven Theses on Clojure, I was puzzled and irritated.

The following are organized roughly chronologically; if the narrative isn’t all that coherent, that would be an indicator of me having done considerable thrashing about.

Low-Hanging Fruit · The information you can get from the JVM with the -Xprof argument is not brilliant, but it’s a whole lot better than nothing. What you want to see in profiling output is a big fat obvious culprit, and I saw one: the code fragment below, which breaks up buffers into lines; text is the text we’re wrangling, first-nl and last-nl are the first and last places where a newline character (whose value, note, is 10) appears in the block. destination is the user-provided payload function.

(loop [ nl first-nl ]
  (let [ from (inc nl) to (. text indexOf 10 from) ]
    (if (< from last-nl)
      (do
        (destination (new String (. text substring from to)) accum user-data)
        (recur to)))))

This sucker was burning well over half of all the CPU. There are a few things wrong with it. First of all, since I was already using Java to split up the text, why not do it all at once like this?

(let [chunks (. #"\n" split text)

Yep, that helped. (I’d actually had it that way in a previous iteration but had unrolled it while fighting a memory leak that turned out to be unrelated).

Lesson: Any time you can package up a bunch of work as a single call into Javaland, that’s probably a good idea.

Reducing · The second thing that was wrong was that I was iterating at all. Lisps want you to think in terms of lists, not their members, and I wasn’t. What I actually wanted to do was to call the payload function on each line of text, passing the output of each invocation to the input of the next. And of course Clojure, like any decent functional language, has a nice reduce function. So let’s just add a line to that fragment:

(let [chunks (. #"\n" split text)
      ; ... turn "chunks" into "lines" by stripping leading/trailing fragments ...
      accumulator (reduce per-line nil lines)

Lispers are now nodding their heads in a despairing of-course-the-moron-should-have-done-it-that-way way. Me, I like it when a profiler shows me where I’m being stupid. Some non-Lispers are probably thinking “That’s slick”. Most modern languages, not just Lisps, have some sort of a reduce equivalent.
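For anyone who hasn’t met reduce before, here’s a minimal runnable sketch of the pattern; count-chars is a made-up stand-in for the real payload function, threading an accumulator (a running character count) through every line.

```clojure
;; Minimal sketch of reducing over lines.  count-chars is a stand-in
;; for the real per-line payload function.
(defn count-chars [accum line]
  (+ accum (count line)))

;; split a buffer into lines, Javaland-style
(def lines (.split #"\n" "first line\nsecond\nthird"))

;; reduce threads the accumulator: 0 -> 10 -> 16 -> 21
(def total (reduce count-chars 0 lines))
```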

I’m a bit amused here: this code has (basically) a map/reduce architecture, except that I’m using reduce in the map phase.

Lesson: Operate on lists instead of iterating.

Type Hinting · The code was now running visibly faster; time for another whack at the profiler. I saw a whole lot of time being sucked up by java.lang.Class.forName0. This suggested to me that the Clojure runtime was spending a bunch of time trying to figure out what type of thing was crossing some method-dispatch barrier. I already had a wrapper around the user-provided payload function because it has three arguments and reduce wants just two, and to protect myself from Java issue 4513622. I seemed to remember from somewhere that Clojure has “type hints”, so I looked that up and decorated the wrapper:

(let [per-line (fn [accum #^String line] (destination (new String line) user-data accum))

See that #^String goober? It’s a hint telling Clojure to assume that the incoming line argument is a Java string. It helped too, quite a bit. I was sort of irritated at having to hold the system’s hand like this. But I think this may be a Release-1.0 symptom rather than a Clojure design flaw. It feels to me like there’s quite a bit more scope for type inferencing, maybe even JIT inferencing at run-time.
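A hedged illustration of what’s going on under the hood: Clojure can tell you when it falls back to reflection, and the hint makes the warning go away. (shout-slow and shout-fast are made-up example names, not code from this program.)

```clojure
;; With *warn-on-reflection* on, Clojure reports each method call it
;; has to resolve via reflection at runtime.
(set! *warn-on-reflection* true)

(defn shout-slow [s]
  (.toUpperCase s))           ; reflection warning: type of s unknown

(defn shout-fast [#^String s]
  (.toUpperCase s))           ; hint lets the call compile directly

;; Both return the same answer; the hinted one just skips reflection.
```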

Lesson: A little static typing can go a long way.

Reading vs. Mapping · My first cut at this problem used Java’s NIO subsystem; I’d noted that java.nio.channels.FileChannel advertises itself as being thread-safe and furthermore had a map() method for mapping stuff into memory. So I had this pretty slick (I thought) setup where multiple threads would just get dealt out an offset and block size and map successive regions of the file, then do the string-i-fying and splitting.

I was twiddling buffer sizes and thread counts and not observing any real helpful patterns, and wondered if I was maybe overthinking the problem. So I made another version that uses a Clojure agent to synchronize sucking in the data with good old-fashioned java.io.FileInputStream.read(), firing off a new thread for each block. Right off the bat, it was faster.

Experimenting with this one seemed to suggest that it ran faster and faster the smaller I made the buffers. I’d started with 64M and eventually went all the way down to 2M; 1M wasn’t any faster. I hypothesize that with the big buffers, the big String.split operation illustrated above was creating too many short-lived transient objects that were overloading the garbage collector.
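To make the shape of that concrete, here’s a rough sketch of the agent-serialized reader; the names (read-block, reader-loop) and details are mine, not the article’s actual code. The agent owns the stream, so reads happen strictly one at a time, while each block is processed on its own thread.

```clojure
(import '(java.io FileInputStream))

;; 2M buffers won out in the experiments above
(def buffer-size (* 2 1024 1024))

(defn read-block [stream]
  ;; one sequential read; returns nil at end-of-file
  (let [buf (byte-array buffer-size)
        n   (.read stream buf)]
    (when (pos? n)
      (String. buf 0 n "UTF-8"))))

(defn reader-loop [state process-block]
  ;; agent action: pull one block, process it on another thread,
  ;; then re-send ourselves to fetch the next block
  (when-let [block (read-block (:stream state))]
    (future (process-block block))
    (send *agent* reader-loop process-block))
  state)
```

Kicking it off would look like (send (agent {:stream (FileInputStream. "big.log")}) reader-loop per-block-fn); since only the agent ever touches the stream, no explicit locking is needed.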

Lesson: Disk really is the new tape.

Stabilizing the JVM · Early on, when I was fighting a memory leak, Rich Hickey suggested using a smaller heap size. I did that, and was astounded.

I was now seeing throughput on the order of 50-60M/sec. But as I watched the program run, I could see that as the program built up its heap to the ceiling (set via java’s -Xmx argument), it was running much slower, as slow as a tenth of that speed. When it hit the ceiling, absent other pathologies, it would ramp up to production speed, I assume as the GC got into balance with the rest of the program’s work.

Since it can easily take minutes for the JVM to expand to fill a 10G-or-bigger heap, you really can shave a couple of minutes off your run time by giving it a whole lot less. Yes, this is profoundly counter-intuitive.

Lesson: Measure, don’t guess.

Happy Dance! · At this point, I was starting to see run-times down well below fifteen minutes, and profiler output that looked like this (the primitive -Xprof facility gives you a little dump every time a thread exits, which turns out to work really well for me as I’m running through hundreds of ’em):

Flat profile of 29.63 secs (259 total ticks): Thread-58

  Interpreted + native   Method
  0.4%     1  +     0    java.util.Arrays.copyOf
  0.4%     1  +     0    sun.nio.cs.UTF_8.updatePositions
  0.4%     1  +     0    java.lang.StringCoding$StringDecoder.decode
  1.2%     3  +     0    Total interpreted

     Compiled + native   Method
 57.1%   148  +     0    java.util.regex.Matcher.search
 17.0%     1  +    43    java.util.regex.Pattern.split
  8.1%     0  +    21    clojure.lang.APersistentVector$Seq.reduce
  5.8%     3  +    12    java.util.regex.Pattern.matcher
  3.1%     0  +     8    clojure.lang.ArraySeq.next
  2.7%     0  +     7    clojure.core$conj__3121.invoke
  1.2%     1  +     2    clojure.lang.PersistentHashMap$FullNode.assoc
  0.8%     2  +     0    clojure.core$next__3117.invoke
  0.4%     0  +     1    clojure.core$re_find__4554.invoke
  0.4%     1  +     0    org.tbray.paralines$record__54.invoke
  0.4%     0  +     1    clojure.core$assoc__3148.invoke
  0.4%     0  +     1    clojure.lang.PersistentHashMap$BitmapIndexedNode.assoc
 97.3%   156  +    96    Total compiled

         Stub + native   Method
  0.8%     0  +     2    java.lang.System.arraycopy
  0.8%     0  +     2    java.lang.Object.getClass
  1.5%     0  +     4    Total stub

In other words, the program is spending most of its time using Java’s regex machinery to grind away at all that text input; which is I think what you’d like to see.

This was around 10PM last Friday, and I was crowing exultantly on Twitter, feeling like a rockin’ sockin’ functional-programmin’ concurrin’ homoiconic wizard. “Faster than Scala” I uttered, incautiously.

Lesson: Read some Classics in transition from the archaic Greek and learn to fear the punishment for hubris, especially public hubris.

Which Wide Finder Was That? · Because, you see, I was actually running a Wide Finder 1 program, just computing one little statistic, whereas Ray Waldin’s Scala code, and the others on the results page, were working on the benchmark from Wide Finder 2. It’s a lot more complex and computes five different statistics which require looking at a much higher proportion of the data.

Well, I thought, How hard can it be?

Lesson: [Never mind. We’ve all been there.]

Attitude Problem · I now have one, because it seemed like a lot more work than it should have been to code this up.

The stats you want to compute could be called “hits”, “bytes”, “referers”, “404s”, and “fetchers”. So I accumulated results in a map whose keys were :hits, :bytes, :referers, :404s, and :fetchers; the values were maps whose keys were the URIs and values the in-progress statistics. So if you want to increment the :hits value for some URI, you say:

(bump accumulator :hits uri 1)

Here’s the bump function:

(defn bump [accum stat uri increment]
  (let [totals (accum stat)
        currently (if-let [total (totals uri)] total 0)]
    (assoc accum stat (assoc totals uri (+ currently increment)))))
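For concreteness, here’s bump in action (repeated so the fragment stands alone), starting from a small accumulator whose stat keys are already present, as new-accum would arrange in the real code:

```clojure
;; bump, as above
(defn bump [accum stat uri increment]
  (let [totals (accum stat)
        currently (if-let [total (totals uri)] total 0)]
    (assoc accum stat (assoc totals uri (+ currently increment)))))

;; walk a small accumulator through three updates; nothing mutates,
;; each call returns a new map
(def acc
  (-> {:hits {} :bytes {}}
      (bump :hits  "/ongoing" 1)
      (bump :hits  "/ongoing" 1)
      (bump :bytes "/ongoing" 2342)))
;; => {:hits {"/ongoing" 2}, :bytes {"/ongoing" 2342}}
```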

In Ruby it’d all be a one-liner...

accumulator[stat][uri] += increment

...but then I’d be promiscuously mutating state and condemning myself to single-threadedness. Which is what this series is all about trying to avoid. On the other hand, it says exactly what I want to say without any ceremony or extra arm-waving. Is there a middle way?

Anyhow, here’s my gripe: I started sketching in the code to gather the five Wide Finder 2 stats Sunday while I was watching football. I spent all day Monday, until late in the evening, getting it all to work; admittedly, I’m a relative Clojure newb, but I’m not that terrible a programmer and I think I’m actually fairly quick at learning new things. This is way too long.

The result-merging step in particular gave me heartburn. If you can merge two maps with Clojure’s elegant merge-with and you can turn a list-or-vector into another list-or-vector with map, why can’t you run a map on a map and get a map?!?

Now, the situation wasn’t helped by the fact that Clojure is, after all, at 1.0, so its tracebacks are not models of helpful transparency; also, the code runs in dozens of threads in parallel, which made for extra work in tracking down what provoked that NullPointerException or ClassCastException.

Lesson: Some combination of faults in Clojure and faults in your humble correspondent is interacting badly.

Performance · Improved, but not terribly satisfying. The wrong collection of JVM options can drive this sucker into Garbage-Collection hell. This is painfully obvious when it’s going on: if you watch the output with a simple monitoring tool like top(1), you can see the program trundling along, processing 30 to 50 MB/sec and reporting a CPU burn rate anywhere between 1200% and 3000%, which means you’re keeping all the cores and threads pretty darn busy. Then all of a sudden the CPU drops to around 100% and the output just stops, for 90 seconds or more. If you’ve picked a bad combination of options, this happens more and more frequently until you’re spending way more time garbage-collecting than actually computing.

[Update:] Someone who wishes to remain anonymous read the first version of this piece and had a discussion with me about JVM options. The suggestion: simply take all of them out, except for -Xms and -Xmx to ensure there’s enough heap, and let the JVM manage its own GC strategy.

This advice turned out to be smarter than the tribal lore I’d picked up over the years and around the Net, and I’ve managed to stay out of GC Hell for the last couple of days.

On the other hand, after all the profiling and adjusting, my code has grown tons of extrusions and decorations; here’s the bit that actually does the work of processing a line of logfile:

 1 (defn proc-line [line so-far accum-1]
 2   (if (nil? line)
 3     (send so-far merger accum-1)
 4     (let [accum-2 (if accum-1 accum-1 (new-accum))
 5           fields (. #" " split line)
 6           ; [client _ _ _ _ _ uri _ bstring status ref] (. #" " split line)
 7           #^String uri-1 (aget fields 6) #^String uri (. uri-1 intern)
 8           #^String status (aget fields 8)
 9           accum-3 (if (= status "404") (bump accum-2 :404s uri 1) accum-2)
10           #^String bstring (aget fields 9)
11           accum-4 (if (re-find #"^\d+$" bstring)
12                     (bump accum-3 :bytes uri (new Integer bstring))
13                     accum-3)]
14       (if (re-find re uri)
15         (let [accum-5 (bump accum-4 :hits uri 1)
16               #^String ref (aget fields 10)
17               accum-6 (if (or (= ref "\"-\"") (re-find #"tbray.org/" ref))
18                         accum-5 (bump accum-5 :referers ref 1))
19               #^String client (aget fields 0)
20               accum-7 (bump accum-6 :fetchers client 1) ]
21           accum-7)
22         accum-4))))

Boy, that ain’t pretty. At one point, I was pulling out the fields with fragments like (get fields 6) but that led to some XxxArrayAccessor method bubbling to the top of the profile output. I attempted to replace that with “destructuring”, as in the commented-out Line 6, but that turns out to be just a macro that apparently generates about the same calls.

I settled on aget, apparently specialized for fishing around in Java arrays. But you still have to type-hint what comes out of it. Feaugh.

In Line 7, you can protect yourself from our old friend 4513622 by either calling new String on the uri, or by interning it, as above. The former causes more memory stress but, if you can spare the memory, runs faster.
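The interning trade-off is easy to see in miniature: interning collapses equal strings down to one shared object, so a URI that shows up on millions of log lines gets stored once. (The path below is just an illustrative value.)

```clojure
;; two distinct String objects with equal contents
(def a (String. "/ongoing/When/200x/"))
(def b (String. "/ongoing/When/200x/"))

(identical? a b)                      ;; false: separate objects
(identical? (.intern a) (.intern b))  ;; true: one pooled copy
```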

Lesson 1: I’m not sure there’s a clear one here, but it makes me nervous when application code that should be simple is hard to get right, and so far Clojure’s not making this developer happy the way Matz designed Ruby to.

Lesson 2: The Concurrency Tax is too high.

Fallback hypothesis: I’m just Doing It Wrong.

GC Explosion · I accidentally ran an experiment that showed me where a lot of my GC pain was coming from. Reminder: this works by having one Clojure agent read blocks sequentially, handing them off to threads which process them and build those tables of tables with the results. When each thread is done, it sends its table off to another agent to merge.

One time, I had a bug that turned the merger into a no-op: the agent was invoked all right but simply ignored the incoming table. This version ran very fast; not as fast as the simple Wide Finder 1 code I’d been working with earlier, but not bad at all, with a lot less GC overhead. So that code above may be ugly, but it actually runs OK.

Here’s the code that does the merge:

(defn merger [current incoming]
  (loop [keys (keys current) output {}]
    (if-let [key (first keys)]
      (let [merge-1 (merge-with + (current key) (incoming key))]
        (recur (rest keys) (assoc output key merge-1)))
      output)))

It’s iterating all right, but only over the keys of the top-level map, of which, remember, there are only 5: :hits, :bytes, and so on. Not much to it, and reasonably idiomatic I think. But boy does it ever generate garbage.
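To see what it actually computes (merger repeated so the fragment stands alone), here it is run on two tiny accumulators; the per-URI counts under each stat key get summed:

```clojure
;; merger, as above
(defn merger [current incoming]
  (loop [keys (keys current) output {}]
    (if-let [key (first keys)]
      (let [merge-1 (merge-with + (current key) (incoming key))]
        (recur (rest keys) (assoc output key merge-1)))
      output)))

;; per-key totals from the two tables are summed
(def merged
  (merger {:hits {"/a" 2 "/b" 1} :bytes {"/a" 100}}
          {:hits {"/a" 3}        :bytes {"/b" 50}}))
;; => {:hits {"/a" 5, "/b" 1}, :bytes {"/a" 100, "/b" 50}}
```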

I note that in Ray Waldin’s nicely-performant Wide Finder 2 code he relied on a java.util.concurrent.ConcurrentHashMap. I suppose it’s unsurprising that that would outperform the sort of pure functional/dynamic code above, but it feels a bit like cheating.

Java Versions and Options · My best times for the full WF 2 run are now around 30 minutes: 29:38 with JDK7, 35:10 with Java 6. The results of tinkering with GC options and generation sizes and so on seem insignificant-to-damaging. In particular, the new “G1” garbage collector in JDK7 doesn’t seem appropriate for this problem.

What I Could Do Next ·

  • Use a ConcurrentHashMap like Ray did.

  • Bring VisualVM to bear on the problem and really do a deep-dive on what’s happening inside this code.

  • Try refactoring the app some more.

I dunno. I still like Clojure and stand by my 11 Theses, but that impedance mismatch with my conventional procedural object-oriented programmer’s mind grows fatiguing.



Contributions


From: Erik Engbrecht (Dec 09 2009, at 21:09)

One of the neat things Ray's solution does (and, indeed, I think most of the top performers) is not bother with line splitting. Once you're dealing with fairly optimized code, the performance is really driven by how many times you have to pass over the data. Java is automatically penalized in this case because it does one pass in order to turn bytes into strings, where languages that natively work with ASCII can just copy the bytes in. Being able to use the bytes directly rather than having a big heavy immutable string object probably helps a lot too (there's a another pass - copying the file buffer memory into the string buffer memory).

Anyway, I suggest looking through your code and seeing how many times you'll end up scanning the bytes/chars during your processing. Squishing that down will squish your time.

[link]

From: katre (Dec 09 2009, at 21:09)

I don't know from lisp, but if Clojure runs on the JVM does it make any sense to be using a dedicated Stats class instead of this map or maps of maps thing?

I'd probably invert your model and do a map of URI to Stats, where Stats then has five members for count, referers, etc.

I am hugely interested in all of these Clojure articles, but all they've convinced me of is "If I want to go functional, use more Scala".

[link]

From: Chouser (Dec 09 2009, at 21:09)

Using java.util.concurrent.ConcurrentHashMap is *not* cheating. There are many problems that can't be solved with a ConcurrentHashMap, and the more powerful constructs of refs, maps, transactions, etc. are required. But if they're not, by all means use the more efficient features of java.util.concurrent.

Also, here's an alternate implementation of bump. I don't know if its any faster or produces less memory churn, but at least it's a bit simpler:

(defn bump2 [accum stat uri increment]
  (update-in accum [stat uri] #(+ (or % 0) increment)))

--Chouser

[link]

From: Nick L (Dec 09 2009, at 21:09)

Won't that intern cause a memory leak? (Assuming it works the same way as the java intern does)

[link]

From: Gary W. Johnson (Dec 09 2009, at 21:09)

Here are some substantially more idiomatic clojure representations of your functions. I'm not sure how they compare performance-wise, but they're certainly nicer to read IMHO.

(defn bump [accum stat uri increment]
  (if (zero? increment)
    accum
    (update-in accum [stat uri] #(+ increment (or % 0)))))
(defn merger [current incoming]
  (merge-with (partial merge-with +) current incoming))
(defn proc-line [line so-far accum-1]
  (if (nil? line)
    (send so-far merger accum-1)
    (let [fields (.split line " ")
	  #^String client  (aget fields 0)
	  #^String uri-1   (aget fields 6)
	  #^String status  (aget fields 8)
	  #^String bstring (aget fields 9)
	  #^String ref     (aget fields 10)
	  #^String uri     (.intern uri-1)
	  accum-4 (-> (or accum-1 (new-accum))
		      (bump :404s  uri (if (= "404" status) 1 0))
		      (bump :bytes uri (if (re-find #"^\d+$" bstring) (Integer. bstring) 0)))]
      (if (re-find re uri)
	(-> accum-4
	    (bump :hits     uri    1)
	    (bump :referers ref    (if (or (= ref "\"-\"") (re-find #"tbray.org/" ref)) 0 1))
	    (bump :fetchers client 1))
	accum-4))))

[link]

From: Tim Bray (Dec 09 2009, at 21:11)

Testing, testing. Fixing breakage in the comment system.

[link]

From: Tim Bray (Dec 09 2009, at 22:40)

Katre: So thousands of little 5-element hashes would be better than 5 big hashes? Not obvious either way.

Nick: Could you expand? I would expect the opposite, and in fact the memory growth was more controlled.

[link]

From: David Nolen (Dec 09 2009, at 23:12)

Perhaps you should post the entirety of your code now that you've banged on it a little while :) I'm sure the Clojure community would be able to give you not only optimizations but perhaps other ways of tackling the problem.

One issue that you're probably running into is in the creation of many ephemeral objects. I wonder if this could be avoided by using atoms to hold the counts for bumping and not using nested maps?

While Clojure certainly encourages a functional style, Rich has introduced enough constructs where you can code in a concurrency safe yet some what imperative style if that is more natural.

[link]

From: Simon Eubanks (Dec 09 2009, at 23:27)

I'd like to tackle this problem, but I don't want to reverse engineer the problem definition from the code.

Is a definition of the Widefinder problem available? Sample data with expected results will also be nice to have.

[link]

From: Miron Brezuleanu (Dec 10 2009, at 00:49)

About the 'Attitude Problem' section gripe (assoc/assoc): Maybe assoc-in helps getting closer to the desired Ruby syntax?

[link]

From: Patrick Wright (Dec 10 2009, at 00:51)

I think it would be better for your readers and the Clojure community if you posted your code somewhere, e.g. GitHub, for them to take a look at and study. Posting benchmarks with incomplete code blocks makes it hard for us to analyze where the problems might be, and your blogged results will remain in the blogosphere for eons, giving people the impression that perhaps Clojure is fundamentally broken or hard to work with and tune. I understand if you feel your code is a work in progress, but hey, that doesn't stop anyone else from posting their stuff.

Thanks for the detailed blogging, keep it up.

Patrick

[link]

From: Luke Gorrie (Dec 10 2009, at 00:54)

I think language implementers should think seriously about using disjoint heaps for each thread and copying in between. This is just such an easy way to give programmers a predictable performance model and avoid these voodoo frustrations.

I've been to a few research talks about Java GC and each time my take-away has been "wow, they made the problem much harder than the Erlang guys."

[link]

From: Perry Trolard (Dec 10 2009, at 08:00)

> why can’t you run a map on a map and get a map?!?

Here's the idiom, offered in the spirit of "let me share" not "why didn't you know":

(into {} (map (fn [[k v]] (do-something k v)) my-hash-map))

The (into {} ...) is the trick. The map function works on sequences, & will coerce anything it can into a sequence, which in the case of (hash) maps is a sequence of [key value] pairs. The (fn ...) operates on these pairs, returning another, modified pair for each, so the return value of the map function is the sequence

( [k v] [k v] [k v] [k v] ... )

So you've processed the original map as a sequence (here's the unifying sequence abstraction) and want to turn the thing back into a map: enter "into", which successively adds (with "conj") items from a sequence into a supplied datastructure. And

(conj {} [:key "value"]) => {:key "value"}

(into, BTW, features the recently added transient optimization [http://clojure.org/transients])

I've really enjoyed your series on Clojure. I'm a longtime ongoing reader & a sometime Clojure enthusiast, so I'm happy for the intersection.

[link]

From: Brenton Ashworth (Dec 10 2009, at 13:44)

I love this project and can't wait to see what your final conclusions are.

Even though I don't completely understand everything that's going on in the proc-line function, I have a suggestion for how to clean it up. In your post "Idiomatic Clojure" you had a version of the batch-pmap-wide-finder code that you liked with functions tally, count-lines and find-widely. I am doing some log processing (not Wide Finder 2) and used this as a template to get started. Here are the good parts:

(def access-log-re #"[\"]GET\s(.*)\sHTTP/\d[.]\d[\"]\s(\d\d\d)\s(\d*)")

(defn tally [line]
  (if-let [[_ hit code bytes] (re-find access-log-re line)]
    {:lines 1
     :total 1
     (keyword code) 1
     :hits {hit 1}}
    {:lines 1}))

(defn count-lines [lines]
  (apply deep-merge-with + (map tally lines)))

(defn log-file-lines [filename]
  (line-seq (reader filename)))

(defn process-log-file [filename]
  (count-lines (log-file-lines filename)))

(defn log-files [dir]
  (map #(.getAbsolutePath %)
       (filter #(.startsWith (.getName %) *file-prefix*)
               (file-seq (java.io.File. dir)))))

(defn process [project]
  (apply deep-merge-with +
         (map process-log-file
              (log-files (str *root-dir* project "/server/")))))

In tally I use the regular expression to both find what I am looking for and also do a destructuring bind to extract the fields. I then create a nested map that has the stats for a single line.

In count-lines and process I use deep-merge-with from clojure.contrib.map-utils to merge the maps into a single map. My process function is merging results from multiple files but you can do the same type of thing in merging the results from multiple threads.

I'm not sure how this would effect performance.

[link]

From: David Terei (Dec 10 2009, at 17:41)

Would someone be able to point me in the direction of the sample data set? I would like to give Wide Finder a try.

[link]

From: Jim Robinson (Dec 11 2009, at 16:08)

I'd be very curious whether or not pinning the memory size, using both the -Xms and -Xmx parameters set to the same large value, affects the performance. You mention the cost of repeated reallocs to grow the heap, and so what happens if it just needs to perform one large malloc?

Jim Robinson

[link]

December 08, 2009

I am an employee of Amazon.com, but the opinions expressed here are my own, and no other party necessarily agrees with them.

A full disclosure of my professional interests is on the author page.