This is the third progress report from the Wide Finder Project. Given that I launched this brouhaha late on a Friday, totally the worst possible time, I’m astounded at the intensity and quality of the conversation that’s going on. I want to address two themes that have emerged, one of which seems stupid and the other smart.

The Wrong Problem? · There’ve been comments and blogs along the lines of “WTF, trying to shake down Erlang using what amounts to a degenerate Awk script!!? This isn’t what Erlang is for! It’s all just I/O! Lines of text are so 1980’s!”

Nope; the further this goes, the more I think it’s a valid line of research. You know all those kazillions of Sun servers out there? Let me tell you, they’re not all running state-of-the-art Java-on-the-net apps. A huge proportion of all those cycles goes into Perl scripts and COBOL batches and C++ meat-grinders and FORTRAN number-crunching. Furthermore, if you look at the numbers from running my Perl and Ruby scripts, it’s obvious that they’re pegging both the CPU and I/O needles; so they’re nicely multidimensional in a simple way.

This is the kind of nasty unglamorous job that a lot of our customers run all the time to pay their rents, and this whole business is making a big bet on many-core computers, and it’s just not on to tell all those people that those are the wrong kind of jobs for the new iron.

The Good News · I first saw this on the Erlang Questions mailing list, from someone whose handle was Patrick: “The good news is speeding up the i/o in erlang should be easier than introducing better concurrency to another language.” Then there’s Patrick (the some one?) Logan saying the same thing in Postmodern I/O.


In fact, there are practical suggestions on how exactly to do that, from Clayton Wheeler and Steve Vinoski. (What is he up to at the new startup?).

Not only will I kick the tires, I’ll try the code out on our unreleased T2-based many-core/many-thread monster. This is going to be fun.


Comment feed for ongoing:Comments feed

From: Asbjørn Ulsberg (Sep 24 2007, at 03:46)

I'm looking forward to seeing the results of your tire-kicking. This Erlang diving of yours has been quite an interesting endeavour to watch so far.


From: David Chase (Sep 24 2007, at 04:39)

Vinoski's trick is the correct one; split first, then run ahead to a newline, so that you always split on a newline. Doing a little extra work here converts your serial problem into a parallel one. In the future, people will need to become much more comfortable with this trick and others like it.

Years ago, I worked for an itty-bitty startup called NaturalBridge, doing a mostly-static, user-level-MP-threads VM for Java. PURE Java parallelizes very nicely; mix in native threads, and it only runs in parallel as well as the underlying native language. (We ran Volano, for grins, up to 800 rooms on a 2-processor Pentium III box. That's 32Kthreads. All the stack management necessary to let that happen without stupid extra flags, is hidden in the VM).

I work on Fortress now; our plan, such as it is, is to make the world safe(r) for kilo-core boxes (among other things -- we have big plans).


From: Anonymous (Sep 24 2007, at 05:58)

Verivue is the name of his startup. At least that is what is written in the footnote of this IEEE article ;)


From: Toby DiPasquale (Sep 24 2007, at 06:37)

Tim, it was the same Patrick in both places ;-)


From: Anders Pearson (Sep 24 2007, at 09:25)

I don't think it's quite fair to characterize the feedback (that I've seen at least) as saying “WTF, trying to shake down Erlang using what amounts to a degenerate Awk script!!? This isn’t what Erlang is for! It’s all just I/O! Lines of text are so 1980’s!”

I haven't seen anyone arguing that text processing and I/O aren't important; just that other languages already do a really good job of handling those kinds of tasks so Erlang focuses elsewhere on the areas that those other languages don't do as well and makes it relatively easy (via ports) to integrate with them. That seems like a very reasonable and uncontroversial approach. We don't expect to be able to write device drivers in shell script. Maybe we shouldn't expect to do intensive text processing in Erlang.


From: Anders N (Sep 24 2007, at 10:08)

Some everybody has gotten very fixated on the file IO, instead of the string handling.

The string:tokens function that is used in Your example is NOT VERY efficient. So I rewrote the parsing to not use any library functions.

On my laptop with a file with 100K lines, Your published version takes ~7 s.

My version takes ~1 s.

I then added the parallel stuff by Steve Vinoski.

Using 2 processes on my core 2 duo laptop, takes ~ 0.8s.

Finally native compilation, two process version gives ~0.7s for 100K lines.

Single process version do_file(File).

Multiprocess version p_do_file(NoProc, File).


do_file(F) ->

{ok,B} = file:read_file(F),


fold_lines(fun (URI,Acc) ->

process_match1(URI) + Acc

end, S, 0).

p_do_file(Num, Input) ->

{ok, Data} = file:read_file(Input),

Bins = split_on_newline(Data, size(Data) div Num),

Me = self(),

Pids = [process_binary(Me, B) || B <- Bins],


fun(_, Total) -> receive X -> Total + X end end,

0, Pids).

split_on_newline(Bin, N, All) when size(Bin) < N ->

All ++ [Bin];

split_on_newline(Bin, N, All) ->

{_, <<C:8, _/binary>>} = split_binary(Bin, N),

case C of

$\n ->

{B21, B22} = split_binary(Bin, N+1),

split_on_newline(B22, N, All ++ [B21]);

_ -> split_on_newline(Bin, N+1, All)


split_on_newline(Bin, N) when N == size(Bin) -> [Bin];

split_on_newline(Bin, N) -> split_on_newline(Bin, N, []).

process_binary(Pid, Bin) ->

spawn(fun() ->


V = fold_lines(fun (URI,Acc) ->

process_match1(URI) + Acc

end, S, 0),

Pid ! V


fold_lines(_F,[],Acc) ->


fold_lines(F,S,Acc) ->

{R,C} = do_line(S),

fold_lines(F, C, F(R,Acc)).

do_line(S) ->

C1 = drop1(S,6),


drop1([$ |Cs], N) when N>1 ->

drop1(Cs, N-1);

drop1([$ |Cs], 1) ->


drop1([_C|Cs], N) ->


getit(Cs) ->

getit(Cs, []).

getit([$ |Cs], Acc) ->

{lists:reverse(Acc), drop_rest(Cs)};

getit([C|Cs], Acc) ->

getit(Cs, [C|Acc]).

drop_rest([$\n|Cs]) ->


drop_rest([_C|Cs]) ->


drop_rest([]) ->



From: Isaac Gouy (Sep 24 2007, at 11:12)

"The Wrong Problem?"

No, but people always seem to read these single task exercises as a definitive characterization of a language.

As well as noting how much faster Ruby can line-by-line read and sum from a text file

As well as noting how much faster Ruby can do text regex

We should note how much slower Ruby is on other tasks


From: Isaac Gouy (Sep 24 2007, at 11:44)

That's irritating - the URI text works fine but the URI link with & replaced by &amp; doesn't work.


From: robert young (Sep 24 2007, at 13:29)

Some syntax notes:

The tail recursion and immutable variables are relics of Erlang's Prolog heritage; less or more depending on who you talk with. Tail recursion isn't goto in sheep's clothing; just a necessary syntax to using recursion in compile time definable space. Languages which depend on recurion (always?? often??) depend on tail to keep from blowing stack. The immutable variable thingee is less obviously necessary; I guess I'll have to read his book. It is a logical necessity in Prolog; Erlangistas tell us that Erlang is not a logic language.

The I/O cavil is apt; on both sides. From your point of view, a common requirement. From their point of view, not common in Erlang's normal space. OTOH, if I/O means sockets, then multi-thread/core concurrency falls out kind of naturally. Less so for processing from a serial device. In that instance, the programmer has to divy up the work manually; as demonstrated by the correspondents.

As to whether Erlang is really The Next Big Thing; may be not. Two pieces of technology may merge to be The Next Big Thing. 1) relational database engines are inherently parallel, and the folks who've been writing them for 30 years are likely the most skilled at dealing with such a data structure. 2) Linus Torvalds recently mused that memory drives in signficant density are not that far away.

Taken together then, for garden variety development, RDBMS could now do all the heavy lifting really, really fast once the ball and chain of rotational storage is gone. Programmers could really do MDA; "what not how" as Chris Date says, leaving only some trivial UI stuff to be coded. yuk-yuk.


From: Chad Brewbaker (Sep 24 2007, at 23:05)

MPI_IO is what you need. I'll make myself a note to code Wide Finder in mpiruby tomarrow. (Looks like the official release date will be on October 9 for those that were interested)

This is my current syntax:


I'll probably add a method to the String class called String.galign("bound",*communicator) that will take an string split across processors and align the boundaries to "bound". I already have a routine that splits ">" for FASTA files.


From: Masklinn (Sep 25 2007, at 07:28)

> The tail recursion and immutable variables are relics of Erlang's Prolog heritage

Wrong, they're characteristics that come from the functional basis of sequential erlang (the part of the language that runs within each thread)

> It is a logical necessity in Prolog

Immutable values also removes a whole class of errors from functional programs (and programs in general), it also allows you to optimize message passing by sharing instances: immutable objects are "thread safe" by default, so message passing between processes can be optimized to sharing structures in a thread-like manner and merely sending a reference.

Recursively iterating is then made necessary by the mandatory immutability of all objects (since you can't increment a counter), and tail recursion is the only way to recurse in constant space.

> relational database engines are inherently parallel,

So is mnesia. And it's natively distributed and replicated to boot.


From: Akhilesh Mritunjai (Sep 25 2007, at 15:03)


Here is the asymptotic maximum JVM regexp performance (without I/O) for the regexp given in Part-I:

Result: time: 29212ms. 1003 M/s, 100000000 matches total

Java 6u2, WinXP, 1.6GHz Pentium-M single core

So, a half decent regexp library still will be IO bound as most HDDs are capable of around 80MB/s.

It is pointless to parallelize this problem unless the regexp library is *so* bad that it can't sustain even one-twentieth of Java performance as on a 3 year old CPU.

If you really want to do that fast, I'd suggest using an x4500 (Thumper) which can sustain around 2GB/s (peak) from its disks and use just use a single awk script (or your favorite code on the 4 Opteron cores on the x4500).


From: robert young (Sep 26 2007, at 09:11)

>> relational database engines are inherently parallel,

> So is mnesia. And it's natively distributed and replicated to boot.

well, but it's not a Relational database (not much more than an in-memory filesystem); and therefore can't do any of the heavy lifting. you're right back with application ACID, just like a 1965 COBOL programmer.


From: robert young (Sep 27 2007, at 05:54)

... and then there's ErlyWeb, which does use a real RDBMS to do the heavy lifting; even to the point of generating UI from the catalog. Yariv is one smart young man.


author · Dad
colophon · rights
picture of the day
September 23, 2007
· Technology (90 fragments)
· · Concurrency (75 more)
· · Erlang (4 more)

By .

The opinions expressed here
are my own, and no other party
necessarily agrees with them.

A full disclosure of my
professional interests is
on the author page.

I’m on Mastodon!