Language Fermentation

If you're a programmer, I think you're lucky, because this is an exciting time we're living in: there's some powerful intellectual ferment in progress out there. This may be the golden age of programming, as Paul Graham argues, and maybe everything we thought we knew about strong typing is wrong. Herewith a bit of surveyware with a touch of debunking and a practical footnote on Java exception handling. (Warning: geeky.) (Update May 9, good feedback on Java Exceptions.)

I've plugged Mr. Graham enough times here recently, but two other very thought-provoking essays crossed my radar in recent weeks: Strong Typing vs. Strong Testing from Bruce Eckel, and Are Dynamic Languages Going to Replace Static Languages? by Robert C. Martin. Both of these guys are reasonably famous author/guru types and they know their stuff.

They're saying more or less the same thing: the safety and robustness that strong typing bought us is accomplished way more effectively by good modern testing practices, and given that strong typing is a serious pain in the ass, maybe we're all going to end up using dynamically-typed languages like Python.

I'm not going to replicate their arguments, just provide some cheerleading for a couple of them, and provide some pushback where required.

Rah-Rah on that Testing Thing · In case you hadn't noticed, the most widespread result of the late-nineties fad for Extreme Programming was the notion of Test-Driven Development; many have written about this and there's an excellent book of the same title by Kent Beck.

As these guys say, once you start doing this, it is wholly addictive; I have only recently done this and JUnit is now part of my life.

At Antarctica, I wrote the bulk of the initial server code in a one-man frenzy in my basement during 1999, not using a Test-Driven approach, and while I'm proud of some of that code, it would have been better (and, more important, done sooner) if I'd gone TDD. In our ongoing development, we're disciplined and have good QA practices but I think we need to get more militant about this.

Brain Quarters · Right at the moment, my two programming lives are in stark contrast. At work in recent months I've mostly been wrangling incoming customer data and mashing out huge perl/mysql script suites to prep it for loading into Visual Net. I've had a lot of practice at this and I can make it happen fast, but let's not pretend this is Software Engineering or anything like it.

In my spare time, which mostly means after 11 or so when the family has gone to bed, I'm tinkering with a solution to Java's string and character handling being basically wrong.

This latter project is done from the ground up rigorously TDD to the max, and like they said, and like I already said, this is some seriously addictive stuff. I'm enjoying both of my programming lives, but whichever half of my brain it is that programming uses has, I suspect, split into two mutually incompatible quarters that aren't talking to each other.

Dynamic Languages, Yes · These often used to be referred to as “scripting languages,” except that there are lots of big, complicated, unscriptlike systems built in them, or as “interpreted languages,” which is just wrong: modern Perl and Python and Lisp programs are compiled all right, it's just that it's done at runtime.

So I'm fine with the “dynamic languages” moniker, because maybe the most important characteristic is that variables are whatever you need them to be, and the language doesn't try very hard to protect you from your own stupidity on the basis of type declarations.

Most people who spent much time in the low-level trenches writing C code embraced the strong-typing anality of Java and its ilk, because it does cut off a lot of stupid errors at the pass.

But, it turns out, Test-Driven Development cuts them off better; much better. And there is no doubt whatever that you can create software a lot faster, maybe an order of magnitude faster, in high-level dynamic languages.

C, C++, Java, C#, R.I.P.? · Thus the big question: if the strong-typing advantages of conventional compiled programming languages are moot, do we really need them? In 2020, will everyone be a Python programmer?

No Free Lunch · Maybe, but maybe not. The languages in the “R.I.P.” list above do have some other advantages beyond strong typing. One of the big ones is memory footprint: if you're writing a big system with big complex in-memory data structures (which every big system I've ever worked on has had), their size can spiral out of control insanely fast in any of those dynamic languages. These things want you to build all your structures around hashes and dictionaries, which we all know perfectly well only work well when sparsely populated; work it out.

Maybe this objection is old-fashioned and provincial; after all, I'm typing this on a laptop with a half-gig of RAM. But in my career it's been a constant that there's never as much RAM as you'd like, and I've been blocked on a few occasions from doing what I wanted, specifically in Perl, because of data structure explosion.

Secondly, and in the same spirit, there do remain performance issues. There is some (small) number of people who have to write low-level webserver code, and if you've ever done this under the gun of a million-hits-a-day load, you quickly become a control freak with an insane desire to remove as many as possible of the layers which separate your code from the silicon.

Surprisingly, one consequence of the virtues of TDD might be a resurgence of C programming; you can't beat it for those low-level memory-starved scenarios. The core of the Visual Net server is in C, and while Java is more pleasant to code in (so much prettier, you know), the only thing that really irritates me about C at a deep level is having to declare all the variables at the front of the routine, rather than when they're needed.

So maybe in 2020 we'll all be programming in either Python or C.

Coda: Java Exceptions · In Eckel's essay above, he touches on the practice of using Java RuntimeException without really saying what he means. By a coincidence, I just learned this trick in recent weeks and it's also pretty addictive, so I thought it couldn't hurt to throw in a short example for the (probably small) number of people who don't know what he's talking about.

Suppose you're writing code to, as a completely random example, process UTF-8 efficiently in Java. Eventually you'll write something like this:

b = o.toString().getBytes("UTF8");

Then when you compile it, Java will whine at you that getBytes can throw a java.io.UnsupportedEncodingException. At this point the Java programmer's heart starts to sink, envisioning every other module in the system that calls this sucker having to declare that exception, especially since there's very little likelihood that you can do anything about it except die. I mean, what can you do if the system can't read UTF8?

Here's the trick:

try { b = o.toString().getBytes("UTF8"); }
catch (java.io.UnsupportedEncodingException e)
{ throw new RuntimeException("UTF8 not supported!?!?"); }

The trick, you see, is that RuntimeExceptions don't need to be declared, and all of the code you were thinking of writing just got a whole lot cleaner-looking.

As Eckel points out, legions of purists are now showing signs of needing the Heimlich maneuver merely at the sight of that code fragment, which flies in the face of everything that is Right, Pure, and Good.

Well yes, but if you need to catch the exception and handle it, or at least die gracefully, you still can. Just make your own subclass of RuntimeException and have the watcher-at-the-gate be looking for that one; because of its parentage it still doesn't need to be declared.
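A minimal sketch of that subclass idea, for readers who want to see it spelled out. Every name here (EncodingFailure, Utf8Bytes, encode, safeLength) is made up for illustration, and returning -1 at the gate is just one arbitrary way to "die gracefully"; real code would pick its own policy.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical unchecked exception: because it extends RuntimeException,
// no method on the call path has to declare it.
class EncodingFailure extends RuntimeException {
    EncodingFailure(String message) {
        super(message);
    }
}

class Utf8Bytes {
    // Low-level helper (hypothetical). Note the absence of a "throws" clause.
    static byte[] encode(Object o, String charsetName) {
        try {
            return o.toString().getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new EncodingFailure(charsetName + " not supported!?!?");
        }
    }

    // The watcher at the gate: the one place that looks for EncodingFailure.
    // Returning -1 is an illustrative choice, not a recommendation.
    static int safeLength(Object o, String charsetName) {
        try {
            return encode(o, charsetName).length;
        } catch (EncodingFailure e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeLength("abc", "UTF-8"));
        System.out.println(safeLength("abc", "no-such-charset"));
    }
}
```

Everything between encode and safeLength can stay free of exception declarations, while the gate still gets a specific type to catch rather than a bare RuntimeException.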

I sorely wish I'd known that years ago when I was writing the Lark XML parser; it grew Exception declarations like dandelions on a spring lawn, and they could have all been tidied away thusly.

On May 9th, Scott Lamb wrote me to point out that this method has the drawback that if the exception goes unhandled and you crap out, you've screwed up the stack trace and the original problem info. Fortunately, Java 1.4 (which I haven't got around to installing yet) has a new constructor for Throwable which takes a single Throwable as an argument; this exception goes into a field called cause, and you don't suffer any information loss. I'll reproduce the salient part of Scott's note:

I need a different example to show why that's so important, though. Just the other day I was working on some code that parses a document, runs it through a couple of stylesheets, and uses a SAXResult to pump it through my own ContentHandler. I was getting a RuntimeException from deep within the Xerces XSLT code and had no idea why. Eventually I downloaded the source code and found the line in question. It was doing essentially what your code does when it got an exception from my ContentHandler. But I was completely unable to find the problem because the stack trace stopped where they had caught it and thrown a different exception. I ended up replacing a bit of code like this:

catch (SAXException e) { throw new RuntimeException(e.getMessage()); }

with one that looks like this:

catch (SAXException e) { throw new RuntimeException(e); }

and then it was easy. The stack trace went all the way to my code and I could pinpoint and fix the actual problem.

My new code required the J2SE 1.4, which was probably why they didn't do that to begin with. (In J2SE 1.4, any Throwable has an optional Throwable cause.) If you're not willing to do that, you could make your own subclass of RuntimeException with similar behavior.
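To make the difference between those two catch clauses concrete, here is a small self-contained sketch. It assumes Java 8 or later for the method references, and CauseDemo, ParseFailure, lowLevel and causeOf are hypothetical stand-ins, not anything from Xerces or SAX.

```java
class CauseDemo {
    // Hypothetical checked exception standing in for something like SAXException.
    static class ParseFailure extends Exception {
        ParseFailure(String message) {
            super(message);
        }
    }

    // Hypothetical low-level routine that always fails.
    static void lowLevel() throws ParseFailure {
        throw new ParseFailure("bad element at line 42");
    }

    // Wraps only the message: the original exception and its stack trace are dropped.
    static void messageOnly() {
        try {
            lowLevel();
        } catch (ParseFailure e) {
            throw new RuntimeException(e.getMessage());
        }
    }

    // Wraps the exception itself (Java 1.4+): the original is kept as the cause.
    static void withCause() {
        try {
            lowLevel();
        } catch (ParseFailure e) {
            throw new RuntimeException(e);
        }
    }

    // Runs the action and reports the cause attached to whatever it threw, if any.
    static Throwable causeOf(Runnable action) {
        try {
            action.run();
            return null;
        } catch (RuntimeException e) {
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        System.out.println("message only: " + causeOf(CauseDemo::messageOnly));
        System.out.println("with cause:   " + causeOf(CauseDemo::withCause));
    }
}
```

With the cause attached, printStackTrace() on the wrapper also prints a "Caused by:" section for the original exception, which is the information Scott was missing.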

May 08, 2003

By Tim Bray.
