This thread starts with Bill de hÓra’s excellent Design for the web, which has useful links and commentary about problems with Java web frameworks. Coté follows up with Java’s Fear of Commitment, also very good, and interesting discourse breaks out in both parties’ comments. Obviously, Bill and Coté are correct; embracing the Web is going to get you a better result on the Web than not embracing the Web. If you want more evidence, look no further than PHP, a deeply-flawed tool whose success is based on (admirably well-done) Web-centricity. Normally, I’d leave it at that, but Coté is wrong about Java and Bill is wrong about both ETags and MVC, and I think all of those things are important enough to push back on.

ETags · Bill highlights the ETags support in Rails and Django as evidence in his argument that embracing the Web is a good thing. That’s true insofar as Rails and Django at least recognize that ETags exist; but in fact the default implementations are very nearly useless.

What do ETags do? They allow you, when you fetch a Web resource, to send a short string along with the request saying “this is the signature you sent me for the version I know about”; then the server can look and see if the signature is still good, and if so not bother sending the data redundantly. Which is potentially a big, big win.
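To make that exchange concrete, here is a minimal sketch of the server side of a conditional GET. The names `etag_matches`, `respond`, and `build_body` are made up for illustration and belong to no particular framework; the comma-splitting is a simplification that assumes the tags themselves contain no commas.

```python
def etag_matches(if_none_match, current_etag):
    """Return True if the client's If-None-Match value names the current ETag."""
    if if_none_match is None:
        return False
    if if_none_match.strip() == "*":
        return True

    def strip_weak(tag):
        # If-None-Match uses weak comparison, so ignore a leading W/ marker.
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    # Simplification: assumes no commas inside the quoted tags.
    candidates = [strip_weak(t) for t in if_none_match.split(",")]
    return strip_weak(current_etag) in candidates


def respond(request_headers, current_etag, build_body):
    """Return (status, headers, body); skip the body if the client is current."""
    if etag_matches(request_headers.get("If-None-Match"), current_etag):
        return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag}, build_body()
```

Note that `build_body` is only called on the 200 path, which matters for the argument that follows.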

But also potentially not. If you look at actual real-world servers that are maxed out, they’re mostly maxing out on computation or back-end database traffic, not on bandwidth to the Net. So the saving that would be really valuable would be if you didn’t have to rebuild the requested page. Unfortunately, both Rails and Django rebuild the requested page, then compute the signature, then check the ETag. Which is extra run-time work in order to conserve a resource (Web bandwidth) that probably isn’t the problem.

What you want to do is compute the ETag based on the underlying data resources that actually drive the page creation; the input to that process, not its output. This is often going to be a small number (sometimes one) of timestamp or version fields in a database row, or metadata from the underlying filesystem. It’s also going to be application-dependent. So a framework that was really designed for the Web would expose the ETag generation to the application programmer in a way that let them be smart about conserving the resources that actually matter.
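As a rough sketch of the difference, assuming a page driven by a single database row with an id and a last-updated timestamp: the first function hashes the rendered output, the second derives the tag from the inputs alone. All names here (`etag_from_output`, `etag_from_inputs`, `handle`, `render`) are hypothetical and are not the Rails or Django APIs; the hash choice is arbitrary.

```python
import hashlib


def etag_from_output(render):
    """Output-based: the page must be fully rendered before the tag is known."""
    body = render()
    return '"' + hashlib.sha256(body).hexdigest() + '"', body


def etag_from_inputs(row_id, updated_at):
    """Input-based: built from cheap metadata, no rendering required."""
    return '"{}-{}"'.format(row_id, updated_at)


def handle(if_none_match, row_id, updated_at, render):
    """Answer a GET, rendering only when the client's copy is out of date."""
    etag = etag_from_inputs(row_id, updated_at)
    if if_none_match == etag:
        return 304, etag, b""      # render() is never called
    return 200, etag, render()
```

What goes into the tag is the application-dependent part; a real application might need several fields, or more than one row.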

For example, when the Apache webserver accesses a resource which is a static file, the ETag is based on its inode number, size, and last-modified time. Which you get almost for free from the OS without even having to open the file.
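A sketch of the same idea in Python, using only what `os.stat` reports. The function name `file_etag` and the hex formatting are my own choices for illustration, not Apache's actual format, and the inode field may not be meaningful on every filesystem.

```python
import os


def file_etag(path):
    """Build an ETag from inode, size, and mtime without opening the file."""
    st = os.stat(path)
    return '"{:x}-{:x}-{:x}"'.format(st.st_ino, st.st_size, st.st_mtime_ns)
```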

MVC · While I’m beating up on Bill (and once again, let me emphasize that his central thesis is right-on), over in Coté’s comments he praises Django, saying “it does not follow MVC. MVC is for desktop apps not web sites.” Well, I hear gripes about Rails, but they typically aren’t about its MVC-ness; in fact I mostly hear warm-fuzzies about that, just because it’s a maintainability boost, helping you find the code you need to hack to kill a bug or build a feature. So on this issue, I don’t think the evidence is with Bill.

Java · Coté invests multiple sarcasm-laced paragraphs sneering at Java’s insistence on making as much of the underlying infrastructure swappable as is reasonable. Then he backs off and says “There are some things that it works well for in Java — file systems, sorting algorithms, collections, and all sorts ‘low level’ API things.”

Well, d’oh. And in fact Ruby and Perl and Python and PHP let you swap out pretty well all the same stuff. WORA isn’t just for Java any more, and WORA is a very very good thing indeed. Even the bloody C code I’m wrestling with for mod_atom already runs fine on OS X and Linux and Solaris, and will probably be easy to get going on Windows, because of Apache APR; you know, swappable infrastructure stuff.

The problem with those Java frameworks isn’t about swapping or abstraction, it’s that their designers couldn’t bring themselves to embrace the Web as it is. BTW, check out the recent EE work and the REST JSR and Grails. The Java community has noticed.

Anyhow, like I said, Coté and Bill both get the answer right and their pieces are worth reading. Which is why I think it worthwhile to point out missteps along the path.



Contributions

From: Dominic Mitchell (Jul 31 2007, at 23:24)

The swappable infrastructure stuff seems to be a thing of the past. People seem to have started to heed YAGNI these days.

From: Robert Sayre (Aug 01 2007, at 00:11)

Your analysis of the ETag implementations in Rails and Django assumes that clients and caches are going to revalidate every time. They don't. The rest of the section accurately describes how to make revalidation cheaper, which is indeed very useful if stale data is unacceptable.

The section on MVC misses Bill's point. No one complains about what Rails calls MVC, but it isn't MVC. In desktop MVC, the view is updated when the model changes, without interference from the controller. This is difficult to do over HTTP, because the server can't initiate traffic.

From: Simon Willison (Aug 01 2007, at 00:15)

Django is MVC in exactly the same way as Rails - there's an ORM (the model), code that executes in response to an HTTP request (a "controller" if you like) and templates (views).

We used to avoid describing Django as MVC because it simply didn't mesh with our understanding of the MVC design pattern. In classic MVC, the controller is this magic piece of code that keeps the view and model in sync. The way the term controller is used for Web frameworks just didn't seem right to us.

Also, a template system didn't make sense to us as a "view" since there are plenty of situations (dynamic image generation for example) where you create output without calling the template system at all.

So we gave the name "view functions" to Django's "controller" layer and started describing Django as MTV - Model Template View.

Of course that just led to people criticising Django as "not MVC", which was ridiculous. So these days we tend to just call it MVC and put up with the discomfort.

From: John O'Shea (Aug 01 2007, at 00:45)

Tim,

It is worth mentioning the important role ETags play when used to detect lost updates - see Detecting the Lost Update Problem Using Unreserved Checkout (http://www.w3.org/1999/04/Editing/)

John.

From: Julian Reschke (Aug 01 2007, at 01:42)

As far as I remember, the ETag computation in Apache httpd usually does not use the inode (I think because it may not be available to APR in all filesystems). By default, it just takes last-modified and file length, which is a problem if a resource is updated frequently (and that's why Apache produces weak etags for a small period of time, which causes lots of headaches with clients doing PUT and trusting the ETag in the response).

From: Aristotle Pagaltzis (Aug 01 2007, at 02:03)

Andy Wardley argues the same point as Bill and Simon in MVC: No Silver Bullet, but at more length. Maybe that will get Tim onto the right trail.

From: Danny (Aug 01 2007, at 02:05)

So what would MVC look like in a system designed from scratch for the Web?

The control aspects would presumably be operations at the end of a chain from the HTTP calls. It seems like several popular frameworks have this.

Views would presumably be different representations of the resources the system managed - some kind of templating of the internal representation would seem to be in order. Again, as found in the frameworks.

But the internal representation, the Model should also share the key characteristics of the Web - resources identified with URIs, lots of linkage between representations. Do *any* of the frameworks go remotely near this (without layer upon layer of kludge)? The Web doesn't have a model based on objects and/or the relational model, so why build frameworks so tied to these abstractions? Might as well call COBOL & PASCAL dynamic languages too...

We do already have suitable models for truly Web-native systems: Atom (content-oriented) and RDF (generic data). Can't wait to see the frameworks start to use these in their cores.

From: Keith Gaughan (Aug 01 2007, at 05:52)

Dominic: That's because, with languages like Ruby, Python, PHP, &c., and portable runtimes like APR, it's something that we don't have to worry so much about.

From: James Abley (Aug 01 2007, at 06:43)

You've been dozing on that one, Tim. I highlighted the ETag CPU overhead in my comment on Bill's post, which also links to Joe Gregorio's great post about Deep ETags. I also commented on Java approaches for this. You may also wish to highlight some of the dangers of ETag support in Apache and IIS; it's not a free lunch, funnily enough.

From: Adrian Holovaty (Aug 01 2007, at 09:00)

Tim says: "What you want to do is compute the ETag based on the underlying data resources that actually drive the page creation..."

Ah, but there's so much more than "underlying data resources" that drive the page creation. There's the HTML template, for one. There's the business logic, for another.

Sure, we should take the data into account when calculating the ETag. But a change to the HTML template can have just as much of an effect on the page as a change to the underlying data.

Consider the cases of a site redesign, or just small design tweaks. Tim, are you saying that a site redesign does not merit a refresh of ETags? I can understand the philosophical purity of that viewpoint, but I don't agree with it. When I redesign any of the sites I run, I want people to see the results immediately, whether the underlying data resources have changed or not.

This is precisely why Django uses the entire rendered page to calculate an ETag: there are too many moving parts in the creation of a Web page.

Regarding the second point -- that having to compute the page in order to calculate the ETag is unnecessarily expensive -- I agree. Django's cache system works nicely here. It lets you cache entire pages, with their ETags, which means the ETag comparison can be done without having to generate the page from scratch.

From: Bill de hOra (Aug 01 2007, at 10:50)

Etags:

"You've been dozing on that one, Tim. I highlighted the ETag CPU overhead in my comment on Bill's post"

James, if you actually need etags and caching, I'm pretty sure you'll be I/O bound first. CPU is the last bound I'd worry about on a webserver. Trading CPU for I/O seems sensible to me.

View Model Template:

I did say last week that the pattern I mentioned is also called "MVC", but it's not *MVC*. I've been arguing down controllers for years in web frameworks, but Django was the first place I'd heard a group state that template belonged in the frame, namely as "Model, Template, View" (MTV). I just call it View Model Template to indicate the processing order and to make sure it doesn't sound like "MVC".

Model View Controller:

Still, MVC is motherhood and apple pie and people have a real problem seeing it criticized.

Simon Willison said as much when he says the Django folk gave up and started saying "MVC". That's a real pity because the Django devs have gotten it 100% right. It's also the framework that expresses the idea most clearly (Zope, SpringMVC and Rails are less explicit imo).

Saying "MVC is not a good fit for the web", is not the same as saying "you shouldn't separate concerns". Perhaps this is what spooks people - MVC is a silver hammer for separating out presentation. Especially so I think in Java/.NET where there's a lot of emphasis on domain models.

The whole MVC on the web thing needs fuller treatment. I feel I should write it up, but here's the premise - MVC applies to the Web about as well as RPC does.

From: Jacob Fugal (Aug 01 2007, at 12:34)

"Sure, we should take the data into account when calculating the ETag. But a change to the HTML template can have just as much of an effect on the page as a change to the underlying data."

IMO, that's easy enough to handle.

I don't expect my templates to be changing underneath the application between deployments of the application (I've learned from hard experience the drawbacks of monkey patching a live application). So really what this argument comes to is that ETags should be refreshed when a new version of the application is deployed.

In order to do this, the ETag calculation can incorporate the "Application Version" in some manner. When you update the application -- including changed views -- you bump the application version number and your ETags will refresh.

From: pk11 (Aug 01 2007, at 16:58)

Hi Tim, Besides Grails I certainly would mention waffle (http://waffle.codehaus.org/) as a good java example.

From: Wes Felter (Aug 01 2007, at 18:41)

Templates are data. :-)

From: Michael Koziarski (Aug 01 2007, at 21:06)

While our default implementation may not save you server processing time, it's important not to forget the cost of sending that massive HTML page back to the user.

As someone who lives at the end of a relatively high-latency internet connection, the md5-based approach makes the best of a bad situation and applications which use it (highrise, etc) are noticeably faster than those which aren't.

Deep Etags are essentially impossible for the framework to calculate, but if you're using rails you can always use the approach I outlined earlier: http://www.koziarski.net/archives/2007/5/28/clever-caching

Your application has to explicitly indicate what values make up its 'key', but once you've taken that step, you can get Etags and Memcache caching for free.

For applications which take the trouble to think about their data dependencies, this approach is about as near to 'perfect' as I've been able to find.

From: James Abley (Aug 02 2007, at 15:23)

Bill,

I could very well be wrong about where the bottleneck is - your position is that it would be I/O and I (and Tim apparently) think that it would be CPU. That's my gut feel and is no doubt a result of my experience in a restricted problem domain (and maybe some implementation decisions in solutions for those problems). I have been reading a recent meme along the lines of the impedance mismatch between quantum mechanics and general relativity and how there may be a similar situation for different scales of webapp (I can't find a link currently - been in hospital all day watching the wife give birth, so I'm a little tired now). YMMV for different problems. I haven't generated any data in this area, or even interpreted anything in that area. I *do* think it's useful to highlight the compromises that are being made and forces that may govern a given solution.

July 31, 2007
I am an employee of Amazon.com, but the opinions expressed here are my own, and no other party necessarily agrees with them.

A full disclosure of my professional interests is on the author page.