I was poking around the Web today and, in the space of a minute, ran across two completely different pictures that made me shake my head and think “That’s a lie.” Here they are. [Update: One of the parties reacted. Jump to the end of the article to read about it.] [Six months later, the other party got its hand slapped!]

This first one is from an ad on some tech website.

[Image: Lying Microsoft graphic]

[Update, August 25: The UK Advertising Standards Authority has upheld a complaint against this sleazy, stupid ad. Very good indeed.]

The problem with this one should be obvious: they compare modern commodity PC hardware with mainframes, the world’s most expensive computing platforms, based on a Sixties architecture and with very few makers. (Maybe one... are there still “plug-compatible mainframes”?) And, surprise surprise, they discover that PCs are cheaper. I’d say that if you have to stretch this far to look cheaper, you’ve got a problem. Our Volkswagen diesel can get to the ski hill on less gas than the Hummer across the street, too.

The second graphic is a little subtler, and you may need to click-to-expand to see the details. It’s from the libxml2 Home Page.

[Image: Lying open-source graphic]

What’s the problem, you ask? Anyone can look and see the relative performance of a bunch of XML parsers summarized; useful stuff. And most of it is; but there are some problems.

Let’s start with those yellow bars labeled “Overall” which are some sort of normalized average. I don’t know about you, but when I want to use an XML parser I worry very little about the average of its SAX speed and its XSLT transformation speed. That yellow bar is questionable. No, on second thought it’s meaningless. No, in fact it’s actively harmful, because I’ve heard intelligent people stand up and say “The numbers prove that libxml2 is faster than X,” where X is one of the other names on this chart. But that’s not true; which parser is fastest depends totally on what you’re actually doing.
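To make that concrete, here is a minimal sketch with numbers I made up; none of them come from the chart, and `parser_a` and `parser_b` are hypothetical names. I'm also assuming a plain mean stands in for the “Overall” bar, since I don't know exactly how the real one was normalized. The point is only that the parser with the best overall figure can still lose on the tasks you actually run.

```python
# Hypothetical timings in seconds (lower is better); NOT taken from the chart.
timings = {
    "parser_a": {"sax": 1.0, "dom": 2.0, "xslt": 9.0},
    "parser_b": {"sax": 2.0, "dom": 3.0, "xslt": 3.0},
}


def overall(scores):
    """Plain mean across tasks; an assumed stand-in for the 'Overall' bar."""
    return sum(scores.values()) / len(scores)


def fastest(task):
    """Name of the parser with the lowest time on a single task."""
    return min(timings, key=lambda name: timings[name][task])


# The parser that looks best once every task is squashed into one number.
overall_winner = min(timings, key=lambda name: overall(timings[name]))

print("overall winner:", overall_winner)
for task in ("sax", "dom", "xslt"):
    print(task, "winner:", fastest(task))
```

With these invented numbers, `parser_b` takes the overall title while `parser_a` is the faster choice for both SAX and DOM work; one slow XSLT figure swamps everything else.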

And there’s another problem: when I first read this I wondered at the number for building a DOM with Expat because, uh, Expat doesn’t build a DOM. It turns out that if you follow the pointer to Sourceforge, there are a whole lot of details behind this insanely-oversimplified graph. By the way, there’s a lot of really good work and analysis on that Sourceforge page.

But it turns out that, to build that Expat/DOM number, they took three different software packages that you can layer on top of Expat, averaged them out, and published that number, claiming it represented Expat’s DOM-building performance. This is bogus research technique and invalid statistical practice, and really needs to be cleaned up.
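A quick sketch of why that average bothers me, again with invented numbers; `layer_x`, `layer_y` and `layer_z` are hypothetical stand-ins for the three DOM packages, not their real names or real timings.

```python
from statistics import mean

# Hypothetical DOM-build times in seconds for three separate DOM layers
# sitting on the same underlying parser; NOT real measurements.
dom_layers = {"layer_x": 1.0, "layer_y": 2.0, "layer_z": 6.0}

averaged = mean(dom_layers.values())  # the single number that gets published
best = min(dom_layers.values())
worst = max(dom_layers.values())

print("published average:", averaged)
print("actual range:", best, "to", worst)
print("average matches a real package:", averaged in dom_layers.values())
```

In this made-up case the published figure describes none of the three packages, and it hides a sixfold spread between the fastest and slowest of them. Whichever layer you actually pick, the averaged bar tells you the wrong thing.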

There may be more problems, but when I saw that, I threw up my hands and stopped looking. That first top-level graph, to be useful, should have omitted the vertical bars except where the package in question actually provided the function they were graphing. Alternatively, instead of one oversimplified graph they should go a little deeper and not hide the real results behind a pointer that looks like it leads to source code.

I should say in parting that from what I hear, libxml2 is a fine piece of software (if intimidatingly large) and among other things has excellent performance. This graphic does it no credit, and should be fixed or removed.

Daniel Veillard · He’s the guy who maintains libxml2, and in my experience a good person. He wrote back a little bitterly saying he’d removed the graphic; I answered and I think we understand each other. He didn’t do that graphic himself, which reinforces my good opinion of him. I think he should take away the problematic columns in the graph and put it back, but it’s his web page. It’s nice being part of a community.

Now, I wonder if Microsoft is going to pull their egregiously-misleading ad campaign?

March 05, 2004