Over the last 24 hours I learned a lot about how the Web of A.D. 2003 works, and it's not like it used to be.

At about 3PM on Feb. 27th, I sent out emails to a few well-connected acquaintances announcing ongoing, and also put a pointer to it in my standard email signature. Then I started hovering over the access_log like an expectant mother.

You and Your access_log · I've only ever run web sites on Apache or one of its ancestors, and this lineage of web servers has always written a record of every request into a file named access_log. I think anyone who's running a Web site, or who cares about the Web, ought to spend some time, on a regular basis, watching the access_log in real time. On Unix-like systems, the command is:

tail -f access_log

Too often we get this image of the Web as a vast well-oiled machine, with glossy browser screens in front and masses of gleaming software in back. Watching the access_log is like looking through a window into the side lobby of the legislature, or taking a tour of the fermentation vats at the brewery.

People who are Web-savvy and have spent years looking at access logs can skip the next few paragraphs; after that, I'll get into some interesting statistics.

Here's a single line picked out of the access_log:

pool-151-203-239-239.bos.east.verizon.net - - [28/Feb/2003:10:42:13 -0800] "GET /ongoing/ HTTP/1.1" 200 12693 "http://www.scripting.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

By looking at this, proceeding from left to right, we can tell:

  1. This person is coming from Boston through an ISP that is either Verizon or connected to Verizon.
  2. They connected at 10:42 AM, by the server's clock; the -0800 is the server's offset from UTC.
  3. They requested the ongoing home page, called /ongoing/.
  4. Their request was successful; that's what the 200 means.
  5. ongoing sent them 12,693 bytes.
  6. To get here, they followed a pointer from Scripting.com (this is called the "referer" and is not terribly reliable).
  7. They were running Microsoft Internet Explorer on a Windows XP box (that's what "Windows NT 5.1" means). Yes, IE identifies itself (in part) as Mozilla, for historical reasons. The original Netscape identified itself as Mozilla, and at one point it was the only browser that could do images and tables, so lots of servers would send ultra-primitive versions of their sites to browsers that didn't claim to be Mozilla. So when IE came along, it had to claim to be Mozilla, and now it can't stop because there are lots of servers expecting IE to claim to be Mozilla. There you go.
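If you'd rather have a program do the left-to-right reading, here is a minimal Python sketch that pulls the same fields out of the sample line. It assumes the log is in Apache's "combined" format, which is what the sample looks like; the names LOG_PATTERN and parse_line are mine, invented for illustration, and not part of Apache or any library.

```python
import re
from datetime import datetime

# Assumed layout (Apache "combined" format): host, identd, user, [time],
# "request", status, bytes, "referer", "user-agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the size column means no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # The timestamp carries the server's UTC offset, e.g. -0800.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

# The sample line from above, split across string literals for readability.
SAMPLE = (
    'pool-151-203-239-239.bos.east.verizon.net - - '
    '[28/Feb/2003:10:42:13 -0800] "GET /ongoing/ HTTP/1.1" 200 12693 '
    '"http://www.scripting.com/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"'
)

entry = parse_line(SAMPLE)
for field in ("host", "time", "path", "status", "size", "referer", "agent"):
    print(field, "=", entry[field])
```

A regular expression with named groups keeps each field tied to a label, so the seven observations above map directly onto dictionary keys. Lines that don't fit the assumed format come back as None rather than raising an error.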

The Word Spreads · As I said, I sent the emails out at around 3PM. The first external visitor was Jon Udell of Infoworld, who showed up at 3:24 PM.

By 4:19 PM, others were starting to trickle in, although the first to show up with an obvious link from Jon's blog wasn't till just past 7 PM.

Almost at the same time, the first of the RSS feed readers showed up.

At 8 PM, there was a visitor from Google, but it was a real person on a Linux box, not a robot.

By 9 PM, there were visitors from Denmark, Singapore, and Australia.

Just past 11 PM, the Inktomi robot showed up. Google's robot put in its first appearance at 6:34 Friday morning.

By about 10 AM, three or four bloggers had put in pointers, and the traffic was flowing in.

At about 1:30 PM, ongoing had had a thousand or so unique visitors.

Humans and Others · There are three kinds of visitors to a web site: humans, robots, and RSS readers. Of the thousand or so unique visitors, 14 were robots of one kind or another: Google, Inktomi, Verity, research.att.com, and some others I didn't recognize.

214 of them were RSS readers of one kind or another, with the usual suspects well-represented: NetNewsWire, Radio Userland, Syndirella, Amphetadesk, and a whole lot of home-cooked readers.
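For the curious, a rough sketch of how a three-way count like this might be done in Python. This is not the method I actually used; the marker substrings are assumptions about what shows up in user-agent strings, the hosts and agents in the sample are invented, and real logs need a longer and messier list.

```python
from collections import defaultdict

# Assumed substrings, matched case-insensitively against the user-agent.
FEED_READER_MARKERS = ("netnewswire", "radio userland", "syndirella", "amphetadesk")
ROBOT_MARKERS = ("googlebot", "slurp", "crawler", "spider", "bot")

def classify(agent):
    """Guess whether a user-agent string is an RSS reader, a robot, or a human."""
    lowered = agent.lower()
    if any(marker in lowered for marker in FEED_READER_MARKERS):
        return "rss"
    if any(marker in lowered for marker in ROBOT_MARKERS):
        return "robot"
    return "human"

def unique_visitors(records):
    """Count distinct hosts per visitor kind from (host, agent) pairs."""
    hosts_by_kind = defaultdict(set)
    for host, agent in records:
        hosts_by_kind[classify(agent)].add(host)
    return {kind: len(hosts) for kind, hosts in hosts_by_kind.items()}

# Invented sample records, for illustration only.
sample_records = [
    ("host-a.example.com", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),
    ("host-b.example.com", "NetNewsWire/1.0 (Mac OS X)"),
    ("host-b.example.com", "NetNewsWire/1.0 (Mac OS X)"),
    ("host-c.example.com", "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"),
    ("host-d.example.com", "AmphetaDesk/0.93"),
]

visitor_counts = unique_visitors(sample_records)
print(visitor_counts)
```

Counting distinct hosts is only an approximation of "unique visitors": people behind a shared proxy collapse into one host, and one person on a dial-up pool can show up as several.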

Anomalies · I'm not going to invest the time in running the numbers properly, but a couple of glaring anomalies emerge:

  • Macintoshes are over-represented, at about 18% of all browser requests. Particularly in the early hours, before the word really got out, the proportion of Macs was very high.
  • Internet Explorer is under-represented: 22% of Windows requests came from non-IE browsers, mostly Mozilla with a bit of Netscape and Opera.
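Running the numbers properly would look something like the Python sketch below. It is a rough cut under stated assumptions: the platform and browser tests are simple substring checks of my own devising, and the sample user-agent strings are illustrative rather than taken from the real log. One wrinkle worth knowing is that Opera of this era can present an MSIE-style string with "Opera" tacked on, so the IE test excludes it.

```python
def is_mac(agent):
    # Matches "Macintosh", "Mac_PowerPC", "Mac OS X" and similar.
    return "Mac" in agent

def is_windows(agent):
    return "Windows" in agent

def is_ie(agent):
    # Opera may claim to be MSIE, so rule it out explicitly.
    return "MSIE" in agent and "Opera" not in agent

def browser_shares(agents):
    """Return (percent of requests from Macs, percent of Windows requests not from IE)."""
    total = len(agents)
    windows = [agent for agent in agents if is_windows(agent)]
    mac_count = sum(1 for agent in agents if is_mac(agent))
    non_ie_count = sum(1 for agent in windows if not is_ie(agent))
    mac_pct = 100 * mac_count / total if total else 0.0
    non_ie_pct = 100 * non_ie_count / len(windows) if windows else 0.0
    return mac_pct, non_ie_pct

# Illustrative user-agent strings, one per browser request.
sample_agents = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.2.1) Gecko/20021130",
    "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/60 (like Gecko) Safari/60",
]

mac_pct, non_ie_pct = browser_shares(sample_agents)
print("Mac share of requests: %.1f%%" % mac_pct)
print("Non-IE share of Windows requests: %.1f%%" % non_ie_pct)
```

Robots and RSS readers would need to be filtered out first for the percentages to describe browsers only.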

I guess this says something about the kinds of people who watch RSS feeds and check out the new weblogs.

The Vast Sucking Sound · When you're watching the access_log, what's really remarkable is the steady pounding from the RSS aggregators. ongoing has been up a day and a bit, and as I watch this, I'm seeing maybe four or five hits a minute on the RSS feed.
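To put a number on the pounding instead of eyeballing it, something like this Python sketch would count feed hits per minute. The feed path is hypothetical (substitute whatever your feed's URL path is), the log lines in the sample are invented, and the function name is mine.

```python
from collections import Counter

# Hypothetical path of the RSS feed; adjust to match the real site.
FEED_PATH = "/ongoing/ongoing.rss"

def feed_hits_per_minute(lines, feed_path=FEED_PATH):
    """Count requests for feed_path, keyed by timestamp truncated to the minute."""
    counts = Counter()
    for line in lines:
        try:
            timestamp = line.split("[", 1)[1].split("]", 1)[0]
            request = line.split('"', 2)[1]   # e.g. 'GET /path HTTP/1.1'
            path = request.split()[1]
        except IndexError:
            continue  # skip lines that don't look like log entries
        if path == feed_path:
            # "28/Feb/2003:10:42:13 -0800" -> "28/Feb/2003:10:42"
            counts[timestamp[:17]] += 1
    return counts

# Invented log lines, for illustration only.
sample_lines = [
    'a.example.com - - [28/Feb/2003:10:42:13 -0800] "GET /ongoing/ongoing.rss HTTP/1.1" 200 5120 "-" "NetNewsWire/1.0"',
    'b.example.com - - [28/Feb/2003:10:42:40 -0800] "GET /ongoing/ongoing.rss HTTP/1.0" 200 5120 "-" "AmphetaDesk/0.93"',
    'c.example.com - - [28/Feb/2003:10:42:51 -0800] "GET /ongoing/ HTTP/1.1" 200 12693 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    'a.example.com - - [28/Feb/2003:10:43:05 -0800] "GET /ongoing/ongoing.rss HTTP/1.1" 200 5120 "-" "NetNewsWire/1.0"',
]

hits = feed_hits_per_minute(sample_lines)
for minute, count in sorted(hits.items()):
    print(minute, count)
```

Feeding it the real access_log is a matter of passing an open file object, since iterating over a file yields lines.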

When you consider the number of sites out there with RSS feeds, and the number of people who subscribe to a bunch of them, we're talking some pretty serious traffic here. Architecturally, this seems pretty dumb, and you have to worry whether or not it's going to scale. On the other hand, the architecture of the whole Web is pretty simple, more or less built to be as simple as possible without breaking. I wrote a note on this back in January, but now I have direct personal experience, and yes, Houston, we (potentially) have a problem.


February 28, 2003

I am an employee
of Amazon.com, but
the opinions expressed here
are my own, and no other party
necessarily agrees with them.

A full disclosure of my
professional interests is
on the author page.