Some Sundays I make graphs of statistics from the ongoing web-server log files. I find them interesting and maybe others will too, so this entry is now the charts’ permanent home. I’ll update from time to time.
[Updated: 2007/02/11] Wow, I hadn’t got around to an update since last October. There’s been lots of interesting action. I suspect the browser-share disturbance has to do with the launch of IE7. It may also be the case that the increasing popularity of some really non-technical search strings is pushing that traffic flow back towards the IE old guard. Paul Hoffman has been bugging me to graph the uptake of IE7 and I suppose I should.
I have no explanation for the gyrations in the search-engine graph; if they continue I’ll have to do some research.
The following graph is no longer interesting, showing a fairly steady number of fetches of my Atom feed and a residual flow of moronic, poorly-programmed clients ignoring the permanent redirect and trying to fetch my old RSS feed. I don’t like discarding information, so I’m going to leave it frozen in its Autumn-2006 state.
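If you wanted to reproduce that graph’s counting, it’s a one-pass scan of the access log. Here’s a minimal sketch, assuming an Apache-style combined log and feeds living at paths ending in ongoing.atom and ongoing.rss; those names are illustrative guesses, not necessarily the real ones.

```python
# Minimal sketch: tally current-feed fetches vs. redirect-ignoring
# old-RSS fetches from an Apache "combined" access log. The feed path
# names are assumptions for illustration.
import re
from collections import Counter

LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def feed_counts(log_path: str) -> Counter:
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            m = LOG_LINE.search(line)
            if not m:
                continue
            path, status = m.group("path"), m.group("status")
            if path.endswith("ongoing.atom"):
                counts["atom"] += 1
            elif path.endswith("ongoing.rss"):
                # A polite client would have honored the 301 long ago;
                # these fetches are the residue, bucketed by status.
                counts["old-rss-" + status] += 1
    return counts

if __name__ == "__main__":
    print(feed_counts("access_log"))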
What a “Hit” Means ·
A while back I updated the production machinery here to do some AJAX-y stuff. As a side effect, the XMLHttpRequest now issued by each page seems to be a pretty reliable counter of the number of actual browsers with humans behind them reading the pages. I checked against Google Analytics and the numbers agreed to within a dozen or two on days with 5,000 to 10,000 page views; interestingly, Google Analytics was always 10 or 20 views lower.
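For concreteness, here’s a minimal sketch of that counting, assuming the AJAX call shows up in the log as a GET of something like /ongoing/ping; the URI is made up for illustration, the real request could be named anything.

```python
# Minimal sketch: count "real browser" page views by tallying the AJAX
# beacon in the access log. The /ongoing/ping URI is an assumption.
import re

BEACON = re.compile(r'"GET /ongoing/ping\S* HTTP/[\d.]+" 200\b')

def human_page_views(log_path: str) -> int:
    with open(log_path, encoding="utf-8", errors="replace") as log:
        return sum(1 for line in log if BEACON.search(line))

if __name__ == "__main__":
    print(human_page_views("access_log"))
```

The trick works because robots and feed readers fetch HTML (or the feed) but don’t run the page’s JavaScript, so the beacon only fires for actual browsers.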
Anyhow, do not conclude that I now know how many people are reading whatever it is I write here; I publish lots of short pieces that are all there in my RSS feed, anyone reading my Atom feed gets the full content of everything, and I have no #&*!$ idea how many people look at my feeds.
As a result of doing this, I turned off Google Analytics; they weren’t adding much that was of interest, were a little too intrusive, and I think were slowing page loads.
Anyhow, I ran some detailed statistics on the traffic for Wednesday, February 8th, 2006.
- Total connections to the server
- Total successful GET transactions
- Total fetches of the RSS and Atom feeds
- Total GET transactions that actually fetched data (i.e. status code 200 as opposed to 304)
- Total GETs of actual ongoing pages (i.e. not CSS, JS, or images)
- Actual human page-views
So, there you have it. Doing a bit of rounding, if you take the 180K transactions and subtract the 90K feed fetches and the 6000 actual human page views, you’re left with 84,000 or so “Web overhead” transactions, mostly stylesheets and graphics and so on. For every human who viewed a page, it was fetched almost twice again by various kinds of robots and non-browser automated agents.
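That bucketing can be approximated in a few lines over the day’s log. The sketch below is an illustration under assumed path conventions (entry pages under /ongoing/When/, feeds named ongoing.atom and ongoing.rss, a hypothetical /ongoing/ping beacon), not the actual scripts.

```python
# A rough sketch of the bucketing above, run over one day's Apache
# "combined" log. Path conventions are illustrative assumptions.
import re
from collections import Counter

REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def bucket_day(log_path: str) -> Counter:
    buckets = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            m = REQUEST.search(line)
            if not m or m.group("method") != "GET":
                continue
            path, status = m.group("path"), m.group("status")
            if status in ("200", "304"):
                buckets["successful GETs"] += 1
            if re.search(r"ongoing\.(atom|rss)$", path):
                buckets["feed fetches"] += 1
            if status == "200":
                buckets["GETs that moved data"] += 1
                if re.match(r"^/ongoing/When/", path):
                    buckets["page GETs"] += 1
                elif path.startswith("/ongoing/ping"):  # the AJAX beacon
                    buckets["human page views"] += 1
    # The "Web overhead" arithmetic from the text: transactions, minus
    # feed fetches, minus human page views. Which bucket the 180K figure
    # names is my reading, so treat this as approximate.
    buckets["web overhead"] = (buckets["GETs that moved data"]
                               - buckets["feed fetches"]
                               - buckets["human page views"])
    return buckets

if __name__ == "__main__":
    for name, n in bucket_day("access_log").most_common():
        print(f"{n:>8}  {name}")
```

Separating 200s from 304s matters here: conditional-GET traffic is cheap, so the interesting overhead number is the data-moving transactions.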
It’s amazing that the whole thing works at all.
Source · A tarball of the scripts that generate this is here. It ain’t pretty.