By far the biggest names in the referrer-log are Radio Userland and NetNewsWire, with the other aggregators (Syndirella, Amphetadesk, and so on) well up in the list. Which is just fine, but we're missing a piece of information which will soon be really important, commercially. (Updated 'round midnight Sunday. And again Monday, more input.)

Eventually there will be business models built around weblogs, with more popular ones being more lucrative. And while the PageRank-style ratings produced by Technorati, Daypop and so on are important, the big question is going to become: “how many subscribers do you have?”

And right now, it's hard work to answer that question. You can count the number of different IP addresses fetching your RSS feed, but this mis-counts for a bunch of reasons:

  • People move around: I, for example, show up from two different IP addresses every day, and then I travel a lot.
  • All the people in big private networks like Microsoft and AOL appear to come from a very small number of IP addresses.
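
To make the two kinds of mis-counting concrete, here is a small sketch with made-up data. The subscriber names and IP addresses are hypothetical, and a real log would only ever show the IP column.

```python
# Hypothetical fetch records: (subscriber, ip_address). A real server log
# only shows the IP address; the subscriber column is the hidden truth.
fetches = [
    ("alice", "203.0.113.5"),   # Alice at home
    ("alice", "198.51.100.7"),  # Alice at work: same person, second IP
    ("bob", "192.0.2.1"),       # Bob behind a big private network's proxy
    ("carol", "192.0.2.1"),     # Carol behind the same proxy
    ("dave", "192.0.2.1"),      # Dave behind the same proxy
]

# What we would like to know.
true_subscribers = len({who for who, _ in fetches})

# What counting distinct IP addresses actually gives us.
distinct_ips = len({ip for _, ip in fetches})

print(true_subscribers, distinct_ips)
```

Alice is counted twice and the three people behind the proxy are counted once, so the IP count is off in both directions at the same time.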

I think the pressure to come up with a good, auditable answer to this question is going to become irresistible down the road. And it would be technically easy to achieve. Here's one strawman that I'm pretty sure would work, and since I just thought it up in the last ten minutes, there are probably better ideas lurking out there.

When someone sets up their aggregator, they provide their email address, on the understanding that it will never be sent anywhere. However, it is used to generate a 64-bit hashcode using a widely-published algorithm provided along with reference source code. Then, when the aggregator pulls in the RSS file, it sends something like this in the Referer header: "3a892d0a81224ff4"
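
As a rough sketch of what the aggregator side might look like: the choice of algorithm below (SHA-256 truncated to 64 bits) and the normalization of the address are assumptions made for illustration, since no particular algorithm has been picked, and the function names are hypothetical.

```python
import hashlib


def subscriber_hash(email: str) -> str:
    """Return a 64-bit hashcode for an email address as 16 hex digits."""
    # Assumption: trim and lowercase first, so "Me@Example.com " and
    # "me@example.com" count as the same subscriber.
    normalized = email.strip().lower().encode("utf-8")
    # Assumption: SHA-256 truncated to its first 8 bytes (64 bits).
    # Any widely published hash would serve the same purpose.
    return hashlib.sha256(normalized).digest()[:8].hex()


def request_headers(email: str) -> dict:
    """Headers the aggregator would send when it fetches the feed."""
    return {"Referer": subscriber_hash(email)}


headers = request_headers("someone@example.com")
print(headers["Referer"])
```

The email address itself never leaves the machine; only the 16-digit hashcode goes out with each fetch.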

When you count the number of unique hashcodes fetching your RSS feed, you'll know exactly how many subscribers you have. The other neat thing is, when people like A.C. Nielsen and the Audit Bureau of Circulations want to get into this business down the road, they'll ask the people in their samples to provide their email addresses, so they can compute the hash and do demographic slicing on subscribers to, well, anything.
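
On the server side the counting could be a few lines of log-crunching. This sketch assumes the common "combined" access-log layout (request, status, bytes, then the quoted Referer) and a feed at the hypothetical path /feed.rss; the sample log lines are invented.

```python
import re

# Assumed layout: ... "METHOD path protocol" status bytes "referer" ...
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)
# A 64-bit hashcode written as 16 lowercase hex digits.
HASHCODE = re.compile(r"[0-9a-f]{16}")


def count_subscribers(log_lines, feed_path):
    """Count distinct hashcodes seen in the Referer field of feed fetches."""
    seen = set()
    for line in log_lines:
        m = LINE.search(line)
        if not m or m.group("path") != feed_path:
            continue  # not a fetch of the feed
        if HASHCODE.fullmatch(m.group("referer")):
            seen.add(m.group("referer"))
    return len(seen)


sample_log = [
    '203.0.113.5 - - [25/May/2003:10:00:00 -0700] "GET /feed.rss HTTP/1.1" 200 5120 "3a892d0a81224ff4" "ExampleReader/1.0"',
    '198.51.100.7 - - [25/May/2003:18:00:00 -0700] "GET /feed.rss HTTP/1.1" 200 5120 "3a892d0a81224ff4" "ExampleReader/1.0"',
    '192.0.2.1 - - [25/May/2003:11:00:00 -0700] "GET /feed.rss HTTP/1.1" 200 5120 "00112233aabbccdd" "ExampleReader/1.0"',
    '192.0.2.1 - - [25/May/2003:11:05:00 -0700] "GET /index.html HTTP/1.1" 200 900 "http://example.com/" "Mozilla/4.0"',
]

print(count_subscribers(sample_log, "/feed.rss"))
```

The first two lines come from different IP addresses but carry the same hashcode, so they count as one subscriber.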

Because one way or another, RSS subscriptions are going to be big damn business just like every other kind of subscription, and one way or another, it's going to become necessary to get this data. So why not make it easy, cheap, and accurate?

Update: Brent & MNot · Brent Simmons, author of NetNewsWire, writes to point out that the Referer field is probably the wrong one (RSS readers have been accused of “Referer spam” in the past) and adds that it could go in the User-Agent. Off the top, looks like he's right and he's right. If you wanted to put it in the User-Agent, you'd need a conventional way to set it off from the unregulated cruft that lives there; I would suggest <eH>3a892d0a81224ff4</eH> or some such.
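
A minimal sketch of the User-Agent variant, assuming the <eH>…</eH> wrapper suggested above and a 16-hex-digit hashcode. The function names and the "ExampleReader/1.0" agent string are hypothetical.

```python
import re

# The hashcode set off from the rest of the User-Agent by <eH>...</eH>.
EH_TAG = re.compile(r"<eH>([0-9a-f]{16})</eH>")


def tagged_user_agent(base_agent: str, hashcode: str) -> str:
    """Aggregator side: append the tagged hashcode to the usual string."""
    return f"{base_agent} <eH>{hashcode}</eH>"


def extract_hashcode(user_agent: str):
    """Server side: pull the hashcode out, or return None if absent."""
    m = EH_TAG.search(user_agent)
    return m.group(1) if m else None


ua = tagged_user_agent("ExampleReader/1.0", "3a892d0a81224ff4")
print(ua)
print(extract_hashcode(ua))
```

Since the User-Agent already lands in typical combined-format logs, the server side needs nothing beyond the pattern match.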

Mark Nottingham also points out the problems with the Referer header, and suggests using the existing HTTP From header.

Which is an elegant idea, but I don't think it's quite as good, because it doesn't end up in typical webserver log files by default. And Brent's trick of using the User-Agent is just fine I'd say.

Mark also tries to object to the notion that RSS needs a business model. He's entitled to his objections, but it's the other way round: there are a lot of business models out there that need RSS.

Update: Aaron and Ross · Ross Olson wrote in to point out that you could track your RSS subscribers with cookies. Yes, kind of, only you'll have as many cookies as the count of browsers and computers you use, and most people will have more than one, I'd think. And also, cookies feel like way too much intrusion and machinery for something that should be simple. Also, they make the web-server do more work.

Aaron Swartz points out that if, whenever you provide the URI of your RSS feed (whether via autodiscovery or behind the little orange “XML” button), you stick in a random number, e.g., that would do the trick. Of course, for those of us who've already got lots of people out there using the base URI for our RSS, it's too late. And, just as with cookies, someone who uses two aggregators on one computer and one on another is going to get counted three times.

And I'm uncomfortable architecturally with this; I think that you really ought to be able to use pretty well any URI you can dream up for your RSS feed, and whatever syntax you use, this feels like a landgrab on a piece of that space, which is questionable.

Aaron also points out that my idea of hashing the email address would be seen by some (e.g. him) as a security hole. I'm unconvinced on that one, as of yet.

The only thing I'm convinced of is that when the world of business comes to treat RSS subscriptions as seriously as they treat other kinds of subscriptions (which is very seriously), they are going to demand some sort of measurement capability as a basic price of admission.

May 25, 2003