ongoing by Tim Bray · Referral Information Loss

Late Sunday I published Ten Theses on Tablets; it picked up a few high-profile links and referrals, went mildly viral, and as of now has been read (in a browser, as opposed to a feed reader) 13,911 times. Who do you think might have sent those people?

I wondered that myself, so I ran my referers script, and was a little puzzled because there were a few from Hacker News and TechMeme, but not that many. I dug a little deeper and, as of now, 11,721 of those fetches had an empty “referer” field in the log-file. Who do you think might have sent those people?

I’ve now tested a couple of hypotheses, and I’m pretty sure the answer is: Twitter clients. At least, I’ve verified that for two different Twitter clients, one on Mac and one on Android, when they wake up a browser to follow a link, there’s no referer.

This could be because the URL shorteners are getting in the way. It could be because the API you use to send the default browser after a URL doesn’t have a slot for “referer”. I could investigate, but I bet someone who’s reading this knows.

This is a problem. If someone follows a link in one of my tweets, I think whoever owns that URL is owed the information that they came from http://twitter.com/timbray. It’s not that there’s no referer; it’s that the information is being discarded. Can we get this fixed?
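For anyone wanting to check their own logs for the same pattern, here is a minimal Python sketch. It is not the referers script mentioned above. It assumes an Apache-style "combined" log format, where the referer is the quoted field right after the status and byte count. The path /tablets and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed layout: ... "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referer_summary(lines, path):
    """Count fetches of `path` by referring host; '' or '-' counts as empty."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that do not look like combined-format entries
        parts = m.group("request").split()
        if len(parts) < 2 or parts[1] != path:
            continue  # a different resource, or a malformed request line
        referer = m.group("referer")
        if referer in ("", "-"):
            counts["(empty)"] += 1
        else:
            counts[urlsplit(referer).netloc or "(unparseable)"] += 1
    return counts

# Hypothetical log lines, for illustration only.
sample = [
    '1.2.3.4 - - [12/Oct/2010:08:00:00 -0700] "GET /tablets HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '1.2.3.5 - - [12/Oct/2010:08:00:01 -0700] "GET /tablets HTTP/1.1" 200 5120 "http://news.ycombinator.com/" "Mozilla/5.0"',
    '1.2.3.6 - - [12/Oct/2010:08:00:02 -0700] "GET /tablets HTTP/1.1" 200 5120 "" "Mozilla/5.0"',
    '1.2.3.7 - - [12/Oct/2010:08:00:03 -0700] "GET /other HTTP/1.1" 200 100 "-" "Mozilla/5.0"',
]
summary = referer_summary(sample, "/tablets")
print(summary.most_common())
```

A real log would need the path and possibly the regex adjusted to match the server's configured format.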

Contributions


From: Lance Fisher (Oct 13 2010, at 00:04)

I'm 13,912, and I came here from TweetDeck. @kzu RTed @timbray (that's you!). fyi.

[link]

From: Andrew Ducker (Oct 13 2010, at 01:12)

Surely the problem is that most bits of desktop software just launch a URL and expect the OS to work out what piece of software will handle that?

I'm not sure how you solve a problem like that.

[link]

From: IBBoard (Oct 13 2010, at 01:53)

Is there a referrer in that case, though? Twitter can show tweets as web pages, which would provide a referrer via the normal browser mechanisms, but the API provides data and the web interface is just a view on that data. Similarly, the client apps are just a view on that data, albeit one that can kick off a completely separate browser session.

I guess there could be a convention that Twitter clients make up a matching URL to what the website would use, but that is still feeding you false information (since that would imply that people came from the web interface, not as a new web browser request).

As far as I know, the data itself doesn't have to have a website URI in it (it could have some form of URI for identification, but it could be a data URI instead of a web address), and so your interaction between the link in the message data and the website that you link to is different between web to web and client to web.

[link]

From: Wayne Baisley (Oct 13 2010, at 03:48)

FWIW, I got here from your link via TweetDeck thence to Mobile Safari from an iPhone, around 3:34 AM PDT. I've no idea if that provides any referrer info, but I'll probably look into logs at work to see how things have changed recently.

[link]

From: Dwight Gunning (Oct 13 2010, at 04:00)

Is the owner of the URL really owed that information or is it just a common courtesy?

You could always go for one of those backward website disclaimers that set out rules for how other sites are allowed to link to your site ;-)

[link]

From: Ciaran (Oct 13 2010, at 04:44)

Yes, it seems pretty likely that the bulk of missing referrers in this particular case come from twitter client desktop/mobile apps, but 'missing' referrers also result from email applications (you will have no referrer field for my visit, I clicked a link in Thunderbird), from direct entries of the URL, and from browsers with referrer sending disabled.

Interesting that you think it's a problem that should be fixed, and that the site is "owed" this information. Another perspective is that it's none of your business how I got to your site. You are also not receiving age/sex/location information for every visitor. You might like to. Visitors might not like to give it.

I'm not taking a particular side in the referrer argument - just pointing out that it exists. I don't go out of my way not to send referrer headers in my daily use, but I do think the question is equally valid as "why are people supplying this information ever?" rather than "why are their applications not supplying it".

Suppose the developer of client X implements the feature to send referrer headers (assuming the underlying OS/browser supports passing the information). Is this a feature that benefits the user of the application in any way whatsoever? If the developer makes the feature optional, and I'm pretty sure they'd have to, under what circumstances would the user want to switch it on?

[link]

From: François Beausoleil (Oct 13 2010, at 05:45)

According to http://en.wikipedia.org/wiki/HTTP_referrer: "When visiting a webpage, the referrer or referring page is the URL of the previous webpage from which a link was followed."

The browser did NOT follow a link: it received a URL on the command line (or equivalent). How can the browser infer a Referer from that?

[link]

From: Steve Strutt (Oct 13 2010, at 06:36)

Browsers are the ones responsible for providing the referer information to the web server. If you come from somewhere that isn't a browser page, the request is sent without a referer (and considered a "direct link").

I wish I could track visits from external apps and bookmarks, but there are some privacy concerns, too.

[link]

From: Paul Hoffman (Oct 13 2010, at 06:42)

I'm wondering if this problem might be coming from Choosy, the wonderful URL redirector on the Mac, and its ilk.

[link]

From: g (Oct 13 2010, at 11:11)

Kinda OT, but the thing that surprises me about your "Ten theses on tablets" post is that no one seems to have picked up on the obvious pun in the title.

[link]

From: Fabian Ritzmann (Oct 13 2010, at 11:33)

I am slightly baffled that nobody here seems to think that this is how it is supposed to work. First and foremost I am considering it an issue of privacy and data hygiene. On the technical side, if I am getting to this URL through some other means than clicking on a link in my browser, I certainly don't expect a Referrer URL to show in my request.

[link]

From: Oren Mazor (Oct 13 2010, at 11:39)

IIRC, Twitter's a bit wonky itself on referrers. In fact, in certain views, it strips the referrer out and replaces it with just "twitter.com" instead of the full link to the tweet that generated the click.

[link]

From: William Vambenepe (Oct 13 2010, at 14:49)

"I think whoever owns that URL is owed the information that they came from http://twitter.com/timbray"

I don't think they're owed this at all. Especially when the URL can be an intranet address. Or when it betrays not just what site you're coming from but what your userid on that site is. Or what search terms you used.

In fact, I have the http referrer disabled in my browser.

But I didn't read your "10 on tablets" piece 11,721 times and I doubt many others block that header. So I'm guessing that you're right and it's a mix of redirects (from URL shorteners) and Twitter client behavior.

[link]

From: Jeremy Gustie (Oct 13 2010, at 19:18)

As you said: the API probably doesn't offer a slot (it would be ripe for abuse). API limitations aside, I think "owed" is a little strong: especially since the spec suggests browsers should offer a way to disable the Referer [sic] header for privacy reasons.

You may also be seeing the effects of firewall software scrubbing outgoing HTTP requests (again, for privacy reasons).

[link]

From: Brian Whalley (Oct 13 2010, at 19:31)

Hi Tim - You and I discussed this briefly on Twitter today with Eric Anderson of IBM as well. I thought this was well known. This isn't an issue about Twitter doing something or not doing something, or URL shorteners, but just how HTTP works. If you click on a link that's on your desktop, there's no referring URL.

That should make sense: you didn't come from another website! It's the same as clicking on a link in a PDF or a Word document. There's no referrer because you didn't come from another website, and no website is there to give a referrer. This isn't new at all. Sites that are popular on Twitter often have HUGE "Direct Traffic" numbers in Google Analytics or other analytics programs as a result, because that is naturally how you would tag a website visitor who does not have a referring URL. Some people get around this by hard-coding a utm_source= variable into their URLs before running them through a URL shortener, so that there is a manual "twitter" variable on the end of the URL. This helps them identify where their visits come from, but it can break down if someone takes that short URL and shares it elsewhere: putting it on Facebook or their blog or something ruins the mechanism and skews all the results.

I'd love to talk about this more - Feel free to drop me a line @bwhalley on Twitter or bwhalley@hubspot.com (the marketing software company where I work).

Best,

Brian W

[link]

From: holaMau (Oct 13 2010, at 20:30)

just tweeted this: when adding links on Twitter, before you shorten it with your client of choice, add a srcRef=Twitter querystring at the end. For instance if you link to tbray.org/blah/blah make it tbray.org/blah/blah?srcRef=Twitter and then shorten it. You will not get a referrer link, but you will be able to see these hits on your Analytics. Granted, the issue here is whether it really came from Twitter or from someone emailing the shortened url to someone else...

But at least you know it originated from your tweet which ended up being visited by someone looking at your link on Twitter. no?

Makes sense? maybe not. =)

[link]
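The tagging workaround described in the last two comments can be sketched as follows. This is a minimal Python sketch, not anyone's actual tooling. The function name tag_source is made up for illustration, and the parameter names (utm_source, srcRef) are simply the ones mentioned in the comments. The tag is added before the URL goes through a shortener.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def tag_source(url, source, param="utm_source"):
    """Return `url` with a source-tracking query parameter set to `source`."""
    parts = urlsplit(url)
    # Keep existing query parameters, dropping any earlier value of `param`.
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, source))
    return urlunsplit(parts._replace(query=urlencode(query)))

tagged = tag_source("http://tbray.org/blah/blah", "Twitter", param="srcRef")
print(tagged)
```

As both commenters point out, the tag records where the link was first posted, not where a given visitor actually found it, since the tagged URL can be copied anywhere.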

From: Jeroen Leenarts (Oct 13 2010, at 23:58)

Personally I think every browser should have an option like Firefox's network.http.sendRefererHeader. It's nobody's business to know where I linked from.

Anyone thinking they have a right to know where people are referred from should get their heads examined.

[link]

From: SomeRandomNerd (Oct 14 2010, at 00:49)

Let's say for the sake of argument that there should be some information there, and URL shorteners were to preserve it. What should it be? If I'm looking at my Twitter stream, should it carry my name? The name of the Twitterer who posted the link? What if I got there from a retweet? (Either the old-fashioned type or the new style.) Should the person who brought it to my attention get the credit, or the person who brought it to their attention?

Never mind the privacy issues- if I want to link to something I don't like or agree with, should I have to announce my existence and identity?

I think they are interesting questions that it raises. I don't have answers for any of them though.

[link]

From: Simon Boyle (Oct 14 2010, at 00:59)

Including the profile link in the referrer is considered to be a leak of personally identifying information. It doesn't reveal who posted the link, it would reveal who read the link, and that crosses the privacy line for most people.

http://www.benedelman.org/news/052010-1.html

[link]

From: Matthew Frederick (Oct 14 2010, at 02:05)

Just a note that no referrer is sent when a site is visited from a bookmark, either; it's as though the url was typed manually.

[link]

From: Attila Szegedi (Oct 14 2010, at 05:24)

People saying the Referer header shouldn't be filled out if the link comes from a non-browser app are mistaken. The HTTP specification uses the looser term "user agent", not "browser". In this regard at least, Wikipedia too is wrong. Any desktop app that makes an HTTP request on the user's behalf is a user agent. If it's aware of the URL of the resource from which the link was acquired, it should set the Referer header, period.

Since nowadays we have URL shorteners that use a redirect, I'm starting to think user agents should start using multiple referers on redirect. I don't see anything in the HTTP spec that'd prohibit having multiple referers (similar to how, say, you can have multiple Via headers).

I.e. my tweet <http://twitter.com/#!/asz/status/27252109843> has a bit.ly link <http://bit.ly/9PSwOA> which is resolved to <http://www.telegraph.co.uk/news/worldnews/southamerica/chile/7978509/Mistresses-and-wives-clash-over-trapped-Chilean-miners.html>

When the user agent makes the final request to telegraph.co.uk, it could send two Referer headers:

Referer: http://twitter.com/#!/asz/status/27252109843

Referer: http://bit.ly/9PSwOA

(original referer first). This, however, is up to the user agents (browsers and so on) to implement.

[link]
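For the case where an application fetches the page itself, rather than handing the URL to the OS, setting the header is straightforward. The sketch below uses Python's standard urllib.request. The function name build_request and the target URL are made up for illustration, and the request is built but not sent. It shows a single Referer header only, not the multiple-header idea above.

```python
import urllib.request

def build_request(target_url, source_url=None):
    """Build (but do not send) a request, with a Referer if the source is known."""
    headers = {}
    if source_url is not None:
        # Only set the header when the link really came from a resource with a URL.
        headers["Referer"] = source_url
    return urllib.request.Request(target_url, headers=headers)

# Hypothetical target; the source is the profile URL from the post.
req = build_request("http://example.org/article", "http://twitter.com/timbray")
print(req.get_header("Referer"))
```

This does not help when a client passes the URL to the default browser, because the browser then makes a fresh request and never sees the source.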

From: Dan Thies (Oct 14 2010, at 13:13)

Tim,

The client can't send an HTTP referrer to the browser. The browser doesn't have one, so it can't send one to you. That's the end of that story.

You can tag your URLs with variables to track with your analytics solution (e.g. Google Analytics) but you won't be able to attribute every click to a referring source, ever.

[link]

From: alek (Oct 14 2010, at 18:26)

While most of your traffic *is* probably Twitter related (and yea, almost always (but not 100%) has no referrer), another source of "where is it coming from" traffic is via Email forwarding.

For example, I have some nifty Hummingbird pictures ( http://www.komar.org/faq/travel/hummingbirds/nest/ ) and it just continues to get a constant rumble of traffic, most with no referrers. However, I can "infer the referrer" (!) as I'll occasionally see a mail.live.com, mail.yahoo.com, etc. show up ... although they appear to have "random bits" attached, with no identifiable info. BTW, gmail.com does *not* show up.

So presumably what is happening is people look at, and then forward to other people who then ... rinse and repeat! ;-)

alek

P.S. Facebook is similar to Twitter in that there is limited info to guess who is linking to 'ya.

[link]

From: Adam Nohejl (Oct 15 2010, at 09:05)

You shouldn't take the Referer field information for granted. Read RFC 2616 for what it is meant for (exclamation points mine):

'The Referer[sic] request-header field allows(!) the client to specify, for the server's benefit, the address (URI) of the resource from which the Request-URI was obtained (the "referrer", although the header field is misspelled.) (...) The Referer field MUST NOT(!) be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard.'

'Because the source of a link might be private information or might reveal an otherwise private information source, it is strongly recommended(!) that the user be able to select whether or not the Referer field is sent.'

By the way, I use iCab, which allows me to send referrals only within the same domain, which I think is a good middle way.

[link]

From: Dov Murik (Oct 16 2010, at 11:50)

Let me suggest another theory. Browsers do not add an HTTP Referer header if the link is going from an HTTPS page to an HTTP page. Twitter.com is now a secure HTTPS page (at least for me), and when I follow a link to www.tbray.org my browser won't add a Referer header.

Most blogs and news sites are not HTTPS, so you won't have this problem there.

Does this help?

[link]
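The rules raised in this comment and in the RFC 2616 excerpt quoted earlier can be put together in one small decision function. This is a rough Python sketch of that logic, not how any particular browser implements it. The name referer_to_send is made up for illustration. The fragment stripping reflects the same RFC's statement that the Referer URI must not include a fragment.

```python
from urllib.parse import urlsplit, urldefrag

def referer_to_send(source_url, target_url):
    """Return the Referer value a client would send, or None to omit the header."""
    if not source_url:
        # No source URI: typed URL, bookmark, or a link handed over by another app.
        return None
    src = urlsplit(source_url)
    dst = urlsplit(target_url)
    if src.scheme == "https" and dst.scheme != "https":
        # Secure page linking to a non-secure one: omit the header.
        return None
    # Fragments are never sent, so "#!" style URLs collapse to the base page.
    return urldefrag(source_url).url

print(referer_to_send("https://twitter.com/timbray", "http://www.tbray.org/ongoing/"))
print(referer_to_send("http://twitter.com/#!/asz/status/27252109843", "http://example.org/"))
```

The second call shows why a "#!" tweet URL would show up in a log as just the Twitter front page.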

From: IBBoard (Oct 18 2010, at 01:17)

@Attila: The spec may say "user agent", but as some of us have pointed out, most Twitter clients are throwing the URL to the OS or a browser to say "I don't know what to do with it - you show it". In that situation then there is a disconnect and so the user agent for the XML data feed (which happens to be fetched over HTTP, but is separate from the website) is different to the user agent for the subsequent web page view. In that situation then it is reasonable not to consider "the user agent" to be capable of showing a referrer, as the browser gets a new request from nowhere.

As for your idea of multiple referrers - good luck trying to work out a reasonable structure for that in log files! If you want to put a shortener in the middle of your link then I think you should a) use the stats it provides, b) cope with having lost the referrers you had or c) not use URL shorteners and just use shorter URLs. Also, I've got a sensible extension that expands all of those shortened links, so what I saw didn't make as much sense:

"I.e. my tweet <http://twitter.com/#!/asz/status/27252109843>has a bit.ly link <http://www.telegraph.co.uk/news/worldnews/southamerica/chile/7978509/Mistresses-and-wives-clash-over-trapped-Chilean-miners.html> which is resolved to <http://www.telegraph.co.uk/news/worldnews/southamerica/chile/7978509/Mistresses-and-wives-clash-over-trapped-Chilean-miners.html>"

WRT @Oren's comment about just showing "twitter.com", it looks like an artifact of the new Twitter layout. They're abusing URLs a little and everything is now an anchor (see Attila's Tweet above). Because of that then all URLs will look like they come from the front page of Twitter.

WRT @holaMau's suggestion, I think a lot of URLs are already starting to do that. The amount of junk I see in some URLs when they're expanded from their shortened form is crazy. The short ones append "utm_source=twitterfeed&utm_medium=twitter" and the long ones add another couple of "utm" parameters. One of these days I'll get round to writing a Chrome extension to trim that extra cruft.

[link]
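Trimming that kind of tracking cruft is mostly a matter of filtering the query string. This is a minimal Python sketch of the idea, not the Chrome extension mentioned above. The function name strip_tracking and the example URL are made up for illustration, and the "utm_" prefix is taken from the parameters quoted in the comment.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_tracking(url, prefix="utm_"):
    """Return `url` without query parameters whose names start with `prefix`."""
    parts = urlsplit(url)
    kept = [(k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith(prefix)]
    return urlunsplit(parts._replace(query=urlencode(kept)))

cleaned = strip_tracking(
    "http://example.com/story?id=7&utm_source=twitterfeed&utm_medium=twitter"
)
print(cleaned)
```

Parameters that do not match the prefix are kept in their original order.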

October 12, 2010