What Do Tags Mean?

I’m almost convinced that this new Technorati Tags thing is important, but I’m 100% convinced that I don’t understand where it’s going or what the implications are. Which is OK, because I suspect nobody else does either.

It took only a few minutes’ hacking to add them to the RSS feed here at ongoing (sigh, my Perl code, what with this and rel="nofollow", is up over 2,000 lines). So, for some period of time starting a few minutes or hours after I publish this, and probably lasting for a couple of weeks, this page should appear here, and for a much shorter period of time here.

How-To · There’s an embryonic publisher’s how-to, which tells you about putting the following in your pages:

<link rel="tag" href="http://technorati.com/tag/foo"/>

It turns out you can also just put <dc:subject>foo</dc:subject> in your feed (assuming you’ve declared the dc: prefix properly) which is shorter and (much & all as I love Technorati) arguably more open.
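For the feed route, here is a minimal sketch of emitting an item with the dc: prefix declared. It is in Python rather than the Perl that runs ongoing. The namespace URI is the standard Dublin Core elements namespace; the `build_item` helper and the example.org link are hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core elements namespace; binding it to "dc" makes the
# serializer write <dc:subject> instead of an auto-generated prefix.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_item(title, link, subjects):
    """Build an RSS <item> carrying one <dc:subject> per tag (hypothetical helper)."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    for subject in subjects:
        ET.SubElement(item, f"{{{DC_NS}}}subject").text = subject
    return item


# example.org is a placeholder link, not a real ongoing URL.
item = build_item("What Do Tags Mean?", "http://example.org/post", ["foo"])
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

In a real feed the xmlns:dc declaration would more likely sit once on the root rss element than on every item; the serializer puts it on whatever element is written out as the root.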

Except for, as a Web-Architectural kinda guy, I like the idea of identifying categories with URIs. Except for, the whole point of that is that everyone can have their own, and Technorati’s setup only recognizes the ones beginning http://technorati.com/. Except for, the more the tags get shared, the more useful they are. Hmm, more thought required. Dave Sifry, who launched this whole episode, has published his thoughts here.

Feed or Page? · It’s interesting that if you use the <link> technique your tags are in your page, but if you use <dc:subject>, they’re in your feed. Given the fact that a lot of pages are still tag-soup, while the vast majority of feeds are reasonably well-formed, the feed feels like a more natural home for this kind of metadata; but it’s nice that both options are available.

On Metadata · I’ve spent a lot of time thinking about metadata and have written on the subject; the most important conclusion was: There is no cheap metadata. I haven’t seen anything to make me change my mind.

For a different, and very fresh, spin on the same proposition, see Clay Shirky’s folksonomies + controlled vocabularies. He takes on the question of whether “folksonomies”, such as what may emerge around Technorati Tags, are good enough. It’s worth reading all of it, but this in particular seems to the point:

This is something the ‘well-designed metadata’ crowd has never understood — just because it’s better to have well-designed metadata along one axis does not mean that it is better along all axes, and the axis of cost, in particular, will trump any other advantage as it grows larger.

Having said that, and granting the proposition that The Simplest Thing That Could Possibly Work usually wins, I still have to say that the Technorati Tags all being in a single flat namespace does seem a little, well, brittle.

On Atom · Consider the Atom category element. Here’s an example:

<category term="Vancouver" />

Here’s another:

<category scheme="http://www.tbray.org/ongoing/What/" 
          term="Arts/Music/Performance">
Arts 
· Music 
· · Performance
</category>

You see, Atom lets you express tags just as simple as the current Technorati style. But if you want to say “here’s a tag (a string), here’s the vocabulary it comes from (a URI), and here’s a human-readable version you can display”, Atom can do that too.
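To make the three parts concrete, here is a rough sketch of pulling them out of a category element written the way the two examples above are written, with the display text as element content. The `parse_category` name and the fallback from label to term are assumptions made for illustration, not anything a spec requires.

```python
import xml.etree.ElementTree as ET


def parse_category(xml_text):
    """Split a <category> element into term, scheme and display label."""
    elem = ET.fromstring(xml_text)
    term = elem.get("term")
    scheme = elem.get("scheme")  # None when no vocabulary is named
    # Collapse line breaks and runs of spaces in the human-readable text.
    # Assumption: when there is no text, the term doubles as the label.
    label = " ".join((elem.text or "").split()) or term
    return {"term": term, "scheme": scheme, "label": label}


simple = parse_category('<category term="Vancouver" />')
rich = parse_category(
    '<category scheme="http://www.tbray.org/ongoing/What/" '
    'term="Arts/Music/Performance">Arts \u00b7 Music \u00b7 \u00b7 Performance</category>'
)
print(simple)
print(rich)
```

The flat tag and the scheme-qualified one come out with the same shape, so a consumer that ignores scheme treats both as plain tags.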

I think that it would be nice if a huge number of web pages converged on using a simple, flat, shared set of tags with entries like vancouver and mac os x and tsunami relief, which the current setup works well for.

But I think it would also be nice if, once we have Atom, there are feeds about Petroleum Geology with their own tags, and feeds about Military Training too, and they each have their own drill tag. Which Atom would support nicely.

Of course, the only people who would need to know about the Petroleum or Military tags would be people specifically looking for that kind of stuff; someone looking for a drill tag generically would probably get both and maybe that would be fine.

Bottom line: I suspect Technorati, and anyone else who takes this up, should offer an (optional) “scheme” field in their tag search capability, which would be handy for those who care and invisible for those who don’t.
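A sketch of what an optional scheme field might look like from the search side. Everything here is hypothetical: the `TagIndex` class, the example.org scheme URIs and the post URLs are invented for illustration and say nothing about how Technorati actually stores tags.

```python
from collections import defaultdict


class TagIndex:
    """Toy tag index: every entry has a term, optionally qualified by a scheme URI."""

    def __init__(self):
        self._entries = defaultdict(list)  # term -> list of (scheme, url)

    def add(self, url, term, scheme=None):
        self._entries[term.lower()].append((scheme, url))

    def search(self, term, scheme=None):
        hits = self._entries.get(term.lower(), [])
        if scheme is None:
            # Generic search: the scheme is invisible, every "drill" matches.
            return [url for _, url in hits]
        # Scoped search: only tags drawn from the requested vocabulary.
        return [url for s, url in hits if s == scheme]


PETROLEUM = "http://example.org/petroleum-geology/"  # hypothetical scheme URI
MILITARY = "http://example.org/military-training/"   # hypothetical scheme URI

index = TagIndex()
index.add("http://example.org/oil/1", "drill", scheme=PETROLEUM)
index.add("http://example.org/army/1", "drill", scheme=MILITARY)
index.add("http://example.org/diy/1", "drill")

print(index.search("drill"))
print(index.search("drill", scheme=PETROLEUM))
```

Someone who never passes a scheme gets the flat, shared namespace; someone who does gets only the Petroleum Geology drill.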

On Mechanics · At ongoing, here’s what I’m doing: for this fragment, which is in Technology/Syndication and Technology/Metadata, the RSS feed will contain:

<dc:subject>Technology/Syndication</dc:subject>
<dc:subject>Technology</dc:subject>
<dc:subject>Syndication</dc:subject>
<dc:subject>Technology/Metadata</dc:subject>
<dc:subject>Metadata</dc:subject>

I’d be surprised if Technorati is going to do anything very smart with the hierarchy, but I don’t see any reason to lose information while generating this stuff.
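The expansion from category paths to that list of subjects can be sketched as follows. This is Python, not the Perl that generates the feed, and the function names are hypothetical. It assumes each path contributes itself plus each of its individual components, with repeats dropped; what should happen with paths deeper than two levels is a guess, since the example only shows two.

```python
from xml.sax.saxutils import escape


def expand_subjects(categories):
    """Return each full path and each path component once, in first-seen order."""
    seen = set()
    subjects = []
    for path in categories:
        for candidate in [path, *path.split("/")]:
            if candidate not in seen:
                seen.add(candidate)
                subjects.append(candidate)
    return subjects


def subject_lines(categories):
    """Render the expanded subjects as <dc:subject> elements, XML-escaped."""
    return [f"<dc:subject>{escape(s)}</dc:subject>" for s in expand_subjects(categories)]


lines = subject_lines(["Technology/Syndication", "Technology/Metadata"])
print("\n".join(lines))
```

Fed the two categories of this fragment, it produces the same five elements in the same order as the listing above.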

After I put the tags in my feed, I immediately started banging on Technorati and was having trouble finding them there. One reason was obvious: all the back items in my feed hadn’t changed even though they now had <dc:subject> tags, and Technorati’s change-detection algorithms detected that and refused to re-index. Fair enough.

Then, some still weren’t showing up, largely because they were sorted by date not authority; anything older than a few days, in a popular tag, just isn’t going to be there.

I’m pretty sure this is a bug: if I do a tag search for “Microsoft”, and if Robert Scoble or Jonathan Schwartz has posted on the subject any time in the last 48 hours, I want those at the top of the damn list, not what some college kid wrote three hours ago.
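One way to express that ordering, as a sketch only: inside a recency window, sort by authority; outside it, fall back to newest-first. The `authority` number is a made-up stand-in (an inbound-link count, say), the post records are invented, and nothing here reflects how Technorati really ranks results.

```python
from datetime import datetime, timedelta


def rank_posts(posts, now, window=timedelta(hours=48)):
    """Recent posts first, ordered by authority; older posts after, newest first."""
    recent = [p for p in posts if now - p["published"] <= window]
    older = [p for p in posts if now - p["published"] > window]
    # Two stable sorts: newest-first, then authority, so equal-authority
    # posts stay in newest-first order.
    recent.sort(key=lambda p: p["published"], reverse=True)
    recent.sort(key=lambda p: p["authority"], reverse=True)
    older.sort(key=lambda p: p["published"], reverse=True)
    return recent + older


now = datetime(2005, 1, 18, 12, 0)
posts = [
    {"who": "college kid", "authority": 3, "published": now - timedelta(hours=3)},
    {"who": "well-known blogger", "authority": 5000, "published": now - timedelta(hours=40)},
    {"who": "stale but famous", "authority": 9000, "published": now - timedelta(days=5)},
]
ranked = [p["who"] for p in rank_posts(posts, now)]
print(ranked)
```

The 48-hour window is taken from the complaint above; where exactly to draw that line is a judgment call.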

Here are a few notes on how Technorati’s doing things right now. First of all, they’re monocasing tags. Second, they’re OK with spaces, for example here is a tag search for “Mac OS X”. Third, they treat my full pathnames as standalone tags, as in Technology/Metadata.
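Those three observations amount to a very small normalization step, sketched here. The assumption that “monocasing” means lowercasing is mine, as is the `normalize_tag` name; the whitespace collapsing is also a guess. The last line shows one way naive lowercasing can surprise you outside ASCII.

```python
def normalize_tag(tag):
    """Lowercase a tag and collapse whitespace; spaces and slashes are kept."""
    # Assumption: monocasing is plain lowercasing. Slashes are not treated
    # as separators, so a full pathname stays one standalone tag.
    return " ".join(tag.split()).lower()


print(normalize_tag("Mac OS X"))
print(normalize_tag("Technology/Metadata"))

# U+0130 (capital I with dot above) lowercases to "i" plus a combining dot,
# so the result is not the same string as plain "istanbul".
print(normalize_tag("\u0130stanbul") == "istanbul")
```

A tag typed with a Turkish capital İ would therefore land in a different bucket from the all-ASCII spelling.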

The Questions · What I’m left with, mostly, are questions: Should I be using my own category tree, or should I switch to tags that Technorati already knows about? Should I introduce Tim Bray and ongoing tags, presumably on all entries? Should categories be in trees? Should their names be monocased (with potential for i18n breakage)? Are categories actually going to produce better searches than good old full-text? Here are tag and full-text searches for “Vancouver”; what do you think? What am I doing categories for? What is anybody doing categories for? What is everybody doing categories for?

January 18, 2005

By Tim Bray.

