For a few years now, the Internet has been just insanely useful. Everything is there and you can find it when you need it. But Google is working less and less well, and I’ve spotted another potential crack in its foundations. Will we look back on this as the time when it all started to fall apart?

Farming · This has been written up recently by Paul Kedrosky, with follow-up from John Battelle, and ReadWriteWeb has been doing a terrific job tracking the Content-farm problem.

By the way, they’re on Twitter too; I have a canned search (via the lovely Mac Tweetie client) for a combination of “EU” and “Oracle” (guess why). This afternoon, the flag came up and an innocent-looking tweet took me to something called “redhotnews.com” that gets no links from me, where there was one paragraph from this Business Week story surrounded by seventeen (!) different AdSense ads.

Adsense sleaze

Hey Google, that’s your brand all over this pool of sleaze.

The bottom line: There are a huge number of smart sleazy people thinking all day every day about how to game the Google ecosystem; content farms are just one of the newer manifestations. Google knows about this, obviously, and they’ve known about things like gaming sleazy affiliate programs for years; but the problem remains. At this point, it hardly matters whether Google’s huge screaming conflict of interest is a factor here; it’s perfectly possible they’re making an honest effort but just losing.

My Own Problem · I’m having a very personal problem with Google; there are things on my blog, the publication you are now reading, that I can’t find. There will be times that I need to refer to some story that I just know is out there somewhere back in the last six ongoing years, but I can’t find it. Google still indexes every word here, near as I can tell, so if I know the exact phrase I’m looking for, I’ll do fine.

But sometime over the last few years, my ability to use Google to find things here has taken a qualitative turn for the worse. I know maybe too much about search technology; I’m assuming it’s just search-quality algorithms being tuned away from inclusiveness and toward fussiness, in the face of assault from all the bad people on the Internet.

Whatever; the effect is not subtle.

It’s the People, Stupid · I really hope that Google (or Microsoft, or Yahoo, or whoever) figures out which set of knobs on the sides of the big search engines to twist to get us back to where things were when they worked better. But until that happens, I can see one way forward that I don’t think can be gamed; we’ll just have to rely increasingly on people and their reputations.

If I can’t trust searching, retrospective or real-time, to let me know what’s going on in management APIs, I don’t care, because if I follow William Vambenepe, he’ll have the poop. If I want keen-eyed level-headed oversight on what’s up in games, the crew at Ars Technica are trustworthy. If I want partisan but fair and thorough coverage of US politics, Josh Marshall and the crew at Talking Points Memo. For tracking Vancouver civic issues, Frances Bula. And so on.

Yeah, it’d be nice if there were a way to mechanize away the need for trust, and for reliance on the small number of people you can scale up and listen to. But maybe the period of history during which that was possible was an anomaly which is drawing to an end.

The Internet is still really good at helping you find the right people to pay attention to, and scaling up to listen efficiently. Which is immensely better than nothing.



Contributions

From: Lou Springer (Dec 14 2009, at 18:54)

Expanding the question a bit beyond search, to the content/editing/channel triad of publishing, there is still no Internet version of editing that just magically works. There are tons upon tons of content, the channels are cheap and plentiful, but what is the thing that "naturally edits" as part of how we effortlessly use the Internet?

The secret is likely somewhere in social networking and the notion of 'following'; I'm more likely to like what my 'friends' like, qualifying and sorting friends in overlapping groups depending on the interest, but the power of these relationships isn't quite fully leveraged yet.

From: monowerker (Dec 14 2009, at 21:20)

If you know something is on your blog and you know Google indexes all of it, doesn't "site:tbray.org +searchterm" work for you?

From: Gabriel K. (Dec 14 2009, at 21:33)

About the "It's the People, Stupid" part: that's why I use Twitter more and more. I know that, let's say, 6 or 7 guys are worth following.

The point is to open the channel or research. It's quite hard to find good sources, sources that make you discover new things to read, new sources of good articles etc. If you know some people you trust, how do you discover new ones?

From: william (Dec 14 2009, at 21:53)

I think that we are at the end point for a reasonable result set from the current set of search engines. Billions of dollars spent in a run towards the ghost that is A.I., and this is where we are. If Google, with its brilliant engineers and endless supply of cash, gives us the current set of results, it may be time for a change in the way that both content input and output are carried out on the Internet.

From: John Simon (Dec 14 2009, at 22:46)

re mechanical trust: how about old FOAF-tags?

From: Chris Mahan (Dec 14 2009, at 23:48)

I've been living in Northridge, CA, for 17 years (came just in time not to miss the 1994 quake) and still, still, the Internet is useless for finding things there. Even Google Maps with local stuff enabled shows only about 40% of businesses, and a lot of those in the wrong spots.

I personally think the 1995 web was more reliable (except for GeoCities and Tripod) because it took effort to put stuff out there, so only serious people did it.

Gary Vaynerchuk said in one of his videos that the monetizers on the web will be like music DJs: they will be human filters, gathering the good stuff for the consumption of their tribe, and monetizing their ad space directly with vendors. I agree completely with that.

For example: stackoverflow.com search sucks. It's got all kinds of regex kinks. Yet it's infinitely better than Google's site:stackoverflow.com search, and they don't have to show those ads.

I heard it said that search was a done thing. I tend to disagree. In 1998 there used to be a ton of search engines, with meta engines that collected the aggregate data. There were also human-powered web indexes (dmoz) but by trying to index everything well, they indexed nothing well.

Something is going to happen, I think.

From: Alain Geenrits (Dec 15 2009, at 02:31)

Interestingly, the conclusion to trust people more came up in a conf call with Hal Stern recently for the SEED program. I asked him about the evolution of search and how people have more and more difficulty finding stuff.

His answer was that we are moving more and more to communities on the web, where you surf amongst like-minded or trusted people.

From: Ciaran (Dec 15 2009, at 05:05)

I'm not sure if it would be nice to mechanise away trust. The "It's the people" model suits me better anyway; that's how I came to be reading this post.

I know that sometimes you have worthwhile things to say about stuff I have a professional interest in, and other times it's a photo of a flower, or an in-depth description of how you like to make toast. Also interesting, to me at least, because a) it just is, and b) it exposes a real person, which is something that can be trusted, unlike (in my opinion) a brand.

Following on from that, sometimes when you write about something you link to someone else who's written about it, exposing me to another potential source of interesting things.

The same points apply to microblogging too, whether proprietary or otherwise. ;)

So news, views, opinions, humour, toast recipes, can all come to me direct from real people - no need for corporate middlemen with axes to grind, visitor numbers to increase, shareholders to satisfy, profits to make, etc.

As a result, 99% of my searches these days are for specific technical information, e.g. when I have a particular error message that's not covered in the documentation, or when I need some documentation I don't have.

To summarise, it's more like the beginning of a golden age if you ask me.

As for finding things on your own blog - your server, your data, something must be wrong if Google has to come into that transaction.

From: Zellyn Hunter (Dec 15 2009, at 05:14)

I've had the same experience with your blog. Googled “tim bray [topic]” for an article I *knew* should come back and found nothing. Eventually found it by browsing your archives/categories. It surprised me too.

From: Lina Inverse (Dec 15 2009, at 06:20)

I have a friend who works at Wowd where they're doing an interesting thing: you run a Java app on your system and it anonymously sends the pages you frequent into the Wowd cloud, which is all the systems running that app.

Wowd searches use a little bit of the cloud's resources and your search results are heavily weighted towards what other people are actually doing.

Last time I played with it (right after launch) there weren't enough people in the cloud to make it more than a curiosity for all but the most common "hot" stuff, but if it can get, or has gotten, beyond that stage it could be quite interesting.

Obviously it will be gamed if it gets to be really big, but fighting that would be a rather different sort of whack-a-mole, one where the hammer wielder has a lot more data.

From: Bilgehan (Dec 15 2009, at 10:21)

The quality of search results is worse for queries on non-English content. Most of the time, sites that make every mistake that Google warns about are displayed on the first results page. There seems to be a problem with the web spam algorithms.

From: len (Dec 16 2009, at 07:43)

Following the crowd means content is mostly the same week after week.

The same is so for searches.

The web becomes self-reinforced reality TV, a series set in a comic book store.

From: Jeff Dickey (Jan 04 2010, at 04:58)

@Len - Agreed. More specifically, I think the comic shop is situated in Stalingrad in roughly December, 1942. Not a pleasant experience.

I think a few people have touched on the hidden-but-real problem here; we've used a lot of very book-smart people (does Google hire *anybody* without at least an MSc from a "top school" [whatever that is]?). The space has become such an echo chamber, with "the ghost that is A.I." far more fully formed in the developers' theology than in their technology. The simple things have been made astoundingly complex, and the genuinely complex has a "new! simple interface!" marketing paint-job splattered all over it. Those who are starting to call Google the new Microsoft have a point: the product (search result placement) is marketed to the skies, while the poor user who just wants to find something he *knows* is on the site is lost.

I happen to think the "solution" we're stumbling towards is part walled garden, part explicit markup. It will be interesting to see if that worst-of-both-worlds scenario can possibly deliver less satisfaction - at any point around the table - than obtains today.

From: Grokodile (Jan 05 2010, at 21:12)

I think you are exactly right when you talk about trust as a replacement.

We are at the point that anything successful, hence in front of a lot of people, turns into a game for affiliate pennies.

The problem is figuring out how to incorporate user-generated trust opinions in a reliable way.

Perhaps large, well-edited sites will become "promoters" of content if they aren't already? But then, what happens to good content that doesn't have room to fit on the edited sites?

The explosion of both content and crap is going to have some interesting consequences...

December 14, 2009

I am an employee of Amazon.com, but the opinions expressed here are my own, and no other party necessarily agrees with them.

A full disclosure of my professional interests is on the author page.