Google Memory Loss

[Update: As of Jan 1, 2023 this is fixed! Thanks to Danny Sullivan and John Mueller of Google for figuring out what was going on. Yay!]
I think Google has stopped indexing the older parts of the Web. I think I can prove it. Google’s competition is doing better. [Update, Feb. 2022: It’s still happening.]

Evidence · This isn’t just a proof, it’s a rock-n-roll proof. Back in 2006, I published a review of Lou Reed’s Rock n Roll Animal album. Back in 2008, Brent Simmons published That New Sound, about The Clash’s London Calling. Here’s a challenge: Can you find either of these with Google? Even if you read them first and can carefully conjure up exact-match strings, and then use the “site:” prefix? I can’t.

[Update: Now you can, because this piece went a little viral. But you sure couldn’t earlier in the day.]

Update: February 2022 · Here’s the smoking pistol. Go back to Feb. 1, 2015, when there was only one article, with a two-word title. Try to find it! Most search engines accept the following syntax: stifado site:tbray.org (pardon the lack of a direct pointer; any present-day discussion with direct links to the article will cause the engines to re-index it). This time, Bing can’t find it either. But DuckDuckGo can!
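For anyone who wants to repeat the experiment across engines, here is a minimal sketch that builds the same site-restricted query for each one. The endpoint URLs and the "q" parameter name are my assumptions about the engines' public search pages, not anything they document as a stable API, and the helper names are made up for illustration. It only constructs URLs to paste into a browser; it does not fetch or parse results.

```python
from urllib.parse import urlencode

# Assumed public search pages; each is assumed to take the query text in "q".
SEARCH_ENDPOINTS = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
    "duckduckgo": "https://duckduckgo.com/",
}


def site_query(terms: str, site: str) -> str:
    """Build a site-restricted query string such as 'stifado site:tbray.org'."""
    return f"{terms} site:{site}"


def search_urls(terms: str, site: str) -> dict[str, str]:
    """Return one URL per engine, each carrying the same site-restricted query."""
    query = site_query(terms, site)
    encoded = urlencode({"q": query})  # percent-encodes the colon, '+' for the space
    return {name: f"{base}?{encoded}" for name, base in SEARCH_ENDPOINTS.items()}


if __name__ == "__main__":
    for name, url in search_urls("stifado", "tbray.org").items():
        print(f"{name}: {url}")
```

Keeping the query text identical across engines is the point: any difference in what comes back is then down to the index, not the phrasing.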

Why? · Obviously, indexing the whole Web is crushingly expensive, and getting more so every day. Things like 10+-year-old music reviews that are never updated, no longer accept comments, are lightly if at all linked-to outside their own site, and rarely if ever visited… well, let’s face it, Google’s not going to be selling many ads next to search results that turn them up. So from a business point of view, it’s hard to make a case for Google indexing everything, no matter how old and how obscure.

My pain here is purely personal; I freely confess that I’d been using Google’s global infrastructure as my own personal search index for my own personal publications. But the pain is real; I frequently mine my own history to re-use, for example in constructing the current #SongOfTheDay series.

Competition · Bing can find my Lou Reed review! DuckDuckGo can too! Both of them can find Brent’s London Calling piece as well.

What Google cares about · It cares about giving you great answers to the questions that matter to you right now. And I find that if I type in a question, even something complicated and obscure, Google often surprises me with a timely, accurate answer. They’ve never claimed to index every word on every page.

My mental model of the Web is as a permanent, long-lived store of humanity’s intellectual heritage. For this to be useful, it needs to be indexed, just like a library. Google apparently doesn’t share that view.

What I’m going to do · When I have a question I want answered, I’ll probably still go to Google. When I want to find a specific Web page and I think I know some of the words it contains, I won’t any more; I’ll pick Bing or DuckDuckGo.

Contributions

From: Raul (Jan 15 2018, at 15:25)

Have you tried ecosia.org? They are powered by Bing, and you also plant trees while you search!