
[Update: As of Jan 1, 2023 this is fixed! Thanks to Danny Sullivan and John Mueller of Google for figuring out what was going on. Yay!]
I think Google has stopped indexing the older parts of the Web. I think I can prove it. Google’s competition is doing better. [Update, Feb. 2022: It’s still happening.]

Evidence · This isn’t just a proof, it’s a rock-n-roll proof. Back in 2006, I published a review of Lou Reed’s Rock n Roll Animal album. Back in 2008, Brent Simmons published That New Sound, about The Clash’s London Calling. Here’s a challenge: Can you find either of these with Google? Even if you read them first and can carefully conjure up exact-match strings, and then use the “site:” prefix? I can’t.

[Update: Now you can, because this piece went a little viral. But you sure couldn’t earlier in the day.]

Update: February 2022 · Here’s the smoking pistol. Go back to Feb. 1, 2015, when there was only one article, with a two-word title. Try to find it! Most search engines accept the following syntax: stifado site:tbray.org (pardon the lack of a direct pointer; any present-day discussion with direct links to the article would cause the engines to re-index it). This time, Bing can’t find it either. But DuckDuckGo can!
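
If you want to run the same experiment against your own site, here is a minimal sketch in Python that builds the same site-restricted query for each of the three engines mentioned here. The helper names (site_query, search_urls) are made up for illustration, and the search endpoints are the public ones as far as I know; it only constructs URLs to paste into a browser and does not fetch or parse any results.

```python
from urllib.parse import urlencode

# Public search endpoints for the three engines discussed here.
# Assumption: each accepts the query in a "q" parameter.
ENGINES = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
    "duckduckgo": "https://duckduckgo.com/",
}


def site_query(words, site, exact=False):
    """Build a query string such as 'stifado site:tbray.org'.

    With exact=True the words are wrapped in double quotes to ask
    for an exact-phrase match.
    """
    terms = f'"{words}"' if exact else words
    return f"{terms} site:{site}"


def search_urls(words, site, exact=False):
    """Return one search URL per engine for the same site-restricted query."""
    query = site_query(words, site, exact)
    return {
        name: f"{base}?{urlencode({'q': query})}"
        for name, base in ENGINES.items()
    }


if __name__ == "__main__":
    for name, url in search_urls("stifado", "tbray.org").items():
        print(f"{name}: {url}")
```

Opening the three URLs side by side is the whole test: if one engine comes back empty while the others return the page, that engine has probably dropped it from its index (or at least isn't serving it).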

Why? · Obviously, indexing the whole Web is crushingly expensive, and getting more so every day. Things like 10+-year-old music reviews that are never updated, no longer accept comments, are lightly if at all linked-to outside their own site, and rarely if ever visited… well, let’s face it, Google’s not going to be selling many ads next to search results that turn them up. So from a business point of view, it’s hard to make a case for Google indexing everything, no matter how old and how obscure.

My pain here is purely personal; I freely confess that I’d been using Google’s global infrastructure as my own personal search index for my own personal publications. But the pain is real; I frequently mine my own history to re-use, for example in constructing the current #SongOfTheDay series.

Competition · Bing can find it! DuckDuckGo can too! Both of them can find Brent’s London Calling piece, too.

What Google cares about · It cares about giving you great answers to the questions that matter to you right now. And I find that if I type in a question, even something complicated and obscure, Google often surprises me with a timely, accurate answer. They’ve never claimed to index every word on every page.

My mental model of the Web is as a permanent, long-lived store of humanity’s intellectual heritage. For this to be useful, it needs to be indexed, just like a library. Google apparently doesn’t share that view.

What I’m going to do · When I have a question I want answered, I’ll probably still go to Google. When I want to find a specific Web page and I think I know some of the words it contains, I won’t any more; I’ll pick Bing or DuckDuckGo.



Contributions


From: Raul (Jan 15 2018, at 15:25)

Have you tried ecosia.org? They are powered by bing and you also plant trees while you search!

[link]

From: Moe (Jan 15 2018, at 15:56)

> What Google cares about

Is showing you ads as much as possible. It drops any data, service, technology which isn't helping to show this.

[link]

From: A friend (Jan 15 2018, at 16:16)

It can easily find the http://inessential.com/2008/11/04/that_new_sound article with an exact phrase search.

Sometimes, portions of Google's index are momentarily unavailable for technical reasons... Maybe you stumbled upon that?

[link]

From: Arnold deVos (Jan 15 2018, at 17:10)

I think you may have contaminated the evidence by linking to the review in question from this piece.

[link]

From: M. Douglas (Jan 15 2018, at 17:22)

A. Friend: Arthur, is that you? Are you working for "them" now??

[link]

From: John Cowan (Jan 15 2018, at 17:28)

As I think we both know but other people might not, Google indexes different pages with different frequencies. In particular, ****google, the engine for pages that haven't changed in a long time, is run very slowly, consulting only a few of the 90% or so of all pages that it is responsible for. And on a query, the ****google index isn't even consulted if it's clear that other engines will easily fill up the maximum of 1000 returnable pages for any search. (****google itself may have been replaced by now, but I suspect whatever replaced it is run on the same basis.)

So use precise searches with quoted words (quoting a single word excludes pages that seem relevant but don't actually contain that word) or phrases, so that you aren't drowned by non-****google responses that aren't what you want.

Note that this has nothing to do with the SEO wars, which are about the *order* in which those 1000 possible responses are displayed.

[link]

From: Twirrim (Jan 15 2018, at 18:44)

When I switched jobs nearly 2 years ago, I decided to make a switch to Duck Duck Go as default on my work's machine, just to see how things went.

I found that I miss just one thing from Google: Maps. That's only a simple URL away, so not a big deal. DDG does as good, if not better, a job of finding me the results I need across a wide variety of technical and personal subjects.

[link]

From: Jim White (Jan 15 2018, at 18:59)

For [Rock n Roll Animal] I see your 2006 review as the 102nd (2nd entry of page 10 of search results). For [Rock n Roll Animal tim bray] and [Rock n Roll Animal site:www.tbray.org] it is the first entry (with this post as #2). How is this broken?

For [London Calling site:inessential.com] I get "inessential: Nov 2008" (inessential.com/2008/11/) as the third result, which has Brent's November posts including That New Sound.

[link]

From: Mister coffee (Jan 15 2018, at 18:59)

I’ve seen the same! I was using site: searches as a similar crutch in place of lackluster search on a large site I used to work for and have had zero luck finding pieces I knew to be there. The ones that went viral(ish) or got significant links were findable but the rest sure weren’t! I didn’t even think to try duckduckgo.

I thought google was trying to organize ALL the world’s information?

[link]

From: David Carlton (Jan 15 2018, at 19:30)

Huh. I noticed the exact same thing on my blog, but I blamed it on my end - I'd assumed my server was down or overly slow during recent times their crawler had tried to traverse it, or something. Guess not.

[link]

From: Dave (Jan 15 2018, at 20:23)

I call it "the edge of google".. and you just fell off, although I based that originally on the opposite: a lot of information that has not or can not be indexed will never end up in Google.

[link]

From: Jon (Jan 15 2018, at 20:34)

Wow. Two of the most esoteric, and greatest albums ever made.

We were going to see The Clash at the day on the green so we wanted to hear some of their music, and went out and got London Calling. Wow! Completely blown away. Performance/production. Thanks for bringing this magnificent album back to light.

The same goes for Lou Reed. Thanks

[link]

From: jay (Jan 15 2018, at 20:54)

I remember (1995-96) when you would pay Yahoo $199 and wait a couple of months for it to index your new web site, or wait a year if you didn't pay. If you didn't submit your site to Yahoo, it was just not indexed, NO ONE could find your website. Google does a great job on this, but yes, they don't get everything instantly. But FREE? You have no idea how great this is.

[link]

From: Geoff (Jan 15 2018, at 21:59)

A google search for

tim bray sotd missionary man

Doesn't seem to return the SotD page for "Missionary Man"

At duckduckgo it's the first result.

[link]

From: AlanL (Jan 16 2018, at 02:10)

I switched from google to bing on my phone a year or so ago because of AMP URLs.

I certainly haven't noticed any deterioration in search quality and I feel no inclination to go back.

[link]

From: Traduttore (Jan 16 2018, at 05:16)

Results are biased by their business plan, they should tell us how they filter results.

[link]

From: Tony Hirst (Jan 16 2018, at 05:52)

I've noticed something similar - and started wondering about it in terms of "digital dementia"... https://blog.ouseful.info/2017/10/22/digital-dementia-are-google-search-and-the-web-getting-alzheimers/

[link]

From: Tomas B (Jan 16 2018, at 06:53)

I'm torn here: Google only owes it to their shareholders to organize knowledge that matters (that can be supported by ads) but their messaging and posturing has led us to expect more of them. This is why we need to support Archive.org and other amazing efforts.

Lastly: Excellent album choices!!

[link]

From: Nick Sweeney (Jan 16 2018, at 08:12)

I don't like the phrase "digital dementia" because dementia is an organic deterioration, not a technical one, but the way it manifests itself is reminiscent of cognitive decline.

I've noticed it for a couple of years: the capacity to bring back an old page on a distinctive phrase and a topic has gone, replaced by a desire to show you something vaguely related to your query that was published a couple of hours ago. The notion of the web as archive, as opposed to an ever-changing surface presence, is fading away from search.

Maybe Google needs to create a separate product for "web-as-archive", even if there's less ad money in it? Until that happens, I'm aggressively bookmarking and tagging those one-off pages whenever I retrieve them, setting up a personal outboard search engine because I don't have confidence that I'll be able to find them again through the big providers. (Bing and DuckDuckGo are often not much better for the Old Web.)

[link]

From: John Harrelson (Jan 16 2018, at 08:56)

I discovered DuckDuckGo.com several years ago and love it..

it also has something some other search engines don't have.. "PRIVACY" when searching..

DuckDuckGo blocks sites from tracking your search.

[link]

From: Ryan (Jan 16 2018, at 11:10)

Thanks for reporting this! We're trying to figure out what happened to it (a little delayed due to MLK day). This is an url we would expect to have indexed, so it's very helpful to have examples like this where the systems aren't working as expected.

[link]

From: stephen (Jan 16 2018, at 12:18)

i've remedied this (on wp- sites) by adding amp, an html sitemap if there isnt one, and tweet the sitemap or something... however you go about initiating the bots

[link]

From: Andrew Reilly (Jan 16 2018, at 21:47)

In reply to John Cowan: I don't think that Google's quoted queries work as well as they used to. I can't remember details, but I do remember being disappointed the last couple of times that I tried the obvious quoted search for an error message, and being shown several pages of results that not only didn't contain the quoted phrase, but didn't contain many of the words from the quoted phrase in sequence.

Not sure that I'm brave enough to try other search engines, yet, but maybe one day.

I can't even really bring myself to blame Google, because I understand that a system continuously optimized to give the best response to the most people is going to gradually give worse responses to me. It was a much better match when the internet was mostly a technical, university thing. I am not particularly interested in what "most people" seem to be interested in, and vice-versa.

It seems that other sorts of web index, or user-selectable rank algorithm are needed. Not that I'm asking for Alta Vista back.

I do see it as proof that the much-vaunted user profiling to improve search results (all of which I leave enabled) is rubbish.

[link]

From: M. Fioretti (Jan 17 2018, at 01:22)

I found evidence of the same, or very similar problem, with a 2006 post of mine, albeit in my case it may be partly my fault. Details here: http://stop.zona-m.net/2018/01/indeed-it-seems-that-google-is-forgetting-the-old-web/

[link]

From: SCan (Jan 17 2018, at 04:34)

Unfortunately your conclusion is one most of us will share - google service is degraded but it’s still the best option so we continue to use it. That’s the right move for each individual but collectively it’s encouraging bad (or at least disappointing) behavior.

Admittedly, I’ll do the same.

[link]

From: Phil (Jan 17 2018, at 04:42)

I've had the same problems with google not finding things. I recently tried to look up a company name and google said it didn't exist. A search for the name on bing ended with the company's web site at the top of the list. I've seen this several times. I now check the results from google if I don't see what I want.

[link]

From: Keith (Jan 17 2018, at 08:34)

I just use Bing all the time simply because Google is a near-monopoly. If Bing was a near-monopoly, I'd use Google. Bing works just fine. Monopolies are bad for society.

[link]

From: Carlos Osuna (Jan 17 2018, at 09:36)

Tim.

Actually Google doesn't keep any Indexes "alive" as you will.

The whole MapReduce thing-y is based on linkage and it's a very complex matter.

Your posts became available precisely because people "undusted them" after your post; that is, people started searching for them and linking to them.

Other sites sometimes feed on older technology so, they refresh older sites after Google has long forgotten them.

So, this is not something "they forgot" but something meant "by design".

[link]

From: Doug K (Jan 17 2018, at 10:56)

thank you, most interesting. I ran into this today, looking for a specific post I'd made on a flyfishing site - even using the site: keyword with a unique phrase (only one mention of root canals on this site). G didn't find it. I am so reliant on G that I did not even think of trying other search engines. Luckily I'd saved a copy of the post with URL in my personal indexed trash heap, so found it that way.

Andrew Reilly, this is terrifying..

"I do remember being disappointed the last couple of times that I tried the obvious quoted search for an error message, and being shown several pages of results that not only didn't contain the quoted phrase, but didn't contain many of the words from the quoted phrase in sequence."

If this stops working my job becomes much more difficult. Yike.

It always seemed a sort of miracle to me that Google search worked so well even for the most trivial things on the web - my blog, etc.

Are we reaching the end of the age of miracles and wonders ?

[link]

From: Yusuf (Jan 17 2018, at 20:35)

Totally agree. Google seems to have lost it. The search results are just SEO and news crap most of the time. Pagerank is fundamentally broken and a bad algorithm for an internet of this age. It made sense for content where citation network mattered. Now it is simply a playground of marketeers and traffic seekers.

[link]

From: James (Jan 18 2018, at 05:55)

>What Google cares about

not this article ;)

[link]

From: Mollyfud (Jan 18 2018, at 18:20)

I am not sure this is evidence that Google is doing anything malicious! Do you have a sitemap to help google index your site? If not, it can only follow links and if there were no links to the article, how would you expect it to find it?

Interestingly, since you freshly linked to it, the article has returned to Google. I think this is more likely to prove that Google's index is refreshed and so doesn't keep stuff from years ago that is no longer linked to unless you have a sitemap to tell Google what you want indexed. I notice you have a witty robots.txt file so if you don't already, might be time for a sitemap file as well! ;-)

[link]

From: The New Man Gifts (Jan 19 2018, at 06:38)

Now that the page is indexed I can't check this, but next time do an info: search on the URL (after you've found it via Bing).

You will see if it's indexed in Google.

[link]

From: Glen Wood (Jan 19 2018, at 11:42)

2006 is *old* web? That makes me feel old. What chance is there of finding anything from 1996?

[link]

From: Stavros Macrakis (Jan 21 2018, at 08:58)

It is unlikely that Google is not returning old material because indexing it is "crushingly expensive". Google happily provisions servers for enormous corpora like YouTube, gmail, web images, personal photos, etc., which are probably *less* valuable. What's more, given the growth of the Web, *old* pages are surely a tiny proportion of the total.

We also don't know whether Google has stopped indexing the old material, or is simply not returning it for some reason. We don't even know if this behavior is intentional. It could be a side effect of some optimization that prematurely drops pages that it estimates will not rank high enough to show up in results. Fixing this bug (if it is one) may well have low priority (SWE hours are much more valuable than disk space).

I'm sure there is a fun discussion about all this on Google's internal and mailing lists.... but I no longer have access to them....

[link]

January 15, 2018