What
 · Technology
 · · Search

Power Web Site · I pro­pose a new def­i­ni­tion. A site which is de­signed as the pri­ma­ry Web prop­er­ty for a per­son, place, or thing is a pow­er site if the per­son, place, or thing has a Wikipedia en­try but, in pop­u­lar search en­gi­nes, the site ranks above that Wikipedia en­try. There aren’t very many. But they fol­low sim­ple pat­tern­s ...
[14 comments]  
The End of the Golden Age? · For a few years now, the In­ter­net has been just in­sane­ly use­ful. Every­thing is there and you can find it when you need it. But Google is work­ing less and less well, and I’ve spot­ted an­oth­er po­ten­tial crack in its foun­da­tion­s. Will we look back on this as the time when it all start­ed to fall apart? ...
[15 comments]  
Mahalo Funnies · Some­one named “C.K. Sam­ple III” emailed me with an in­vite to try some­thing called Ma­halo, which I gath­er is a hot sub­ject among the A-listers. I was on a bor­ing tele­con, so I fired up Fire­fox and gave it a try. Hey, it said “Mahalo's search re­sults on­ly in­clude great links.” The re­sults are hi­lar­i­ous ...
[7 comments]  
NASDAQ: JAVA · Wow, they switched the tick­er. It will be lit­tle sur­prise to hear that the in­ter­nal con­ver­sa­tion has been sus­tained and loud. While there have been neg­a­tives along the lines of “OMG WTF PHB!?!?”, most of the in­ter­nal talk has echoed what they’re say­ing out in the bl­o­go­sphere. I’d like to add a cou­ple of points I haven’t seen else­where, one each on the pro and con side ...
[7 comments]  
Finding Things · That’s the title of my chapter in Beautiful Code, which seems now to be out, not that I’ve actually seen a copy. What’s amusing me today is that Finding Things is the chapter they’ve picked to post as a free PDF download. So, in the event that you’re interested in the subject but don’t care about what Kernighan and Bentley and Petzold and Stein and Dongarra and Cantrill and Matsumoto and all the others have to say, you can avoid buying the book and doing Amnesty International a favor. I have to say that the Table of Contents looks pretty impressive.
[2 comments]  
search.technorati.com · Now, this is what I’ve always wanted. I’m feeling kind of unhappy with myself; time after time, Dave Sifry has shown me some new frippery they’re rolling out at Technorati and I’ve said “Yeah, that’s kind of cool, maybe you could twiddle X” even though it didn’t turn my personal crank that much. My problem has been that I was assuming that the way I want to use Technorati is unusual. I use it for vanity feeds of course, but when I go to the site, I only ever want to ask two questions: “What are they saying about <insert recent event>?” and “Where was that article I saw recently about <insert subject>?” The new, very Google-flavored search.technorati.com does those things, and that’s all it does. Plus, it seems a whole lot faster ...
[4 comments]  
Safe Tea · Now, that’s weird. As I’ve re­port­ed be­fore, one of the top sources of on­go­ing traf­fic is Google im­age search­es for “tea”; the pret­ty pic­ture from Tea is the #1 match. On­ly, some­times not. It turns out that if you turn SafeSearch of­f, as in “Yeah Google, it’s OK to show me the most ap­palling filth if you find it”, well, my lit­tle red teapot van­ish­es from the re­sults page. Turn SafeSearch back to “Moderate” and there it is. Is my com­po­si­tion re­gard­ed as so chaste that it must be kept away from pages where Sex Might Hap­pen? Grant­ed that there is no naked flesh, but the pot and the cup are both kind of curvy. Life is full of mys­ter­ies.
[6 comments]  
Tag Scheme? · In Atom, cat­e­gories have schemes. What scheme should we use for tags? ...
[26 comments]  
What I Search · Most people use computers mostly for information storage. Which means that most people do a lot of searching. My most common search is the Web as a whole, via either Google or Yahoo!; I try to switch back & forth from time to time. Next most common would be email via GMail, my own slow mailgrep, and maybe some year Spotlight. After that would be the Web Event Stream via Technorati (for me, text all the time, tags almost never). Bringing up the rear would be a certain amount of filesystem searching (Spotlight sort of works, except for my email) and grepping source code. I wonder if I’m typical?
[9 comments]  
Stimulating Pictures · I don’t know why this tickles me so much, but it does: ongoing is getting a couple of thousand visits a week from people searching for “tea” or “cup of coffee” on Google Images. The piece entitled, well, Tea, is number one! And for “cup of coffee”, A Damn Fine Cup of Coffee is #5. Interestingly, neither of them shows up anywhere near the top in the image-search functions of either Yahoo! or Microsoft’s Live Search. In fact, typing those words in and seeing what the three engines produce is kind of interesting (Live Search notably includes actress Téa Leoni).
[2 comments]  
Intelligent Search (Again) · I see where Pow­er­set is go­ing to pro­vide the next gen­er­a­tion of In­ter­net Search, and has raised some mon­ey from some apparently-smart peo­ple. Me, I’ve seen this movie be­fore. Good luck to ’em; they’ll need it.
[2 comments]  
Grass, Tea, Church, Search · I gath­er there are peo­ple out there—lots of people—whose liveli­hood more or less de­pends on their Google search rank. Here­with some thoughts on why this is scary ...
 
Microformats Search · The Mi­cro­for­mats kids all have re­al­ly great hair, and the coolest acronyms; stil­l, up till now, it’s all on­ly oc­ca­sion­al­ly seemed plau­si­ble to me. But this new Tech­no­rati mi­cro­for­mat search thing, I look at it and for the first time re­al­ly think “This could be big”. For ex­am­ple, look at kitchen.tech­no­rati.­com/even­t/search/­van­cou­ver (even the URI is in­ter­est­ing). It looks like some “sort-by” but­tons and au­thor­i­ty and key­word fil­ters would im­prove things; and if this catch­es on some Mon­day the spam­mers will be there by Wed­nes­day. But stil­l, we could be see­ing it hap­pen: small pieces com­bin­ing to pro­duce some­thing re­al­ly, re­al­ly big. [Dis­clo­sure: I have a con­flict of in­ter­est with re­spect to Tech­no­rati.]
 
Bloglines · So, Blog­lines has launched their blog search thing and, of all the blog search en­gines I have tried, this is one of them. Paul Quer­na says that it’s bet­ter be­cause it doesn’t do tags. Uh, OK. Any­how, con­grat­s; now that this tri­umph has been record­ed, maybe that will free up some Blog­lines cy­cles for fix­ing the ac­tu­al core of­fer­ing that makes them in­ter­est­ing, that mil­lions of peo­ple use to cruise the bl­o­go­sphere, that I used to rec­om­mend to ev­ery­one, and that Sam Ru­by just broke again? I would re­al­ly like to be a friend of Blog­li­nes.
 
The Future Search Market · Re­cent­ly, I learned that search providers pay for traf­fic, which makes all sorts of sense in a world where they’re of­fer­ing ap­prox­i­mate­ly equal lev­els of ser­vice. So, where to from here? I can see the op­por­tu­ni­ty to build a near-perfect mar­ket. (Please note for the record that in this piece, I agree with Ni­cholas Carr) ...
 
Search For Sale · In re­sponse to yesterday’s Buy­ing Search Traf­fic, Rus­sell Beat­tie (who works for Ya­hoo) writes: Search is al­ready de­ter­mined by who pays the most for it! Every­where you see a search box with a Google lo­go, be sure that there's a com­peti­tor out there that will pay for the same spot—because search ad­ver­tis­ing is so mon­e­ti­z­able. Google is ev­ery­where be­cause they're pay­ing for it. Wow, I had no idea. Now, this is just one person’s voice, but I’m run­ning it be­cause I think Rus­sell is prob­a­bly in a po­si­tion to know, and is hon­est in my ex­pe­ri­ence. Any­one else want to con­firm or re­fute? [Ah, Om Ma­lik was on the sto­ry back in Septem­ber.]
 
Buying Search Traffic · On impulse, I just twiddled the ongoing software on my staging server so that when you do a search in the little box up at the top, it goes to Yahoo not Google. I ran a bunch of searches, and in terms of result quality, there was nothing to choose between them. Yahoo seemed a little fresher; on this Sunday it had Friday’s entries pretty well indexed, while Google was only half there; they’re both OK for Thursday. So, at this moment in time, my search box, and a zillion others like it, are pointing to Google just because that’s the way we set it up, and it’s actual real work to go changing production systems, and the competition so far isn’t significantly better. I have no idea what the proportion of search coming through this kind of thing is, as compared to the volume going through the search-engine home pages. I bet that if you count the toolbars on the browsers, it’s getting up there. Via Google’s AdSense For Search, you can already get paid for sending searches to Google. I won’t use it, though, because if I read the terms and conditions correctly, you have to include a Google logo. Screw that; I like my minimalist little search box, and nobody but me and my employer gets any branding here. I’m sure Yahoo has a competitive offering, but I haven’t tracked it down. I’ll tell you one thing for sure though; if the search engines retain their quality-of-service parity, pretty soon the traffic will be dealt out totally based on who’s willing to pay the most for it. Where can I buy shares in Firefox?
 
Web Tracking Snapshot · There are many ser­vices that claim to be “blog search”, but that’s the wrong way to think about it. There are a (very) few oc­ca­sions when I want to go and search for “what’s new on X”, and there are lots of ways to do that (the new Sphere is look­ing good in that space). But what I want to do 24/7, as long as the com­put­er is turned on, is what I call Web Track­ing: be­ing told right away when there’s some­thing new on the Web that I care about. I sub­scribe to a lot of Web Track­ing ser­vices; here­with a snap­shot of my im­pres­sion­s ...
 
Scoble ♥ RDF · Check out Scoble’s spec­u­la­tion on The Per­fect Search: he’d like to find a ho­tel in New York with free WiFi, a good view, and good food, in a par­tic­u­lar price-range. Rob, meet Tim Berners-Lee; Tim, meet Rob. Rob wants the Se­man­tic We­b. In par­tic­u­lar, today’s fresh­est SemWeb fla­vor is some­thing called SPARQL; see Ken­dall Clark’s human-readable in­tro. SPARQL is an an­swer to the ques­tion “What if I want to do SQL-like query­ing when I know per­fect­ly well that ev­ery­body will be us­ing their own in­com­pat­i­ble database schema?” I’ve been a SemWeb skep­tic, but I look at SPARQL and I think: Sup­pose you could as­sem­ble a ton of property-value pairs about web sites, and sup­pose on the front end you could build a nice re­spon­sive query page that al­lowed you to com­pose queries like Scoble’s ho­tel search; well then, SPARQL would be more or less ex­act­ly what you need to bridge the gap. Hey, isn’t Guha’s Alpiri project more or less that back-end? And isn’t Guha work­ing at Google now? Hm­m­m­m­m­m...
 
Buggy Google Blog Feeds · So Google has blog search. Sum­ma­ry: It’s fast, it’s rea­son­ably com­plete, it’s stripped-down in the typ­i­cal Google style, the re­sult rank­ing needs work, the time win­dow is way too deep. They’re al­so pro­vid­ing feed­s, which is good, but the feeds are hor­ri­bly bug­gy [Quick re­spon­se; One big bug’s al­ready fixed!] ...
 
How Big? Who Cares? · I’m re­al­ly hap­py that Dan­ny Sul­li­van com­pre­hen­sive­ly blew off this lat­est round of com­pet­i­tive chest-beating about search en­gine in­dex size. As Dan­ny says, “It’s ab­sur­d. It’s an­noy­ing. It’s a friggin’ waste of time.” And it’s piti­ful be­cause right now, Ya­hoo and MSN are show­ing signs of giv­ing Google the first se­ri­ous run for its mon­ey in re­cent mem­o­ry. So those guys should stay away from these ju­ve­nile dis­trac­tion­s.
 
Puzzling Search Study · I glanced at Tris­tan Louis’ Search Engine Com­par­i­son, think­ing it was in­ter­est­ing but not very use­ful. I was sur­prised to see a few oth­er blog­gers dis­cussing it as though it meant some­thing. The num­ber of pages that the var­i­ous en­gines claim to have in­dexed, and the num­ber they claim to re­turn for any search, re­al­ly don’t mean much. First of al­l, nobody’s got the time to look at more than a few dozen results—studies show that most peo­ple will nev­er look past the first page. Se­cond­ly, even if you want­ed to look at all the re­sult­s, the en­gines prob­a­bly couldn’t show them to you any­how. Third, what mat­ters is whether you get what you’re look­ing for. Al­most all the mod­ern en­gines do a pret­ty damn good job of get­ting you some­thing ap­pro­pri­ate and use­ful in the first hand­ful of re­sult­s. Who cares about the next mil­lion?
 
Technorati, Tags, Semantics · Hey, the Tech­no­rati be­ta is up. Looks much nicer, though I wish they’d lose the dude with the mega­phone; goa­tees are so 1993. (Hey look, Tech­no­rati and Newsweek, sit­ting in a tree.) Among oth­er things, the tech­no­ra­tionals are mak­ing a con­cert­ed ef­fort to prove that my doubts about tag­ging are misplaced—so are Shirky et al at You’re It!. It’s be­come ob­vi­ous that tags are use­ful enough as a place to park search words for pic­tures & mu­sic & oth­er stuff that doesn’t have words to search. Fur­ther­more, I’ve heard a dozen com­pelling sto­ries from peo­ple who are us­ing tags to or­ga­nize their own in­for­ma­tion and track trend­s; so it’s look­ing like the an­swers are: Yes, tag­ging is use­ful; No, it’s not a re­place­ment for full-text search, even par­tial­ly. On the sub­ject of search, Sun’s Search Guy Steve Green is try­ing to push over the bound­ary be­tween search and se­man­tic­s.
 
Search Engine Rankings · Recently, someone from a Google competitor told me that they were catching up, within a few percentage points. I didn’t believe that at all, but I decided that intuition is boring and hard data is interesting. So I went and ran search engine rankings for ongoing weekly through 2005. The numbers are surprising, to say the least. [Update: Thought-provoking feedback, and some conclusions] [And more feedback from Search Engine Watch.] ...
 
Yahoo! Search FUSE · John Bat­telle re­ports on a con­ver­sa­tion with Y!’s Jeff Wein­er; it sounds like John heard more or less the same things I did. At the time I didn’t say too much about Jeff’s re­mark­s, but I think that John’s piece, while good, by­passed a re­al in­ter­est­ing part. Y!’s ral­ly­ing cry is FUSE: Find, Use, Share, and Ex­pand. So do you think they can beat Google at find­ing or us­ing? Well may­be, but I wouldn’t want to bet a busi­ness on it. But how about Share and Ex­pand? Y! has re­la­tion­ships with a lot of peo­ple out there: email re­la­tion­ship­s, fi­nance re­la­tion­ship­s, chat­ter re­la­tion­ship­s, you name it. Sup­pose they can make it re­al easy and at­trac­tive for the peo­ple in all those re­la­tion­ships to put some back; to Share stuff and Ex­pand the We­b. Be­ing first in line to help ev­ery­one Find and Use that good new stuff? Sounds like a plau­si­ble line of at­tack to me.
 
Talking to Yahoo · I had a good talk yesterday with Jeff Weiner, Senior VP of Search and Marketplace over at Yahoo! I shouldn’t pass on what Jeff said; anyhow if he wants to talk to the world, he has a blog. But I can talk about what I said: first, Y! should be watching the Atom protocol work like a hawk, because they have two choices: either they try to beat everyone else out there and build the world’s greatest authoring tool, or they get behind a standardized protocol and let the cellphone guys and PDA people and everyone else compete to do it. Second, we were talking about improving search in general; near as I can tell, there isn’t a huge quality gap between Y!, Google, and MSN, and it’s hard to believe that any of them can sustainably get much ahead of the rest. On the other hand, I think Y! has a good chance to take on Google in the advertising space, both AdSense and AdWords, and maybe win. They know a whole lot of stuff about a whole lot of people; for example, they know my stock-market portfolio and what weather forecasts and maps I look up; they probably have more information about more individuals than anyone else in the business. On behalf of all those advertising sellers and buyers: it would sure be nice to see some competition. Maybe even some transparency.
 
Still Wondering About Tags · This whole related-tags thing has been around for a mon­th, but Dave Sifry says it’s of­fi­cial. I went and tried a half-dozen and the re­sults were all over the map. I think I spot a pat­tern where things that are more or less steady-state are lame (Van­cou­ver, pros­ti­tu­tion), while it works well on cur­rent events: (Fire­fox, DeLay, Gomery). Which is in­tu­itive­ly plau­si­ble. But my ques­tion from last month still stand­s: Are tags use­ful? Are there any ques­tions you want to ask, or jobs you want to do, where tags are part of the so­lu­tion, and clear­ly work bet­ter than old-fashioned search? I re­al­ly want to be­lieve that tag­ging is big, a game-changer, but the longer I go on ask­ing this ques­tion and not get­ting an an­swer, the more ner­vous I get.
 
A Cherry-Tomato Winner · The chal­lenge has been met, and the crimson-vegetable award goes to... Google! It can now find im­ages cor­rect­ly based on meta­data; for ex­am­ple Saskatchewan snow with plants, Tanya King, sweet pea shad­ow, Ho­goro­mo din­ner, and so on. Nei­ther MSN nor Ya­hoo search can do this.
 
Crosstalk · Dear read­er­s, hon­esty (and the sto­ry I’m about to tel­l) re­quire that I spill the bean­s: there is lots of stuff here on on­go­ing that you can’t see. Since I have a pret­ty good writ­ing en­vi­ron­men­t, I com­pose lots of lit­tle pieces of one kind or an­oth­er and then “semi-publish” them; they’re out there on the Web at an ad­dress that looks a lot like that of the frag­ment you’re now read­ing, but there are no point­ers to them; se­cu­ri­ty by ob­scu­ri­ty, but good enough for my pur­pos­es. Last night I semi-published some­thing and emailed a few peo­ple ask­ing them to look at it. A cou­ple hours lat­er, I won­dered if they had, and checked the serv­er logs. I saw my­self (twice, I’d cor­rect­ed it), and... the Google­bot. What the hel­l? Did I ac­ci­den­tal­ly press “publish”? No. Baf­fle­men­t. Is my brows­er telling Google where I’m go­ing?!? Un­like­ly... ah! Even semi-published pieces have the ad­s, which are a Javascript call­back to some­where in the Google­plex. So, that’s what’s hap­pen­ing... when AdSense dis­plays on a page, at least some­times, it tips off the robot army. So any­one who’s run­ning AdSense gets in­dexed first & fastest. I can’t prove it, but it’s the sim­plest ex­pla­na­tion, and it makes all sorts of sense. [Up­date: If you look re­al close­ly at that robot (for geek­s, at the “User-Agent” field­), it’s not quite the same as the nor­mal google­bot; ap­par­ent­ly this beast is just read­ing the text of the page to fig­ure out what con­tex­tu­al­ly ap­pro­pri­ate ads to dis­play. Thanks to all the peo­ple who wrote to point this out­.]
 
Do Tags Work? · I was sit­ting up and got pinged by Dave Sifry about Technorati’s new related-tags fea­ture; Tech­no­rati thinks that Base­ball is re­lat­ed to Sports, MLB, Foot­ball, Bas­ket­ball, Nat­u­ral Phi­los­o­phy (got­ta love that), and tick­ets. Some don’t work that well, but the idea is com­pelling. I’ve been think­ing about this stuff a lot, and I have a ques­tion: Do tags work? It shouldn’t be too hard to find out ...
 
Picture Search and Gravel Hauling · If you’re in Flori­da near Inglis-Yankeetown and you want to haul dirt, rent a truck from Tim Bray. No re­la­tion, but I got­ta say that’s a nice-looking truck. I found it us­ing the new Ask.­com pic­ture search, which has a much nicer pre­sen­ta­tion than Google. How­ev­er, it (like all oth­er search en­gi­nes) sad­ly fails the Cherry-Tomato Chal­lenge.
 
Real Information Retrieval · Sum­ma­ry: find a Real Li­brar­i­an. The nar­ra­tive in­cludes de­mo­graph­ic trends and Bo Did­dley ...
 
Who’s Searching · I see that Forrester’s ex­cel­lent Char­lene Li is ex­pect­ing MSN search to gain on Google. Her ar­gu­ment sounds plau­si­ble, so I went and checked my log­files. Since Sun­day, I’ve had 1,222 peo­ple ar­rive at on­go­ing via Google, 166 via search.ya­hoo.­com, and 49 via MSN. If it gets a lit­tle closer, I’ll start hav­ing to run a reg­u­lar Search Mar­ket Share graph along with my Brows­er Mar­ket Share of­fer­ing.
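
For anyone who wants those three referral counts as shares, the arithmetic is short. This is only a sketch: the counts are the ones quoted above, while the function name `referral_shares` is made up for illustration and is not part of the ongoing software.

```python
from typing import Dict


def referral_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw referral counts into percentage shares of the total."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Counts quoted in the entry above (arrivals since Sunday).
counts = {"Google": 1222, "Yahoo": 166, "MSN": 49}
shares = referral_shares(counts)

for engine, pct in shares.items():
    print(f"{engine}: {pct:.1f}%")
# Google: 85.0%
# Yahoo: 11.6%
# MSN: 3.4%
```

These are shares of search-engine referrals to one site only, not of the search market as a whole.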
 
Green On Search · Care about search in gen­er­al? Then you prob­a­bly should start read­ing Steve Green; he’s in Sun Labs and knows more about search tech­nol­o­gy than just about any­body, way more than me. Plus, he’s amus­ing.
 
The Cherry-Tomato Challenge · I have re­cent­ly ad­just­ed the on­go­ing soft­ware so that each and ev­ery im­age has de­scrip­tive text in both its alt and ti­tle at­tributes. This is good ac­ces­si­bil­i­ty prac­tice and should al­so make it pos­si­ble for search en­gines to find my pic­tures. But they can’t. The im­age here, which is in a file whose name, un­help­ful­ly, is IMGP0990.p­ng, is cor­rect­ly la­beled in both ti­tle and alt at­tributes as “Sunlit cher­ry toma­toes on white-painted wood.” I just now vis­it­ed John Battelle’s help­ful list of search en­gines and lots of them of­fered “image search” ca­pa­bil­i­ties, but not one turned that pic­ture up when I searched for “sunlit cher­ry tomatoes”. (Lots of them turn the page up when you do an or­di­nary text search.) How hard can it be? I here­by promise that when I find a cred­i­ble general-purpose Web Image Search tool that leads me to that pic­ture via “sunlit cher­ry tomatoes”, I will pub­lish a rave re­view here and do my best to spread the word.
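
To make "How hard can it be?" concrete, here is a minimal sketch of pulling the alt and title text out of a page and matching a query against it. It makes no claim about how any real image search engine works; the class and function names are hypothetical, and the matching rule (every query word must appear in the image's text) is an assumption chosen for simplicity.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple


class ImageTextCollector(HTMLParser):
    """Collect (src, descriptive text) pairs from <img> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        # Join whatever descriptive text the author supplied.
        text = " ".join(v for v in (a.get("alt"), a.get("title")) if v)
        self.images.append((a.get("src") or "", text))


def words(text: str) -> set:
    """Lower-case word set; punctuation is dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_images(html: str, query: str) -> List[str]:
    """Return the src of every image whose alt/title text has all query words."""
    collector = ImageTextCollector()
    collector.feed(html)
    wanted = words(query)
    return [src for src, text in collector.images if wanted <= words(text)]


page = (
    '<img src="IMGP0990.png" '
    'alt="Sunlit cherry tomatoes on white-painted wood." '
    'title="Sunlit cherry tomatoes on white-painted wood." />'
)
print(find_images(page, "sunlit cherry tomatoes"))  # ['IMGP0990.png']
```

The file name contributes nothing here; only the text the author attached to the image is consulted.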
 
Bot Droppings · I was idly watch­ing my serv­er log­files to­day, pret­ty qui­et on Sun­day af­ter­noon so it was most­ly just the crawler­s, and ob­served some puz­zling be­hav­ior from the Google­bot. So I ran a few re­port­s ...
 
On Search: Sorting Result Lists · I was talk­ing to some­one build­ing a search en­gine and he was moan­ing about sort­ing re­sult lists in re­al time, on­ly you don’t have to. Any­one who’s built a big search en­gine even­tu­al­ly works this out, but post­ing it here might save a few min­utes for some fu­ture de­vel­op­er. The idea is, you should nev­er have to do an O(N·log(N)) sort on a re­sult list. [Up­date: Ex­per­i­men­tal ver­i­fi­ca­tion.] ...
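
One common way to avoid a full sort, offered as a sketch and not necessarily the approach the full essay describes: since nobody reads past the first page or so, keep only the best k hits in a small heap while scanning the matches. That costs O(N·log(k)) instead of O(N·log(N)). The function name `top_k` and the (score, document) tuple layout are assumptions for illustration.

```python
import heapq
from typing import Iterable, List, Tuple

Hit = Tuple[float, str]  # (relevance score, document id); hypothetical layout


def top_k(scored_hits: Iterable[Hit], k: int) -> List[Hit]:
    """Return the k highest-scoring hits, best first, without sorting all N."""
    if k <= 0:
        return []
    heap: List[Hit] = []  # min-heap holding the best k hits seen so far
    for hit in scored_hits:
        if len(heap) < k:
            heapq.heappush(heap, hit)
        elif hit > heap[0]:
            # Better than the weakest hit we are keeping: swap it in.
            heapq.heapreplace(heap, hit)
    # Only k items are sorted here, so this step is O(k·log(k)).
    return sorted(heap, reverse=True)


hits = [(0.2, "a"), (0.9, "b"), (0.5, "c"), (0.7, "d")]
print(top_k(hits, 2))  # [(0.9, 'b'), (0.7, 'd')]
```

The standard library's `heapq.nlargest(k, scored_hits)` does the same job in one call; the loop is spelled out only to show where the saving comes from.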
 
Pix MisGoogled · Google in­dex­es pic­tures all wrong. Here at on­go­ing, I used to store all my own pic­tures with nice names I in­vent­ed on the spur of the mo­men­t. Some­time last year, I re­al­ized my cam­eras were thought­ful­ly giv­ing each shot a nice guaranteed-unique name, so I just start­ed us­ing that; for ex­am­ple, that slug is in a file named IMG_2663.jpg. But, I’m care­ful to al­ways sup­ply an ap­pro­pri­ate alt tex­t, like so: <img alt='Vancouver slug' src='IMG_2663.png' />. It turns out that Google pret­ty much ig­nores the alt tex­t, which is ir­ri­tat­ing, so you’ll find my ros­es and prairi­escapes and Foo Cam­pers in Google on­ly with a lot of ef­fort. What’s re­al­ly weird is that Google does put a lot of weight on the ac­tu­al file-name. The rea­son I no­ticed this is that in any giv­en week, the most pop­u­lar im­age on on­go­ing is the pic­ture of Di­ablo found here, ap­par­ent­ly be­cause it’s in a file called di­a­blo.jpg. But you know, there are lots of pic­tures of Di­ablo out there, and not that many of the chapels at Brus­sels Air­port. So, Google could do a lot bet­ter here. [Note: I’m talk­ing about Google im­age search here, not reg­u­lar search. It’s still bro­ken.]
 
Report From the Intel Community · This has noth­ing to do with a Cal­i­for­nia chip mak­er. Rather, it’s about a trip I re­cent­ly took to a con­fer­ence called In­telink, where the peo­ple gath­er who run one of the world’s biggest and most in­ter­est­ing in­tranet­s; the one that serves the com­mu­ni­ty of U.S. In­tel­li­gence pro­fes­sion­al­s ...
 
Another “Intelligent Search” Skyrocket · In the On Search se­ries, I wrote a piece called In­tel­li­gence that ex­plained why in­tel­li­gent search is hard, but that it is so ea­ger­ly de­sired that there are pre­dictable flur­ries of ex­cite­ment ev­ery so of­ten over the nex­t, uh, pre­tender. This time, Cringe­ly has been sucked in. Wel­l, not en­tire­ly, he loads up with caveats too, but it’s a lit­tle sad to see one of the re­al­ly big-name writ­ers point to such tat­tered hy­pe. Earth to Bob: the prob­lem with AI isn’t that the “A” part isn’t fast enough, it’s that we don’t un­der­stand the “I” part. I won­der what it takes for some ob­scure lit­tle com­pa­ny ped­dling a dream that has been around the track so many times to get air­time with this guy? Cringe­ly needs to pull up his game a bit: in the last cou­ple of week­s, he was the on­ly per­son on the plan­et to con­clude that the Sun-Microsoft deal was some­how bad for the Ja­va Desk­top Sys­tem; not that he ac­tu­al­ly ad­vanced any ar­gu­ments on the sub­jec­t, just pro­claimed it. The peo­ple in Red­mond are smarter than Bob and I’m pret­ty sure that the deal isn’t mak­ing them wor­ry less about this.
 
What People Care About · Here­with the top cou­ple of dozen search strings that brought peo­ple to on­go­ing, sam­pled over the last few days. Let this be a les­son to you on what you can write about with­out de­vel­op­ing a “certain reputation” ...
 
Googlestorm · Some­times you glance at your serv­er log­file and say “Huh?” Click for the pic­ture; im­pres­sive on one hand, ir­ri­tat­ing on the oth­er. [Up­date: Eureka! Fig­ured it out; fixed the prob­lem.] ...
 
Cleanup Plus Search · Another batch of on­go­ing house­keep­ing. I added a search field up and to your right, which just out­sources the prob­lem to Google. Even­tu­al­ly there’ll be some­thing with an on­go­ing look com­ing out of this. Al­so I fixed a long-standing bug in the date dis­play, which con­vinced me the whole date-hierarchy sub­sys­tem was ba­si­cal­ly bro­ken so I re-did it, check it out. Quite like­ly I broke some­thing, if so let me know, my email ad­dress is on the front page of a Google search for my name. Al­so, IE6 was re­fus­ing to ren­der &apos; prop­er­ly for rea­sons I couldn’t fig­ure out, so I skat­ed around that.
 
Self-Limiting · I was talk­ing to­day to this re­al­ly smart guy named Jonathan Le­blang who works for A9, and he said “You know, Google’s suc­cess may con­ceal a death warrant.” I said “Huh?” He said “Well, the most use­ful Web pages used to be the ones that ag­gre­gat­ed a bunch of use­ful links, and so peo­ple would point to those and Google would find them. Nowa­days, why would any­one go to the work to put a page like that to­geth­er if you can just re­ly on Google to find stuff?” Hadn’t thought about it that way.
 
Search Variables · Scoble thinks he’d like to have access to all the variables behind search engine result rankings, and Battelle agrees. Hmm... these are smart guys, but I think they’re both wrong on this one. Experience shows that most users won’t even open up an “advanced search” facility, they just want to type their 1.3 words into the search window and let the search tech do its stuff. And I’m one of them. I bet that most times, I’m going to get good results with less fuss & bother by carefully selecting the search terms I type in than by fiddling with knobs on the side of the engine. Because human language offers a much subtler and more sophisticated set of controls and variables than any software I’ve seen.
 
On Search, the Series · This se­ries of es­says on the con­struc­tion, de­ploy­ment and use of search tech­nol­o­gy (by which I mean pri­mar­i­ly “full-text” search) was writ­ten be­tween June and De­cem­ber of 2003. It has fif­teen in­stal­ments not in­clud­ing this ta­ble of con­tents ...
 
Turn On Search · This is the last in my se­ries of On Search es­says. I’ve writ­ten these pieces be­cause I care about search and be­cause the lessons of ex­pe­ri­ence are worth writ­ing down; but al­so be­cause I’d like to change this part of the world. In short, I’d like to ar­range for ba­si­cal­ly ev­ery se­ri­ous com­put­er in the world to come with fast, ef­fi­cien­t, easy-to-manage search soft­ware that Just Work­s. This es­say is about what that soft­ware should look like. Ear­ly next year I’ll write some­thing on how it might get built ...
 
John Battelle on Search · It turns out that John Bat­telle, who’s made his mark at a bunch of dif­fer­ent pretty-good in­dus­try pub­li­ca­tion­s, whom I met at the Foo Cam­p, and who in­ter­viewed me about it for Busi­ness 2.0, is one of the few peo­ple in the world who ob­sess about search tech­nol­o­gy as much as I do. I just stum­bled across his ex­cel­lent Search­blog; any­one who’s tak­en the trou­ble to plow through my es­says on the sub­ject will prob­a­bly find a sub­scrip­tion worth­while.
 
On Search: Robots · Robots—also known as spiders—are pro­grams that re­trieve Web re­sources, us­ing the em­bed­ded hy­per­links to au­to­mate the pro­cess. Robots re­trieve da­ta for all sorts of pur­pos­es, but they were in­vent­ed most­ly to drive search en­gi­nes. Here­with a tour through Robot Vil­lage ...
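
The core loop of a robot fits in a few lines. The sketch below is an illustration under stated assumptions, not the code of any real crawler: `fetch` is a hypothetical callable that returns a page's HTML (or None if it cannot be retrieved), and robots.txt handling, politeness delays, and error handling are all left out.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          limit: int = 100) -> List[str]:
    """Breadth-first walk over hyperlinks; returns URLs in retrieval order."""
    seen = {start_url}
    queue = deque([start_url])
    retrieved: List[str] = []
    while queue and len(retrieved) < limit:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        retrieved.append(url)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return retrieved


# A tiny in-memory "Web" stands in for real HTTP requests.
pages = {
    "http://example.com/": '<a href="/a">a</a><a href="/b">b</a>',
    "http://example.com/a": '<a href="/">home</a>',
    "http://example.com/b": "",
}
print(crawl("http://example.com/", pages.get))
```

Passing the fetcher in as a parameter keeps the sketch runnable without a network; a real robot would substitute an HTTP client there.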
 
On Search: XML · Search­ing is all about tex­t, and the pro­por­tion of all the world’s text that is XML keeps get­ting high­er and high­er. So if you’re go­ing to do search, at some point you’re go­ing to have to think about search­ing XML. Here­with a sur­vey of some of the is­sues and prob­lems (which, like oth­er es­says as we ap­proach the end of On Search, con­tains opin­ions among the re­portage) ...
 
On Search: Interfaces · Here­with an in­ves­ti­ga­tion of how search soft­ware ought to in­ter­act with the out­side world. I’ll start with a look at the cur­rent state of the art, and pro­pose an­oth­er (I think bet­ter) ap­proach. This is, I think, the third-last On Search piece, so a few words at the meta lev­el about that ...
 
On Search: Result Ranking · If you’re search­ing a big database, un­less you’re lucky you’re usu­al­ly go­ing to get a lot more match­es to any giv­en query than you want to look through. So it re­al­ly mat­ters what or­der that re­sult list is in. Google got to be fa­mous in large part be­cause they do a good job on this; the stuff near the top of their list is usu­al­ly about what you wan­t, and if you don’t see what you need near the top of the list, it may not be out there. Here­with some re­marks on how to go about sort­ing re­sult list­s. In gen­er­al, the news is not very good; how­ev­er, there are some promis­ing tech­niques that are under-explored ...
 
I Ferment Not · The other day I looked at ongoing’s top referers and, not for the first time, saw that a Google search for “fermentation” was right up there. What happened was, I wrote a piece called Language Fermentation back in May that was severe programming-language-theory geekery, and, well, a lot of programming-language geeks linked to it, and now any luckless student of zymurgy or budding opimian is going to find ongoing at #2 in their result-list. While this is evidence that Google’s PageRank sometimes goes off the rails, the first entry in that result list is (quite properly) the Journal of Fermentation and Bioengineering.
 
On Search: I18n · The “On Search” series resumes with this look at the issues that arise in search when (as you must) you deal with words from around the world, written in the characters that the people around the world use. I18n stands for “internationalization.” ...
 
What is Nasty? · Ques­tions Google an­swered by send­ing peo­ple to on­go­ing: What is a colophon? What is a names­pace? What is a web tag? What is a wife-beater? What is bi­na­ry search? What is char­ac­ter string? What is "DC" day? What is fer­men­ta­tion? What is ja­va lan­guages es­say? What is mea­sured num­ber­s? What is nasty? What is nat­u­ral lan­guage query? What is on­go­ing? What is peaw? What is pre­ci­sion and re­cal­l? What is Price Server? What is prod­uct man­age­men­t? What is refac­tor­ing soft­ware? What is rock­et sci­ence? What is se­man­tic­s? What is share­crop­ping? What is stax-volt? What is tex­t, re­al­ly? What is the antonym of kiosk? What is the best way for writ­ing bi­na­ry tree search? What is the dewey dec­i­mal call num­ber for Is­lam? What is the ex­ten­sion of a rd­dl doc­u­men­t? What is the fastest string ma­nip­u­la­tion lan­guage? What is the use of a book thought al­ice with­out pic­tures or con­ver­sa­tion? What is through­put? What is trib­al­is­m? What is uni­code? What is wi-fi? What is XML API tech­nol­o­gy? What is xml rpc REST?
 
On Search: Metadata · In the Web’s ear­ly years, the over­whelm­ing fa­vorite among search en­gines was Ya­hoo. To­day it’s Google. Nei­ther has ac­tu­al­ly had bet­ter text search tech­nol­o­gy than the com­pe­ti­tion. They won be­cause they used meta­da­ta ef­fec­tive­ly to make their ser­vices more use­ful. In this ninth On Search episode, a sur­vey of what meta­da­ta is, where it comes from, and how to use it ...
 
On Search: Stopwords · Here’s a Google search for a fa­mous phrase, to be or not to be; give it a try and see what hap­pen­s. When you look at word fre­quen­cies, it ap­pears that there are a few words that ap­pear un­rea­son­ably of­ten and car­ry un­rea­son­ably lit­tle in­for­ma­tion. They are called “stopwords,” and this (brief) eighth bead in the On Search neck­lace con­sid­ers them ...
 
On Search: UI Archeology · This chap­ter of the On Search saga is a side-trip; a look at an un­usu­al search us­er in­ter­face I built a dozen years ago. One of the rea­sons it didn’t catch on back then was that there wasn’t enough XML in the world. Now that there is, maybe this bit of lega­cy code will pro­voke an idea or two. Just may­be, it con­tains some ideas that will be use­ful to the folk who are won­der­ing how to make the pow­er of XPath and XQuery use­ful to or­di­nary peo­ple ...
 
On Search: Squirmy Words · In this, the sixth in­stal­ment of my search saga, a sur­vey of the fuzzy edges of words and their mean­ings and the (sur­pris­ing­ly mod­er­ate) con­se­quences for search sys­tem­s ...
 
On Search: Intelligence · Here’s the prob­lem: search­ing for words isn’t re­al­ly what you want to do. You’d like to search for ideas, for con­cept­s, for so­lu­tion­s, for an­swer­s. In­stead, your typ­i­cal search en­gine mo­ron­i­cal­ly sorts through its post­ings, and tries to solve your prob­lems by look­ing at which words ap­pear where, and how of­ten, and so on. What we’d re­al­ly like is an in­tel­li­gent search en­gine. This es­say is most­ly about why we’re not like­ly to get one any time soon ...
 
On Search: Precision and Recall · Search­ing is a branch of com­put­er pro­gram­ming, which is sup­posed to be a quan­ti­ta­tive dis­ci­pline and a mem­ber of the en­gi­neer­ing fam­i­ly. That means we should have met­ric­s: mea­sures of how good our search tech­niques are. Other­wise, how can we ev­er mea­sure im­prove­ments in one sys­tem or the dif­fer­ences be­tween two sys­tem­s? “Precision” and “recall” are the most com­mon mea­sures of search per­for­mance. But they’re not as help­ful as we’d like ...
 
On Search: Basic Basics · In this on­go­ing sa­fari through the Search hin­ter­land, I had thought next to talk about pop­u­lar fea­tures of search en­gines and their costs and ben­e­fits and so on. But I think that ev­ery­thing else I want to cov­er will be eas­i­er if there’s a shared view of the ma­chin­ery mak­ing it all go. So here’s a tour through the ba­sics of search-engine en­gi­neer­ing ...
 
On Search: The Users · Here­with Chap­ter Two of the search trav­el­ogue. Between late 1994 and ear­ly 1996 I was oc­cu­pied full-time and then some build­ing and run­ning one of the first Web search en­gi­nes, the long-departed Open Text In­dex. There weren’t many million-hits-a-day sites back then. When you’re run­ning that kind of thing, you spend a lot of time watch­ing your logs to fig­ure out what your users are do­ing and what makes them hap­py. There are two lessons that loom larg­er than all the oth­ers put to­geth­er ...
 
On Search: Backgrounder · This is the first of a se­ries on search, by which I mean full-text search. Any­one who us­es com­put­ers now us­es search pret­ty well ev­ery day, so this is an im­por­tant chunk of our tech­nol­o­gy spec­trum. This piece cov­ers the busi­ness and his­to­ry an­gles; fu­ture in­stal­ments will ex­plain how search en­gines work and the in­ter­faces to them. I plan to con­clude with a de­scrip­tion of the next search en­gine, which doesn’t ex­ist yet but some­one ought to start build­ing.
(Updated: Microsoft Indexing found. Slashdot search explained.) ...
 
The Death of Scholarship? · Some maze of twisty lit­tle blog­pas­sages led me to this study of Stu­dent Search­ing Be­hav­ior. It's re­al­ly long and wordy, but the sound­bite is that when stu­dents are asked to look up some­thing rel­e­vant to their aca­dem­ic work, 45% of them go to Google, 10% of them go to the lo­cal li­brary cat­a­log, and the rest scat­ter among oth­er search en­gi­nes. I like Google as much as the next per­son, but I still find this re­al­ly dis­turbing, es­pe­cial­ly that 10% fig­ure ...
 
The Natural Language Query Fallacy · This is pro­voked by an ar­ti­cle at Mary Jo Foley's Mi­crosoft Watch, of­ten quite a use­ful place. She sug­gests that one of the won­ders of Longhorn, the mighty Windows-to-be, is that users “will be able to type in com­mands (such as, ‘Find all the spread­sheets I gen­er­at­ed last year that in­clud­ed sales da­ta from Bob Jones’), and Longhorn will auto-magically re­turn the results.” I'm won­der­ing why this is sup­posed to be a good idea ...
 
The Google vs. Blogs Controversy · I see that Wired has picked up on An­drew Orlowski's over­wrought at­tempt to cre­ate a news sto­ry about the ef­fect of the admittedly-incestuous blog net­work on Google re­sult­s. I'm not sure how rep­re­sen­ta­tive on­go­ing is, but a look at my da­ta doesn't re­al­ly sug­gest there's much here to be con­cerned about ...
 
I am an employee
of Amazon.com, but
the opinions expressed here
are my own, and no other party
necessarily agrees with them.

A full disclosure of my
professional interests is
on the author page.