Search—An Enterprising Affair

Edward Grossman, Editor, Queue

The searching-to-finding ratio is in need of improvement.

Arguably, search is the killer app on the Internet. If there were anyone left who’d argue it wasn’t at least in the top five, they’ve probably now recanted, as the verb to Google has entered the common vernacular (I Google, you Google, he Googles).

Somehow, browser developers have never really gotten the bulk of Internet users onto that ol’ address bar. You’ve surely seen a newbie getting on the Web, going to Yahoo or Google, and typing a Web address into the query box. (Don’t believe me? See http://www.metaspy.com for a glimpse at what people are searching for at MetaCrawler—too many dot-coms, I’d say.) Search is so popular that whole cottage industries of search engine marketing (SEM) and search engine optimization (SEO) have popped up—and the fastest-growing source of advertising revenue on the Web is from people simply paying to be at or near the top of search results.

Nothing new there, right? Right.

But what amazed me as the ACM Queue editorial board sat down to assemble this issue on search is that there’s something seriously broken here. Sure, Web search occasionally returns great results—but shouldn’t we expect as much from what in many cases is just a popularity contest (with often the most popular paying for the privilege)? What happens when we move to a place where people aren’t so desperate to be found, perhaps not wanting to be found at all: search in the enterprise?

Left to straightforward information-retrieval (IR) techniques, in a poorly linked heterogeneous content repository (your typical enterprise), search reveals its shortcomings. One generally doesn’t have the Google-like experience one would like on the intranet. Oh sure, drop a horde of engineers on the problem, and give each business unit an SEO/SEM consultant, and you’d fix the problem. But that “human solution” hints at the real issue: the core technology in search and IR still has a long way to go.

So in typical Queue fashion, we’ve decided to look at the underlying technical challenges that remain. First up, we strike at the core of the issue: Why don’t computers understand what we’re looking for? William Woods of Sun digs into the semantic and morphological underpinnings of search in his excellent piece, “Searching vs. Finding.” At the very least, we’ve got a great reason here to keep Moore’s law going—we’re going to need the horsepower.

Next up we go beyond straight IR and across the spectrum of the issues within the enterprise that keep search such a beast—heterogeneous content, poor intra-document linking, etc. Two leading enterprise search architects from Verity, Rajat Mukherjee and Jianchang Mao, outline the challenges in “Enterprise Search: Tough Stuff.”

So then of course the question comes: If you were to set out to write your own search engine, could you do it? Surely if Google can be a multibillion dollar company based on a nifty-but-not-Einsteinian popularity-ranking algorithm, it shouldn’t be quite so hard to hack out your own app to solve the search problem, right? Not so fast.

Anna Patterson of Stanford University, author of multiple search engines, takes a look at why the devil is in the details in “Why Writing a Search Engine Is Hard.” She has some interesting observations about building a search engine from scratch or cobbling one together from among the many commercial components out there.

And lest you think this is all theoretical, we’ve got a case study of sorts from Mike Cafarella and Doug Cutting of the open source search engine Nutch, who actually did write their own search engine.

Last but not least, we’ve sandwiched our special report on search between the optimist and the grumpster. On the hopeful side is Matt Wells, author of the Gigablast search engine, interviewed by Infoseek founder Steve Kirsch. And on the grumpy side is Curmudgeon David Brown on why you can’t always get what you want (but if you try, sometimes, you get what you need). Enjoy!

EDWARD GROSSMAN is responsible for Queue, so blame him if you don’t like it. In earlier incarnations he was a development project manager at a still-in-business dot-com and a closet coder (his parents still don’t know—“Our son Ed? Oy, he works with computers, doing something”).

Originally published in Queue vol. 2, no. 2

© ACM, Inc. All Rights Reserved.