Search Engines

Vol. 2 No. 2 – April 2004

Articles

Why Writing Your Own Search Engine Is Hard

There must be 4,000 programmers typing away in their basements trying to build the next "world's most scalable" search engine. It has been done only a few times. It has never been done by a big group; always one to four people did the core work, and the big team came on to build the elaborations and the production infrastructure. Why is it so hard? We are going to delve a bit into the various issues to consider when writing a search engine. This article is aimed at those individuals or small groups that are considering this endeavor for their Web site or intranet. It is fun, but a word of caution: not only is it difficult, but you need two commodities in short supply: time and patience.

ANNA PATTERSON, STANFORD UNIVERSITY

Big or small, proprietary or open source, Web or intranet, it’s a tough job.

by Anna Patterson

Gaming Graphics: The Road to Revolution

It has been a long journey from the days of multicolored sprites on tiled block backgrounds to the immersive 3D environments of modern games. What used to be a job for a single game creator is now a multifaceted production involving staff from every creative discipline. The next generation of console and home computer hardware is going to bring a revolutionary leap in available computing power; a teraflop (trillion floating-point operations per second) or more will be on tap from commodity hardware. This leap in power will bring with it a leap in expectations, both on the part of the consumer and the creative professional.

NICK PORCINO, LUCASARTS

From laggard to leader, game graphics are taking us in new directions.

by Nick Porcino

Instant Messaging or Instant Headache?

It's a reality. You have IM (instant messaging) clients in your environment. You have already recognized that IM is eating up more and more of your network bandwidth, and with Microsoft building IM capability into its XP operating system and applications, you know this will only get worse. Management is also voicing concerns over the lost user productivity caused by personal conversations over this medium. You have tried blocking these conduits for conversation, but it is a constant battle. Tools that make this blocking job easier are now available from vendors such as Akonix, FaceTime Communications, and NetIQ, but IM is maturing, and your users are starting to depend on it as an essential business tool.

JOHN STONE AND SARAH MERRION, SYMANTEC

IM has found a home within the enterprise, but it’s far from secure.

by John Stone, Sarah Merrion

Interviews

A Conversation with Matt Wells

Search is a small but intensely competitive segment of the industry, dominated for the past few years by Google. But Google's position as king of the hill is not insurmountable, says Gigablast's Matt Wells, and he intends to take his product to the top.

When it comes to competing in the search engine arena, is bigger always better?

Articles

Building Nutch: Open Source Search

Search engines are as critical to Internet use as any other part of the network infrastructure, but they differ from other components in two important ways. First, their internal workings are secret, unlike, say, the workings of the DNS (domain name system). Second, they hold political and cultural power, as users increasingly rely on them to navigate online content.

MIKE CAFARELLA AND DOUG CUTTING, NUTCH

A case study in writing an open source search engine

When so many rely on services whose internals are closely guarded, the possibilities for honest mistakes, let alone abuse, are worrisome. Moreover, keeping search-engine algorithms secret makes further advances in the area less likely. Much relevant research is kept behind corporate walls, and useful methods remain largely unknown.

by Mike Cafarella, Doug Cutting

Enterprise Search: Tough Stuff

The last decade has witnessed the growth of information retrieval from a boutique discipline in information and library science to an everyday experience for billions of people around the world. This revolution has been driven in large measure by the Internet, with vendors focused on search and navigation of Web resources and Web content management. Simultaneously, enterprises have invested in networking all of their information together to the point where it is increasingly possible for employees to have a single window into the enterprise. Although these employees seek Web-like experiences in the enterprise, the Internet and enterprise domains differ fundamentally in the nature of the content, user behavior, and economic motivations.

RAJAT MUKHERJEE AND JIANCHANG MAO, VERITY

Why is it that searching an intranet is so much harder than searching the Web?

Our principal focus here is on outlining the demands on information retrieval in enterprises and various technologies that are employed in an enterprise content infrastructure. We define an enterprise to mean any collaborative effort involving proprietary information, whether commercial, academic, governmental, or nonprofit. The term search is usually used to mean keyword search. In this article, we use a broader definition that encompasses advanced search capabilities, navigation, and information discovery.

by Rajat Mukherjee, Jianchang Mao

Searching vs. Finding

Finding information and organizing it so that it can be found are two key aspects of any company's knowledge management strategy. Nearly everyone is familiar with the experience of searching with a Web search engine and using a search interface to search a particular Web site once you get there. (You may have even noticed that the latter often doesn't work as well as the former.) After you have a list of hits, you typically spend a significant amount of time following links, waiting for pages to download, reading through a page to see if it has what you want, deciding that it doesn't, backing up to try another link, deciding to try another way to phrase your request, et cetera. Eventually you may find what you want, or you may ultimately give up and decide that you can't find it. Why is this so difficult?

WILLIAM A. WOODS, SUN MICROSYSTEMS LABORATORIES

Why systems need knowledge to find what you really want

by William A. Woods

Curmudgeon

Web Search Considered Harmful

Nowadays, when you find yourself utterly disgusted by "American Idol," or any other of the latest "reality" shows on TV, you may decide, "What the heck, time to seek a slightly less horrible form of punishment: let's get on the Web."

David J. Brown, Queue Advisory Board Member

The top five reasons why search is still way too hard

by David J. Brown