Search Engines

Vol. 2 No. 2 – April 2004

Why Writing Your Own Search Engine Is Hard:
Big or small, proprietary or open source, Web or intranet, it’s a tough job.

There must be 4,000 programmers typing away in their basements trying to build the next “world’s most scalable” search engine. It has been done only a few times. It has never been done by a big group; always one to four people did the core work, and the big team came on to build the elaborations and the production infrastructure. Why is it so hard? We are going to delve a bit into the various issues to consider when writing a search engine. This article is aimed at those individuals or small groups that are considering this endeavor for their Web site or intranet. It is fun, but a word of caution: not only is it difficult, but you need two commodities in short supply: time and patience.

by Anna Patterson

Gaming Graphics: The Road to Revolution:
From laggard to leader, game graphics are taking us in new directions.

It has been a long journey from the days of multicolored sprites on tiled block backgrounds to the immersive 3D environments of modern games. What used to be a job for a single game creator is now a multifaceted production involving staff from every creative discipline. The next generation of console and home computer hardware is going to bring a revolutionary leap in available computing power; a teraflop (trillion floating-point operations per second) or more will be on tap from commodity hardware. This leap in power will bring with it a leap in expectations, both on the part of the consumer and the creative professional.

by Nick Porcino

Instant Messaging or Instant Headache?:
IM has found a home within the enterprise, but it’s far from secure.

It’s a reality. You have IM (instant messaging) clients in your environment. You have already recognized that IM is eating up more and more of your network bandwidth, and with Microsoft building IM capability into its XP operating system and applications, you know this will only get worse. Management is also voicing concerns over the lost user productivity caused by personal conversations over this medium. You have tried blocking these conduits for conversation, but it is a constant battle. Tools are now available to make this blocking job easier, such as those from Akonix, FaceTime Communications, and NetIQ, but IM is maturing, and your users are starting to depend on it as an essential business tool.

by John Stone, Sarah Merrion

A Conversation with Matt Wells:
When it comes to competing in the search engine arena, IS bigger always better?

Search is a small but intensely competitive segment of the industry, dominated for the past few years by Google. But Google’s position as king of the hill is not insurmountable, says Gigablast’s Matt Wells, and he intends to take his product to the top.

Building Nutch: Open Source Search:
A case study in writing an open source search engine

Search engines are as critical to Internet use as any other part of the network infrastructure, but they differ from other components in two important ways. First, their internal workings are secret, unlike, say, the workings of the DNS (domain name system). Second, they hold political and cultural power, as users increasingly rely on them to navigate online content.

by Mike Cafarella, Doug Cutting

Enterprise Search: Tough Stuff:
Why is it that searching an intranet is so much harder than searching the Web?

The last decade has witnessed the growth of information retrieval from a boutique discipline in information and library science to an everyday experience for billions of people around the world. This revolution has been driven in large measure by the Internet, with vendors focused on search and navigation of Web resources and Web content management. Simultaneously, enterprises have invested in networking all of their information together to the point where it is increasingly possible for employees to have a single window into the enterprise. Although these employees seek Web-like experiences in the enterprise, the Internet and enterprise domains differ fundamentally in the nature of the content, user behavior, and economic motivations.

by Rajat Mukherjee, Jianchang Mao

Searching vs. Finding:
Why systems need knowledge to find what you really want

Finding information and organizing it so that it can be found are two key aspects of any company’s knowledge management strategy. Nearly everyone is familiar with the experience of searching with a Web search engine and using a search interface to search a particular Web site once you get there. (You may have even noticed that the latter often doesn’t work as well as the former.) After you have a list of hits, you typically spend a significant amount of time following links, waiting for pages to download, reading through a page to see if it has what you want, deciding that it doesn’t, backing up to try another link, deciding to try another way to phrase your request, et cetera. Eventually you may find what you want, or you may ultimately give up and decide that you can’t find it. Why is this so difficult?

by William A Woods

Web Search Considered Harmful:
The top five reasons why search is still way too hard

Nowadays, when you find yourself utterly disgusted by “American Idol,” or any other of the latest “reality” shows on TV, you may decide, “What the heck, time to seek a slightly less horrible form of punishment: let’s get on the Web.”

by David J Brown