Articles
Information Extraction: Distilling Structured Data from Unstructured Text
In 2001 the U.S. Department of Labor was tasked with building a Web site that would help people find continuing education opportunities at community colleges, universities, and organizations across the country. The department wanted its Web site to support fielded Boolean searches over locations, dates, times, prerequisites, instructors, topic areas, and course descriptions. Ultimately it was also interested in mining its new database for patterns and educational trends. This was a major data-integration project, aiming to automatically gather detailed, structured information from tens of thousands of individual institutions every three months.
Social Bookmarking in the Enterprise
One of the greatest challenges facing people who use large information spaces is remembering and retrieving items they have previously found and thought interesting. One approach to this problem is to let individuals save particular search strings so they can re-create a search in the future. Another approach has been to let people create personal collections of material, for example, the electronic citation bundles (called binders) in the ACM Digital Library. Readers can build collections of citations manually or by running a saved search and being alerted to its results.
Threads without the Pain
Much of today’s software deals with multiple concurrent tasks. Web browsers support multiple concurrent HTTP connections, graphical user interfaces deal with multiple windows and input devices, and Web and DNS servers handle concurrent connections or transactions from large numbers of clients.
Fighting Spam with Reputation Systems
Spam is everywhere, clogging the inboxes of e-mail users worldwide. Not only is it an annoyance, but it also erodes the productivity gains afforded by the advent of information technology. Workers plowing through hours of legitimate e-mail every day must also contend with removing a significant amount of illegitimate e-mail. Automated spam filters have dramatically reduced the amount of spam seen by the end users who employ them, but the training these filters require rivals the time needed simply to delete the spam without a filter's help.
Interviews
A Conversation with Ray Ozzie
There are not many names bigger than Ray Ozzie's in computer programming. An industry visionary and pioneer in computer-supported cooperative work, he began his career as an electrical engineer but fairly quickly got into computer science and programming. He is the creator of IBM's Lotus Notes and is now chief technical officer of Microsoft, reporting to chief software architect Bill Gates. Recently, Ozzie's role as chief technical officer expanded as he assumed responsibility for the company's software-based services strategy across its three major divisions.
Curmudgeon
Stop Whining about Outsourcing!
I’m sick of hearing all the whining about how outsourcing is going to migrate all IT jobs to the country with the lowest wages.
The paranoia inspired by this domino theory of job migration causes American and West European programmers to worry about India, Indian programmers to worry about China, Chinese programmers to worry about the Czech Republic, and so on. Domino theorists must think all IT jobs will go to the Republic of Elbonia, the extremely poor, fourth-world, Eastern European country featured in the Dilbert comic strip.
Kode Vicious
Kode Vicious: The Doctor Is In
Dear Kode Vicious, I've been reading your rants for a few months now and was hoping you could read one of mine. It's a pretty simple rant, actually: I'm tired of hearing about buffer overflows and don't understand why anyone in his or her right mind still uses strcpy(). Why does such an unsafe routine continue to exist at all? Why not just remove the thing from the library and force people to migrate their code? I also wonder: how did such an API come to exist in the first place?
