Information Extraction:
Distilling structured data from unstructured text
In 2001 the U.S. Department of Labor was tasked with building a Web site that would help people find continuing education opportunities at community colleges, universities, and organizations across the country. The department wanted its Web site to support fielded Boolean searches over locations, dates, times, prerequisites, instructors, topic areas, and course descriptions. Ultimately it was also interested in mining its new database for patterns and educational trends. This was a major data-integration project, aiming to automatically gather detailed, structured information from tens of thousands of individual institutions every three months.
Kode Vicious:
The Doctor is In
KV is back on duty and ready to treat another koding illness: bad APIs. This is one of the most widespread pathologies affecting, and sometimes infecting, us all. But whether we write APIs or simply use APIs (or both), we would all do well to read on and heed the vicious one’s advice.
Threads without the Pain:
Multithreaded programming need not be so angst-ridden.
Much of today’s software deals with multiple concurrent tasks. Web browsers support multiple concurrent HTTP connections, graphical user interfaces deal with multiple windows and input devices, and Web and DNS servers handle concurrent connections or transactions from large numbers of clients. The number of concurrent tasks that needs to be handled increases while software grows more complex. Structuring concurrent software in a way that meets the increasing scalability requirements while remaining simple, structured, and safe enough to allow mortal programmers to construct ever-more complex systems is a major engineering challenge.
Social Bookmarking in the Enterprise:
Can your organization benefit from social bookmarking tools?
One of the greatest challenges facing people who use large information spaces is to remember and retrieve items that they have previously found and thought to be interesting. One approach to this problem is to allow individuals to save particular search strings to re-create the search in the future. Another approach has been to allow people to create personal collections of material. Collections of citations can be created manually by readers or through execution of (and alerting to) a saved search.
Fighting Spam with Reputation Systems:
User-submitted spam fingerprints
Spam is everywhere, clogging the inboxes of e-mail users worldwide. Not only is it an annoyance, it erodes the productivity gains afforded by the advent of information technology. Workers plowing through hours of legitimate e-mail every day also must contend with removing a significant amount of illegitimate e-mail. Automated spam filters have dramatically reduced the amount of spam seen by the end users who employ them, but the amount of training required rivals the amount of time needed simply to delete the spam without the assistance of a filter.
Stop Whining about Outsourcing!:
I’m sick of hearing all the whining about how outsourcing is going to migrate all IT jobs to the country with the lowest wages.
The paranoia inspired by this domino theory of job migration causes American and West European programmers to worry about India, Indian programmers to worry about China, Chinese programmers to worry about the Czech Republic, and so on. Domino theorists must think all IT jobs will go to the Republic of Elbonia, the extremely poor, fourth-world, Eastern European country featured in the Dilbert comic strip.
A Conversation with Ray Ozzie:
Cooperate, Communicate, Collaborate
There are not many names bigger than Ray Ozzie’s in computer programming. An industry visionary and pioneer in computer-supported cooperative work, he began his career as an electrical engineer but fairly quickly got into computer science and programming. He is the creator of IBM’s Lotus Notes and is now chief technical officer of Microsoft, reporting to chief software architect Bill Gates. Recently, Ozzie’s role as chief technical officer expanded as he assumed responsibility for the company’s software-based services strategy across its three major divisions.