... an error has occurred and the site administrator has been notified.
It is possible that this was a temporary problem and is already corrected so please try to refresh this page.
We apologize for this inconvenience.
If the problem persists please contact us: email@example.com
Originally published in Queue vol. 3, no. 8—
see this item in the ACM Digital Library
Andrew McCallum - Information Extraction
In 2001 the U.S. Department of Labor was tasked with building a Web site that would help people find continuing education opportunities at community colleges, universities, and organizations across the country. The department wanted its Web site to support fielded Boolean searches over locations, dates, times, prerequisites, instructors, topic areas, and course descriptions. Ultimately it was also interested in mining its new database for patterns and educational trends. This was a major data-integration project, aiming to automatically gather detailed, structured information from tens of thousands of individual institutions every three months.
Alon Halevy - Why Your Data Won't Mix
When independent parties develop database schemas for the same domain, they will almost always be quite different from each other. These differences are referred to as semantic heterogeneity, which also appears in the presence of multiple XML documents, Web services, and ontologies—or more broadly, whenever there is more than one way to structure a body of data. The presence of semi-structured data exacerbates semantic heterogeneity, because semi-structured schemas are much more flexible to start with. For multiple data systems to cooperate with each other, they must understand each other’s schemas.
Natalya Noy - Order from Chaos
There is probably little argument that the past decade has brought the “big bang” in the amount of online information available for processing by humans and machines. Two of the trends that it spurred (among many others) are: first, there has been a move to more flexible and fluid (semi-structured) models than the traditional centralized relational databases that stored most of the electronic data before; second, today there is simply too much information available to be processed by humans, and we really need help from machines.
C. M. Sperberg-McQueen - XML
XML, as defined by the World Wide Web Consortium in 1998, is a method of marking up a document or character stream to identify structural or other units within the data. XML makes several contributions to solving the problem of semi-structured data, the term database theorists use to denote data that exhibits any of the following characteristics: