Semi-structured Data


Originally published in Queue vol. 3, no. 8


Andrew McCallum - Information Extraction
In 2001 the U.S. Department of Labor was tasked with building a Web site that would help people find continuing education opportunities at community colleges, universities, and organizations across the country. The department wanted its Web site to support fielded Boolean searches over locations, dates, times, prerequisites, instructors, topic areas, and course descriptions. Ultimately it was also interested in mining its new database for patterns and educational trends. This was a major data-integration project, aiming to automatically gather detailed, structured information from tens of thousands of individual institutions every three months.

Alon Halevy - Why Your Data Won't Mix
When independent parties develop database schemas for the same domain, they will almost always be quite different from each other. These differences are referred to as semantic heterogeneity, which also appears in the presence of multiple XML documents, Web services, and ontologies—or more broadly, whenever there is more than one way to structure a body of data. The presence of semi-structured data exacerbates semantic heterogeneity, because semi-structured schemas are much more flexible to start with. For multiple data systems to cooperate with each other, they must understand each other’s schemas.

Natalya Noy - Order from Chaos
There is probably little argument that the past decade has brought the “big bang” in the amount of online information available for processing by humans and machines. It spurred two trends, among many others. First, there has been a move toward more flexible and fluid (semi-structured) models than the traditional centralized relational databases that previously stored most electronic data. Second, there is now simply too much information for humans to process, and we really need help from machines.

C. M. Sperberg-McQueen - XML
XML, as defined by the World Wide Web Consortium in 1998, is a method of marking up a document or character stream to identify structural or other units within the data. XML makes several contributions to solving the problem of semi-structured data, the term database theorists use to denote data that exhibits any of the following characteristics:


mamo | Sat, 14 Nov 2009 13:02:31 UTC

I fully appreciate the discussion you have. But one thing, that is, identification of the advantages and disadvantages of the interview. Thank you.

© 2015 ACM, Inc. All Rights Reserved.