Unstructured, But Not Really

Charlene O’Hanlon, ACM Queue

Mention the term semi-structured data and chances are you’ll be met with strong opinions from one of two camps: those who believe semi-structured data is nothing more than a fancy term for a data structure left unfinished, and others who firmly believe semi-structured data is the best way to describe data that doesn’t easily fit into the traditional database structure.

Semi-structured data, because of its unstructured-yet-structured nature, presents its own set of problems, such as schema discovery and determining the proper method to perform essential database operations such as extraction, integration, translation, and storage of data.

Fortunately, there has been much research and testing to make semi-structured data a better neighbor with its traditional database counterparts. In her article “Managing Semi-Structured Data,” Queue guest expert Daniela Florescu of Oracle takes a look at the difficulties in dealing with semi-structured data, as well as the progress made in discovering new and improved ways of harnessing the information.

Adam Bosworth of Google presents his view on what the Web has taught us in terms of working with data, either structured or unstructured, in his article “Learning From the Web.” XML, for all its simplicity, still has some fundamental weaknesses that prevent it from effectively addressing core database issues. But, he contends, help is on the way from syndication formats such as RSS and Atom.

C.M. Sperberg-McQueen of the World Wide Web Consortium addresses XML’s identity crisis in his article “XML and Semi-Structured Data.” Indeed, XML data can be structured or flexible depending on the application—which, Sperberg-McQueen contends, proves that the problems of semi-structured data have more to do with the relational models than the data.

Natalya Noy of Stanford University delves into the ontologies issue surrounding semi-structured data in her article “Order from Chaos.” Is it possible that ontologies will be the cure-all for knitting together various data sets, as envisioned with the Semantic Web? Or are ontologies merely a Band-Aid solution to a much larger problem? Noy presents a persuasive argument that ontologies are at least a step in the right direction.

Rounding out the group is the University of Washington’s Alon Halevy, whose article “Why Your Data Won’t Mix” addresses the cog-in-the-wheel problem of semi-structured data and semantic heterogeneity, as well as the potential solutions and opportunities in making all data play nicely regardless of schema. “The problem of reconciling schema heterogeneity has been a subject of research for decades, but solutions are few,” Halevy writes.

Compelling topics, one and all. But Chris Suver of Microsoft takes a different view in this month’s Curmudgeon column, countering that semi-structured data is merely data that has been only partially structured for reasons of efficiency or economics and as such should not be treated as the next big thing. It is incorrect to assume, because the data is semi-structured, that it is intrinsically different, Suver says.

As you can see, this issue is chock-full of information that we hope you will find insightful and useful. We welcome your thoughts and comments about this issue of Queue, as well as topics you’d like to see us cover in future issues.

On a personal note, I’d like to introduce myself as the new editor and publisher of Queue. I joined the ranks of ACM in July after a five-year stint as managing editor at CMP Media, which is known for its technologically savvy newsweeklies and monthlies. Manning the helm at Queue will be both an educational and a rewarding experience, and I’m looking forward to the challenge. Please send me a message at [email protected]. I’ll do my best to answer every message I receive.

Happy reading!

CHARLENE O’HANLON is Queue’s new editor and publisher. She most recently was managing editor at CMP Media and knows her way around the business aspects of technology. She collects good (clean) jokes and doesn’t go anywhere without her Sidekick.


Originally published in Queue vol. 3, no. 8


© ACM, Inc. All Rights Reserved.