Comments

(newest first)

Lukas Eder | Sat, 14 Dec 2013 22:51:40 UTC

This seems more like a plea for LINQ than anything else ;-) One of LINQ's main visions is to unify querying regardless of the data source - and it does so quite well. At the time the author of this article conceived LINQ, he intended to unify querying RDBMS, XML, and objects. "NoSQL" (whatever that is) is a type of data store that was not part of LINQ's original master plan. Thus, LINQ has to be retrofitted / enhanced to accommodate new requirements. It would be all too nice if things were as easy as a simple duality, specifically given the fact that the author of this article has now also created a company called Applied Duality Inc.

But history will teach us where these things go. I currently don't see a second E.F. Codd to solve the complexity introduced with the new abundance of NoSQL data stores - yet.

Will Sisson | Sun, 30 Dec 2012 11:59:47 UTC

I am not convinced that the authors understand what the closed world assumption is.

rdm | Mon, 05 Nov 2012 16:49:05 UTC

P.S. I was thinking specifically of kx.com's implementation of SQL when I wrote that comment.

rdm | Thu, 04 Oct 2012 16:50:49 UTC

I think I need to disagree with the statement "all existing SQL-based relational database products are largely indistinguishable".

Specifically, I think that an SQL database implementation optimized for time-series analysis, which provides (significant) real-time performance guarantees (such as is used on Wall Street), is fundamentally different from the products provided by Oracle, IBM, Microsoft, ... It's true that they have things in common (they're all implementing some kind of ACID model, and there's at least a subset of SQL in common), but the customer requirements are different and the implementation architectures are different.

Kingsley Idehen | Mon, 05 Dec 2011 20:54:18 UTC

Meant to say:
As clearly indicated by Karl Geiger, 3-tuple based relations are the way forward. By leveraging HTTP URIs in the aforementioned Object, Key, Value pattern, the WWW itself becomes a distributed graph model DBMS equipped with SPARQL and SPASQL (SPARQL inside SQL) as its declarative query languages.

Kingsley Idehen | Mon, 05 Dec 2011 20:53:20 UTC

As clearly indicated by Karl Geiger, 3-tuple based relations are the way forward. By leveraging HTTP URIs in the aforementioned Object, Key, Value pattern, the WWW itself becomes a distributed graph model DBMS equipped with SPARQL and SPASQL (SPARQL inside SQL).

Dan McCreary | Mon, 05 Dec 2011 16:40:31 UTC

I don't see why XQuery could not be a common language instead of LINQ. XQuery can also be used to query graphs. I agree with the others that LINQ would never be adopted by the NoSQL community. But there are already XQuery front ends for MarkLogic, eXist and now MongoDB. We do need a "platform" for NoSQL to be successful. But NoSQL vendors need to understand that mixed content is a big part of what we need, and JSON does not support mixed content.

Karl Geiger | Sat, 16 Apr 2011 13:23:31 UTC

Marcelo Cantos and Elf Sternberg are right. It seems that after developing a pointer-based relational schema (!), the authors derive 4th Normal Form (Fig. 8). Key-value pairs, say {(isbn, title), (isbn, author), (isbn, year), (isbn, pages), (isbn, keywords), (isbn, ratings)}, represent the information more compactly, afford more flexible retrieval, and can be deployed in a fast, distributed system using existing RDBMS technologies. Implement compound keys that depend on the data, e.g., "CREATE TABLE titles (isbn VARCHAR(20), title VARCHAR(100), PRIMARY KEY(isbn, title))", to preserve atomicity. 4NF supports the noSQL key-value model in SQL, so SQL is adequate for noSQL queries.

Daniel O'Connor hints at the next step: replace sets of key-value relations with a single (object, key, value) relation: [('title', 12345, 'The Right Stuff'), ('author', 12345, 'Tom Wolfe'), ('pages', 12345, '390'), ('keywords', 12345, 'book'), ('keywords', 12345, 'hardcover')]. Distributed RDBMSes can also readily implement a triple store with good performance and known tuning tricks like partitioning and localization.
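The single-relation idea above can be sketched with an ordinary SQL engine. This is only an illustration, assuming an in-memory SQLite database driven from Python; the table name `triples`, the column names `attr`, `obj`, and `val`, and the helper names are hypothetical, and the rows are the example tuples from the comment.

```python
import sqlite3


def build_triple_store():
    """Create an in-memory triple table and load the example tuples."""
    conn = sqlite3.connect(":memory:")
    # One relation holds every fact; the compound primary key keeps each
    # (attribute, object, value) triple unique. Column names are assumptions.
    conn.execute(
        "CREATE TABLE triples ("
        " attr TEXT NOT NULL,"
        " obj INTEGER NOT NULL,"
        " val TEXT NOT NULL,"
        " PRIMARY KEY (attr, obj, val))"
    )
    rows = [
        ("title", 12345, "The Right Stuff"),
        ("author", 12345, "Tom Wolfe"),
        ("pages", 12345, "390"),
        ("keywords", 12345, "book"),
        ("keywords", 12345, "hardcover"),
    ]
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)
    return conn


def lookup(conn, obj, attr):
    """Return all values of one attribute for one object, sorted."""
    cur = conn.execute(
        "SELECT val FROM triples WHERE obj = ? AND attr = ? ORDER BY val",
        (obj, attr),
    )
    return [row[0] for row in cur]


def describe(conn, obj):
    """Collect every attribute of an object into a dict of value lists."""
    result = {}
    cur = conn.execute(
        "SELECT attr, val FROM triples WHERE obj = ? ORDER BY attr, val",
        (obj,),
    )
    for attr, val in cur:
        result.setdefault(attr, []).append(val)
    return result
```

Calling `describe(build_triple_store(), 12345)` reassembles the "document" for that object from the flat relation. Multi-valued attributes such as `keywords` need no schema change; they are just additional rows.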

David Piepgrass | Sat, 09 Apr 2011 19:24:41 UTC

LINQ is a very nice query language, and (in case you missed the point of the article) it could certainly serve as a common query language between SQL and NoSQL/CoSQL. However, in practice it won't serve this purpose as long as it remains Microsoft-only. LINQ cannot take off as a standard query language for CoSQL until we can write LINQ for Java, LINQ for Perl and so on, and with solid Linux support.

It should also be noted that while LINQ can represent both SQL and CoSQL/NoSQL queries, a query designed for SQL tends to look very different from a query designed for an equivalent dataset stored in a CoSQL database. Still, a common language does at least ease the burden of learning many different technologies.

alexis richardson | Fri, 08 Apr 2011 15:47:07 UTC

I don't understand how the dual of a non-compositional algebra can be compositional.  That is, if "dual" has the categorical meaning indicated above.

Ismael C | Fri, 08 Apr 2011 15:44:17 UTC

I cannot help but totally agree with Greg. The whole point of NoSQL is performance, not data representation. We already had a decent AND standard (but far from perfect) way to represent data with SQL. That was not the reason why people started using NoSQL at all.

Marcelo Cantos | Fri, 08 Apr 2011 10:43:14 UTC

Wrt Rick Bullotta's post: The relational model was shown by Codd to be capable of efficiently expressing any kind of knowledge that can be expressed with graphs. In fact, it was his analysis that paved the way for the near-extinction of the network and hierarchical models that prevailed at the time.

The relational model has nothing to do with tables; relations are sets of points in a multidimensional space and can easily express all kinds of facts that graph models struggle with (e.g., composite keys and join-relations, especially join-relations with additional non-key attributes). Moreover, graphs only support navigational queries well; in particular, only the kinds of navigation that map onto the predefined structure imposed by the graph. Do members own groups, or do groups own members? In the graph model, it depends on whether you want to find the groups that a member belongs to, or the members that belong to a group, and heaven help you if you need to do both. In the relational model, the question is meaningless, and both kinds of query are easily expressed and optimised, as well as even nastier stuff like, "Who shares more than two groups?" What kind of graph model would make that kind of query efficient, and how would you express it in a graph-oriented algebra?

Perhaps you are confusing SQL with the relational model.
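To make the members-and-groups point concrete, here is a rough sketch, assuming SQLite accessed from Python. The `membership` table, its column names, the helper names, and the sample rows are all invented for illustration. The same relation answers both navigation directions, and "who shares more than two groups?" becomes a self-join with a grouped count.

```python
import sqlite3


def build_membership():
    """Create an in-memory membership relation with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    # Neither side "owns" the other: the relation is just a set of pairs.
    conn.execute(
        "CREATE TABLE membership ("
        " member TEXT NOT NULL,"
        " grp TEXT NOT NULL,"
        " PRIMARY KEY (member, grp))"
    )
    rows = [
        ("ann", "chess"), ("ann", "hiking"), ("ann", "opera"),
        ("bob", "chess"), ("bob", "hiking"), ("bob", "opera"),
        ("cat", "chess"), ("cat", "hiking"),
    ]
    conn.executemany("INSERT INTO membership VALUES (?, ?)", rows)
    return conn


def groups_of(conn, member):
    """Groups that a given member belongs to."""
    cur = conn.execute(
        "SELECT grp FROM membership WHERE member = ? ORDER BY grp", (member,)
    )
    return [row[0] for row in cur]


def members_of(conn, grp):
    """Members that belong to a given group."""
    cur = conn.execute(
        "SELECT member FROM membership WHERE grp = ? ORDER BY member", (grp,)
    )
    return [row[0] for row in cur]


def pairs_sharing_more_than(conn, n):
    """Pairs of distinct members who have more than n groups in common."""
    cur = conn.execute(
        "SELECT a.member, b.member, COUNT(*)"
        " FROM membership a JOIN membership b"
        "   ON a.grp = b.grp AND a.member < b.member"
        " GROUP BY a.member, b.member"
        " HAVING COUNT(*) > ?"
        " ORDER BY a.member, b.member",
        (n,),
    )
    return cur.fetchall()
```

With this sample data, `pairs_sharing_more_than(conn, 2)` returns only the pair `("ann", "bob", 3)`. The `a.member < b.member` condition keeps each pair from being reported twice or matched against itself.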

Elf M. Sternberg | Fri, 08 Apr 2011 03:17:26 UTC

Greg: I think you're missing the counter-point of the article: there are a number of RDBMS systems, such as MySQL (which has multiple storage engines), Oracle, MS SQL Server, etc. The underlying technologies for many of these are known to be wildly different, but they all still use SQL as the language of choice for changing the database. These products compete on their SQL'99 completeness, their performance and resilience.

This paper points to an underlying commonality among the NoSQL "document storage" databases, and therefore the possibility of a common language, the "coSQL" of the paper. If we agreed on a common tongue for all of these (in much the same way as SQL, or OpenCL for GPU programming, or C for systems development), then users could choose between them without the vendor lock-in and associated inertia of switching.

Daniel O'Connor | Fri, 08 Apr 2011 01:05:58 UTC

SPARQL already exists as a standard, and solves a large number of the traditional SQL problems regarding object/attribute/value storage. How does this differ?

Gary Myers | Mon, 28 Mar 2011 04:19:19 UTC

One point that is missed is that while SQL dominates the relational database market today, there were competing languages, such as Ingres' QUEL, in its early days. It isn't sufficient for noSQL/coSQL to come up with a mathematical model to address incompatibilities between implementations.

Barry Kelly | Wed, 23 Mar 2011 20:34:24 UTC

I think you have that monopolistic competition concept exactly the wrong way around. Perfect competition is what kills economic profit. Monopolistic competition may devolve into perfect competition (though almost no market is in perfect competition, unsubstitutable things like trademarks and branding etc. being what they are); or it may devolve into natural monopolies, in which case the monopoly may extract large monopolistic rents as profit.

André Torkveen | Tue, 22 Mar 2011 13:17:58 UTC

Jon Udell tweeted about this article, then Kingsley Idehen (@kidehen) responded with a few follow-up tweets. I suggest everyone reading this far also read his comments in twitterspace, as well as the recommended "An Overview of Data Management Paradigms: Relational, Document, and Graph" presentation by Marko A. Rodriguez (http://www.slideshare.net/slidarko/an-overview-of-data-management-paradigms-relational-document-and-graph-3880059). Thanks for the combined enlightenment, everyone!

Peter Neubauer | Tue, 22 Mar 2011 12:58:42 UTC

Hi there,

At least for Graph Databases and the Property Graph Model, there are the beginnings of a proper algebra along the lines of the relational algebra:

"A Path Algebra for Multi-Relational Graphs"

The link to the article is: http://bit.ly/9yyQqU

peter

John Bailo | Tue, 22 Mar 2011 06:24:05 UTC

Good article.  I proposed something like this in VB, using OOP encapsulation as the paradigm for heterogeneous data structures.

Encapsulation in the Databases World (2002) by John Bailo

http://devcity.net/Articles/19/1/20020301.aspx

orcmid | Mon, 21 Mar 2011 23:03:25 UTC

I think it is unfortunate that CliffsNotes is considered an appropriate resource for an understanding of monopolistic competition. Wikipedia provides more nuance. As far as I can tell, no one is striving for perfect competition among database products of any flavor, and generally for good reason, no matter that they may be doomed to failure at achieving sustainable monopoly profits.

Finally, I can't imagine Ted Codd lying quietly in his grave with SQL lying at his feet.

Ignoring those preliminaries, it strikes me that a demonstration of some sort of model equivalence is interesting although that is not enough, as other commenters have observed, to establish performance equivalence for the same use cases.

adamo | Mon, 21 Mar 2011 21:02:54 UTC

The article answers my favorite question to NoSQL fans: "Where is the math?". So now my question is: Does it cover every NoSQL (even the ones not invented yet) or just a certain subset of them? And although the formal categorization of SQL and NoSQL as categories is outside the scope of this article, is there a TR that describes the effort?

Rick Bullotta | Mon, 21 Mar 2011 20:32:14 UTC

The article really doesn't address graph database models (e.g. Neo4J), which allow certain constructs that are virtually impossible to represent in relational/tabular DBMSs, or at best, not efficiently. Interestingly, many of these constructs are quite common in the "real world" - whether representing the physical world and its connections/complexities, software code, business processes, and so on. Additionally, the "query" models for these types of constructs are dramatically different, with concepts such as traversals.

Greg Linden | Mon, 21 Mar 2011 16:34:05 UTC

I think this article misses the point of NoSQL. To my knowledge, no one argues that the advantage of NoSQL is that it can represent operations that cannot be represented in SQL or vice versa. Rather, the point of NoSQL is that it does certain common and desirable operations reliably and efficiently at very large scale.
