Databases

Vol. 3 No. 3 – April 2005

Articles

A Call to Arms

Long anticipated, the arrival of radically restructured database architectures is now finally at hand.

JIM GRAY, MICROSOFT
MARK COMPTON, CONSULTANT

We live in a time of extreme change, much of it precipitated by an avalanche of information that otherwise threatens to swallow us whole. Under the mounting onslaught, our traditional relational database constructs—always cumbersome at best—are now clearly at risk of collapsing altogether.

In fact, rarely do you find a DBMS anymore that doesn’t make provisions for online analytic processing. Decision trees, Bayes nets, clustering, and time-series analysis have also become part of the standard package, with allowances for additional algorithms yet to come. Also, text, temporal, and spatial data access methods have been added—along with associated probabilistic logic, since a growing number of applications call for approximated results. Column stores, which store data column-wise rather than record-wise, have enjoyed a rebirth, mostly to accommodate sparse tables, as well as to optimize bandwidth.

by Jim Gray, Mark Compton

Beyond Relational Databases

There is more to data access than SQL.

MARGO SELTZER, SLEEPYCAT

The number and variety of computing devices in the environment are increasing rapidly. Real computers are no longer tethered to desktops or locked in server rooms. PDAs, highly mobile tablet and laptop devices, palmtop computers, and mobile telephony handsets now offer powerful platforms for the delivery of new applications and services. These devices are, however, only the tip of the iceberg. Hidden from sight are the many computing and network elements required to support the infrastructure that makes ubiquitous computing possible.

With so much computing power traveling around in briefcases and pockets, developers are building applications that would have been impossible just a few years ago. Among the interesting services available today are text and multimedia messaging, location-based search and information services (for example, on-demand reviews of nearby restaurants), and ad hoc multiplayer games. Over the next several years, new classes of mobile and personalized services, impossible to predict today, will certainly be developed.

by Margo Seltzer

Databases of Discovery

Open-ended database ecosystems promote new discoveries in biotech. Can they help your organization, too?

JAMES OSTELL, NCBI

The National Center for Biotechnology Information (NCBI) [1], part of the National Institutes of Health (NIH), is responsible for massive amounts of data. A partial list includes the largest public bibliographic database in biomedicine (PubMed) [2], the U.S. national DNA sequence database (GenBank) [3], a free online full-text research article database (PubMed Central) [4], the assembly, annotation, and distribution of a reference set of genes, genomes, and chromosomes (RefSeq) [5], online text search and retrieval systems (Entrez) [6], and specialized molecular biology data search engines (BLAST [7], CDD search [8], and others). At this writing, NCBI receives about 50 million Web hits per day, at peak rates of about 1,900 hits per second, and about 400,000 BLAST searches per day from about 2.5 million users. The Web site transfers about 0.6 terabytes per day, and users who want local copies of bulk data download about 1.2 terabytes per day over FTP.

In addition to a wide range of different data types and the heavy user load, NCBI must cope with the rapid increase in size of the databases, particularly the sequence databases. GenBank contains 74 billion basepairs of gene sequence and has a doubling time of about 17 months. The Trace Repository (which holds the chromatograms from the sequencing machines for later reanalysis of genome sequence assemblies) contains 0.5 billion chromatograms and is doubling in about 12 months. Finally, because NCBI supplies information resources in molecular biology and molecular genetics, fields in a state of explosive growth and innovation, it must face new classes of data, new relationships among databases and data elements, and new applications many times every year.

by James Ostell

Curmudgeon

File under "Unknowable!"

It's been a hard day's night--proving nonexistence!

Stan Kelly-Bootle

The Yellow Pages used to advertise along the following lines: “If you can’t find it here, it does not exist.” Shannan Hobbes, my favorite epistemologist, and I would ponder this claim well into the wee hours, testing its validity by searching for vendors of “Square Circles,” “Pet Unicorns,” “Cold Fusion,” “The Largest Prime Number,” “Reigning Bald Kings of France,” and similar quiddities oft debated in the PhilTrans (Philosophical Transactions). Our mounting failures—or, to quickly remove the scandalous ambiguity—our growing number of “not founds” amounted to some sort of inductive verification (of which, more anon). The Yellow Pages, considered as a merely finite hierarchy of marketable strings, has nothing much to tell us of contingent, objective existence. Indeed, we searched in vain for many incontrovertibly real, corporeal “thingies.” No sign of the Renaissance crypto-text, Hypnerotomachia Poliphili, which will be on everyone’s lips as soon as the Da Vinci Code mercifully fades from the best-seller lists [1].

Madison Avenue (or whatever now serves as the hot-throbbing heart of marketeering) seeks to confuse us on the difference between IF and IFF. Not that the gullible, consuming plumpen need much persuasion! As Marshall McLuhan warned: “The Medium is the Massage.” (Doryphores should note that this is not a typo for Message, but rather McLuhan’s self-mockery, used as a title for his 1967 book coauthored with Quentin Fiore [Penguin].)

by Stan Kelly-Bootle

Interviews

A Conversation with Pat Selinger

Leading the way to manage the world's information

Take Pat Selinger of IBM and James Hamilton of Microsoft and put them in a conversation together, and you may hear everything you wanted to know about database technology and weren’t afraid to ask.

Selinger, IBM Fellow and vice president of area strategy, information, and interaction for IBM Research, drives the strategy for IBM’s research work spanning the range from classic database systems through text, speech, and multimodal interactions. Since graduating from Harvard with a Ph.D. in applied mathematics, she has spent almost 30 years at IBM, hopscotching between research and development of IBM’s database products.

Kode Vicious

Kode Vicious Battles On

Dear KV, I'm maintaining some C code at work that is driving me right out of my mind. It seems I cannot go more than three lines in any file without coming across a chunk of code that is conditionally compiled.

A koder with attitude, KV answers your questions. Miss Manners he ain’t.

Kode Vicious is at it again, dragging you out of your koding quagmires and kombating the enemies of kommon sense. It sometimes gets ugly down there in the trenches, spelunking the dark caverns of unreadable code and spurious logic, but, hey, somebody’s gotta do it.

by George Neville-Neil