Storage—n Sides to Every Story
Randy Harr, Queue Advisory Board Member

The term storage sparks a number of ideas in the minds of storage experts—more so than most other topics in the computing field. Any discussion about storage among people in the industry brings to mind the East Indian legend made famous by English poet John Godfrey Saxe about the blind men observing an elephant. As they all reach out and feel that part of the elephant closest to themselves, they all have a different “view.” Each is correct, and yet none fully so, because no one view can take in the whole picture.

So it is with computer storage. This diversity of points of view derives not only from the physical objects we can see in front of us (e.g., disk drives, nasty old SCSI cables, etc.) but also from the abstract layers that make up the control, configuration, management, and architecture of storage systems.

With all these different points of view, how could we hope to bring you a single comprehensive issue on storage that would adequately cover this vast space of information? The short answer is that we cannot. The result would have been a tome so large that it would be more akin to a book than to an issue of ACM Queue. Still, we felt that with some patient work we could carve out a piece of the massive storage universe that would deliver some of the compelling topics of the day.

Let me explain how we narrowed our discussion, and why. First, for the sake of clarity of focus, we deal here with magnetic disk storage only, leaving optical, tape, and semiconductor storage for a future issue. Second, true to Queue’s mission, we focus on storage from a software architect’s perspective, thus leaving out discussion of physical technology trends in storage media and interconnect. Finally, we look for the key technologies that are changing the fastest and are most likely to have the biggest impact on you in the near future, which means avoiding in-depth discussions of long-existing protocols such as SCSI or Fibre Channel.

Our issue starts out with an article about the changing face of disk drives as told by Dave Anderson of Seagate Technology. What he says about how disk access is abstracted and optimized might surprise you. To understand why the ever-increasing availability and size of storage and disks never seem to be enough, we’ve included with Anderson’s article a short summary of the now-famous “How Much Data” research done by Peter Lyman and Hal Varian of the University of California at Berkeley’s School of Information Management and Systems.

Next up, Erik Riedel of Seagate Research explains how networked storage is built up from traditional direct-attached storage. He is followed by Jeff Goldner of Microsoft’s iSCSI group, who brings us the pros and cons of iSCSI, a new entrant into the area of storage systems.

Since the days of Multics and its follow-on project, Unix, file systems have been the most visible way we view storage, whether as users, application writers, or information technology managers. Steve Kleiman of Network Appliance offers a view into evolving file systems with a position on the new Direct Access File System (DAFS).

Perhaps the pièce de résistance is an in-depth interview with storage-industry legend Jim Gray of Microsoft Research, conducted by another storage technology veteran, Dave Patterson of U.C. Berkeley. We also include an opinion piece by Josh Coates of Scale8, reminding us why we should (or shouldn’t, as the case may be) drop a boatload of cash on those sophisticated storage systems.

I’d like to make a special mention of gratitude to Clint Jurgens and Jim Gray for their help with this issue. Without their assistance along the way—reading, editing, listening—this issue simply would not have been possible. Read and enjoy.

RANDY HARR, adVenture Planner, RED Associates, was most recently co-founder of Intransa, a still stealth-mode storage-networking startup. As its vice president of engineering and chief architect, he developed an entry-level product with five years of architecture expansions. Prior to Intransa, Harr spent five years with Synopsys’ Advanced Technology Group where he conceived and directed advanced product developments. While at Synopsys, he was tapped for a three-year appointment to the Microsystems Technology Office (MTO) of the Defense Advanced Research Projects Agency (DARPA).

Originally published in Queue vol. 1, no. 4

© ACM, Inc. All Rights Reserved.