You Don’t Know Jack about Disks:
Whatever happened to cylinders and tracks?
Traditionally, the programmer’s working model of disk storage has consisted of a set of uniform cylinders, each with a set of uniform tracks, which in turn hold a fixed number of 512-byte sectors, each with a unique address. A cylinder is made up of the tracks at a given radius (the concentric circles) on each disk platter in a multiplatter drive. Each track is divided up like pie slices into sectors.
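To make that geometry concrete, here is a minimal sketch in Python using a hypothetical cylinder/head/sector (CHS) layout; modern drives have long since hidden zoned recording and remapping behind flat logical block addresses (LBAs), so treat this as the traditional model only.

```python
# The classic CHS model: capacity from geometry, and a mapping between a
# logical block address and (cylinder, head, sector). Geometry is hypothetical.

SECTOR_SIZE = 512          # bytes per sector in the traditional model

def capacity_bytes(cylinders: int, heads: int, sectors_per_track: int) -> int:
    """Total capacity implied by a fixed CHS geometry."""
    return cylinders * heads * sectors_per_track * SECTOR_SIZE

def lba_to_chs(lba: int, heads: int, sectors_per_track: int) -> tuple[int, int, int]:
    """Map a logical block address back to (cylinder, head, sector).
    Sectors are 1-based by convention; cylinders and heads are 0-based."""
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = (lba % sectors_per_track) + 1
    return cylinder, head, sector

if __name__ == "__main__":
    # Hypothetical geometry: 16,383 cylinders, 16 heads, 63 sectors per track
    c, h, s = 16383, 16, 63
    print(f"capacity ~ {capacity_bytes(c, h, s) / 1e9:.1f} GB")
    print("LBA 1,000,000 ->", lba_to_chs(1_000_000, h, s))
```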
Storage Systems: Not Just a Bunch of Disks Anymore:
The sheer size and scope of data available today puts tremendous pressure on storage systems to perform in ways never imagined.
The concept of a storage device has changed dramatically, from the first magnetic disk drive, introduced with the IBM RAMAC in 1956, to today’s server rooms with detached and fully networked storage servers. Storage has expanded in both large and small directions. All of these devices use the same underlying technology, but they quickly diverge from there. Here we will focus on the larger storage systems that are typically detached from the server hosts. We will introduce the layers of protocols and translations that occur as bits make their way from the magnetic domains on the disk drives, across the interfaces, to your desktop.
Storage: n Sides to Every Story:
If you ask five different technologists about storage, you had better expect five different answers.
The term storage sparks a number of ideas in the minds of storage experts—more so than most other topics in the computing field. Any discussion about storage among people in the industry brings to mind the East Indian legend made famous by American poet John Godfrey Saxe about the blind men observing an elephant. As they all reach out and feel that part of the elephant closest to themselves, they all have a different “view.” Each is correct, and yet none fully so, because no one view can take in the whole picture.
Big Storage: Make or Buy?:
We hear it all the time. The cost of disk space is plummeting.
Your local CompUSA is happy to sell you a 200-gigabyte ATA drive for $300, which comes to about $1,500 per terabyte. Go online and save even more: $1,281 for 1 terabyte of drive space (using, say, seven Maxtor EIDE 153-GB ATA/133 5400-RPM drives). So why would anyone pay $360,000 to XYZ Storage System Corp. for a 16-terabyte system? I mean, what’s so hard about storage? Good question.
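As a rough illustration of the arithmetic, the sketch below computes cost per terabyte for the three options mentioned; the prices and capacities are taken from the text above, not current quotes, and the per-terabyte figures differ slightly from the rounded numbers in the prose.

```python
# Back-of-the-envelope cost-per-terabyte arithmetic for the options above.

def dollars_per_tb(price_dollars: float, capacity_gb: float) -> float:
    return price_dollars / (capacity_gb / 1000.0)

retail_drive   = dollars_per_tb(300, 200)            # single 200-GB ATA drive
online_bundle  = dollars_per_tb(1281, 7 * 153)       # seven 153-GB drives
storage_system = dollars_per_tb(360_000, 16 * 1000)  # 16-TB packaged system

print(f"retail drive:   ${retail_drive:,.0f}/TB")
print(f"online bundle:  ${online_bundle:,.0f}/TB")
print(f"storage system: ${storage_system:,.0f}/TB")
```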
A Conversation with Jim Gray:
Sit down, turn off your cellphone, and prepare to be fascinated.
Clear your schedule, because once you’ve started reading this interview, you won’t be able to put it down until you’ve finished it.
A Conversation with Jeff Bonwick and Bill Moore:
The future of file systems
This month ACM Queue speaks with two Sun engineers who are bringing file systems into the 21st century. Jeff Bonwick, CTO for storage at Sun, led development of the ZFS file system, which is now part of Solaris. Bonwick and his co-lead, Sun Distinguished Engineer Bill Moore, developed ZFS to address many of the problems they saw with current file systems, such as data integrity, scalability, and administration. In our discussion this month, Bonwick and Moore elaborate on these points and what makes ZFS such a big leap forward.
Standardizing Storage Clusters:
Will pNFS become the new standard for parallel data access?
Data-intensive applications such as data mining, movie animation, oil and gas exploration, and weather modeling generate and process huge amounts of data. File-data access throughput is critical for good performance. To scale well, these HPC (high-performance computing) applications distribute their computation among numerous client machines. HPC clusters can range from hundreds to thousands of clients with aggregate I/O demands ranging into the tens of gigabytes per second.
Hard Disk Drives: The Good, the Bad and the Ugly!:
HDDs are like the bread in a peanut butter and jelly sandwich.
HDDs are like the bread in a peanut butter and jelly sandwich—sort of an unexciting piece of hardware necessary to hold the “software.” They are simply a means to an end. HDD reliability, however, has always been a significant weak link, perhaps the weak link, in data storage. In the late 1980s people recognized that HDD reliability was inadequate for large data storage systems, so redundancy was added at the system level with some brilliant software algorithms, and RAID (redundant array of inexpensive disks) became a reality. RAID moved the reliability requirements from the HDD itself to the system of data disks.
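The core of those algorithms can be sketched in a few lines: in single-parity RAID, the parity block is the XOR of the data blocks, so any one lost block can be rebuilt from the survivors. The example below is only the shape of the idea; real arrays add striping, parity rotation, and far more care.

```python
# Minimal single-parity (RAID-style) sketch: parity = XOR of the data blocks,
# so losing any one block is recoverable from the rest plus parity.

from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"disk0 data.....", b"disk1 data.....", b"disk2 data....."]
parity = xor_blocks(data)

# Simulate losing disk 1 and reconstructing it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("disk1 rebuilt:", rebuilt)
```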
Storage Virtualization Gets Smart:
The days of overprovisioned, underutilized storage resources might soon become a thing of the past.
Over the past 20 years we have seen the transformation of storage from a dumb resource with fixed reliability, performance, and capacity to a much smarter resource that can actually play a role in how data is managed. In spite of the increasing capabilities of storage systems, however, traditional storage management models have made it hard to leverage these data management capabilities effectively. The net result has been overprovisioning and underutilization. In short, although the promise was that smart shared storage would simplify data management, the reality has been different.
The Emergence of iSCSI:
Modern SCSI, as defined by the SCSI-3 Architecture Model, or SAM, really considers the cable and physical interconnections to storage as only one level in a larger hierarchy.
When most IT pros think of SCSI, images of fat cables with many fragile pins come to mind. Certainly, that’s one manifestation. But modern SCSI, as defined by the SCSI-3 Architecture Model, or SAM, really considers the cable and physical interconnections to storage as only one level in a larger hierarchy. By separating the instructions or commands sent to and from devices from the physical layers and their protocols, you arrive at a more generic approach to storage communication.
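To illustrate that separation, here is a sketch that builds a 10-byte READ(10) command descriptor block (CDB) by hand: the same small structure is carried whether the transport is a parallel cable, Fibre Channel, SAS, or TCP/IP (iSCSI). This is illustrative only, not a working initiator.

```python
# Build a SCSI READ(10) CDB: opcode, flags, 32-bit LBA, group number,
# 16-bit transfer length (in blocks), control byte -- all big-endian.

import struct

def build_read10_cdb(lba: int, num_blocks: int) -> bytes:
    opcode = 0x28        # READ(10)
    flags = 0x00         # no DPO/FUA, etc.
    group = 0x00
    control = 0x00
    # >BBIBHB: opcode, flags, LBA, group, transfer length, control (10 bytes)
    return struct.pack(">BBIBHB", opcode, flags, lba, group, num_blocks, control)

cdb = build_read10_cdb(lba=2048, num_blocks=8)
print(len(cdb), cdb.hex())   # 10 bytes, independent of the transport underneath
```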
DAFS: A New High-Performance Networked File System:
This emerging file-access protocol dramatically enhances the flow of data over a network, making life easier in the data center.
The Direct Access File System (DAFS) is a remote file-access protocol designed to take advantage of new high-throughput, low-latency network technology.
BASE: An Acid Alternative:
In partitioned databases, trading some consistency for availability can lead to dramatic improvements in scalability.
Web applications have grown in popularity over the past decade. Whether you are building an application for end users or application developers (i.e., services), your hope is most likely that your application will find broad adoption, and with broad adoption comes transactional growth. If your application relies upon persistence, then data storage will probably become your bottleneck.
A Pioneer’s Flash of Insight:
Jim Gray’s vision of flash-based storage anchors this issue’s theme.
In the May/June issue of Queue, Eric Allman wrote a tribute to Jim Gray, mentioning that Queue would be running some of Jim’s best works in the months to come. I’m embarrassed to confess that when this idea was first discussed, I assumed these papers would consist largely of Jim’s seminal work on databases, showing only that I (unlike everyone else on the Queue editorial board) never knew Jim. In an attempt to learn more about both his work and Jim himself, I attended the tribute held for him at UC Berkeley in May.
Flash Disk Opportunity for Server Applications:
Future flash-based disks could provide breakthroughs in IOPS, power, reliability, and volumetric capacity when compared with conventional disks.
NAND flash densities have been doubling each year since 1996. Samsung announced that its 32-gigabit NAND flash chips would be available in 2007. This is consistent with Chang-gyu Hwang’s flash memory growth model [1], which predicts that NAND flash densities will double each year until 2010. Hwang recently extended that 2003 prediction to 2012, suggesting 64 times the current density, or about 250 GB per chip. This is hard to credit, but Hwang and Samsung have delivered a 16-fold increase since his 2003 article, when 2-GB chips were just emerging. So we should be prepared for the day when a flash drive is a terabyte(!).
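The doubling claim is easy to turn into a toy projection. The sketch below starts from the 32-gigabit part mentioned above and assumes a hypothetical 16-chip drive; real scaling has bumps, so treat this as illustrative arithmetic rather than a reproduction of Hwang’s figures.

```python
# Toy projection of annual density doubling: how soon does a multi-chip
# flash drive cross one terabyte? Chip count and start year are assumptions.

def years_until(target_bytes: float, start_bytes: float, start_year: int) -> int:
    year, capacity = start_year, start_bytes
    while capacity < target_bytes:
        capacity *= 2          # "densities double each year"
        year += 1
    return year

GB = 1e9
chip = 32e9 / 8                # the 32-gigabit (4-GB) part mentioned above
drive = 16 * chip              # hypothetical drive built from 16 such chips
print("a 16-chip flash drive reaches 1 TB around:",
      years_until(1000 * GB, drive, 2007))
```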
Enterprise SSDs:
Solid-state drives are finally ready for the enterprise. But beware, not all SSDs are created alike.
For designers of enterprise systems, ensuring that hardware performance keeps pace with application demands is a mind-boggling exercise. The most troubling performance challenge is storage I/O. Spinning media, while exceptional in scaling areal density, will unfortunately never keep pace with I/O requirements. The most cost-effective way to break through these storage I/O limitations is by incorporating high-performance SSDs (solid-state drives) into the systems.
CTO Roundtable: Storage Part I:
Leaders in the storage world offer valuable advice for making more effective architecture and technology decisions.
Featuring seven world-class storage experts, this discussion is the first in a new series of CTO Roundtable forums focusing on the near-term challenges and opportunities facing the commercial computing community. Overseen by the ACM Professions Board, this series has as its goal to provide working IT managers with expert advice so they can make better decisions when investing in new architectures and technologies. This is the first installment of the discussion, with a second installment slated for publication in a later issue.
CTO Roundtable: Storage Part II:
Leaders in the storage industry ponder upcoming technologies and trends.
The following conversation is the second installment of a CTO roundtable featuring seven world-class experts on storage technologies. This series of CTO forums focuses on the near-term challenges and opportunities facing the commercial computing community. Overseen by the ACM Professions Board, the goal of the series is to provide IT managers with access to expert advice to help inform their decisions when investing in new architectures and technologies.
GFS: Evolution on Fast-forward:
A discussion between Kirk McKusick and Sean Quinlan about the origin and evolution of the Google File System
During the early stages of development at Google, the initial thinking did not include plans for building a new file system. While work was still being done on one of the earliest versions of the company’s crawl and indexing system, however, it became quite clear to the core engineers that they really had no other choice, and GFS (Google File System) was born.
Triple-Parity RAID and Beyond:
As hard-drive capacities continue to outpace their throughput, the time has come for a new level of RAID.
How much longer will current RAID techniques persevere? The RAID levels were codified in the late 1980s; double-parity RAID, known as RAID-6, is the current standard for high-availability, space-efficient storage. The incredible growth of hard-drive capacities, however, could impose serious limitations on the reliability even of RAID-6 systems. Recent trends in hard drives show that triple-parity RAID must soon become pervasive. In 2005, Scientific American reported on Kryder’s law, which predicts that hard-drive density will double annually. While the rate of doubling has not quite maintained that pace, it has been close.
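The pressure is easy to quantify: rebuilding a failed drive takes roughly capacity divided by sustained throughput, and the longer the rebuild window, the greater the chance a second (or third) failure strikes before it completes. The figures below are illustrative assumptions, not vendor specifications, and reflect a best case in which the drive is fully dedicated to the rebuild.

```python
# Rough rebuild-time arithmetic: capacity keeps growing faster than the
# sustained throughput used to reconstruct it, so rebuild windows lengthen.

def rebuild_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    return capacity_tb * 1e6 / throughput_mb_s / 3600   # TB -> MB, sec -> hours

# (capacity in TB, sustained throughput in MB/s) for a few assumed generations
generations = [(0.5, 70), (1.0, 90), (2.0, 110), (4.0, 130)]
for cap, tput in generations:
    print(f"{cap:>4} TB drive @ {tput} MB/s -> ~{rebuild_hours(cap, tput):.1f} h to rebuild")
```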
Keeping Bits Safe: How Hard Can It Be?:
As storage systems grow larger and larger, protecting their data for long-term storage is becoming more and more challenging.
These days, we are all data pack rats. Storage is cheap, so if there’s a chance the data could possibly be useful, we keep it. We know that storage isn’t completely reliable, so we keep backup copies as well. But the more data we keep, and the longer we keep it, the greater the chance that some of it will be unrecoverable when we need it.
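A small calculation shows why scale itself is the enemy: even with a tiny per-sector loss probability, the chance that something in the collection is unrecoverable climbs steadily as the collection grows. The loss rate below is an assumed round number, not a measured figure.

```python
# P(at least one object lost) = 1 - P(no object lost), for an assumed
# per-sector loss probability and collections of increasing size.

def p_any_loss(per_object_loss: float, num_objects: float) -> float:
    return 1.0 - (1.0 - per_object_loss) ** num_objects

per_sector_loss = 1e-12            # assumed probability a sector is unrecoverable
for size_tb in (1, 100, 10_000):
    sectors = size_tb * 1e12 / 512
    print(f"{size_tb:>6} TB -> P(some data lost) ~ {p_any_loss(per_sector_loss, sectors):.2%}")
```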
File-system Litter:
Cleaning up your storage space quickly and efficiently
Dear KV, We recently ran out of storage space on a very large file server and upon closer inspection we found that it was just one employee who had used it all up. The space was taken up almost exclusively by small files that were the result of running some data-analysis scripts. These files were completely unnecessary after they had been read once. The code that generated the files had no good way of cleaning them up once they had been created; it just went on believing that storage was infinite. Now we’ve had to put quotas on our file servers and, of course, deal with weekly cries for more disk space.
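A sketch of the kind of cleanup the letter writer’s scripts lacked appears below: walk a scratch directory and delete files that have not been touched in N days. The directory path and age threshold are hypothetical, and the dry-run flag defaults to on so nothing is removed until you have checked the output.

```python
# Remove stale scratch files older than max_age_days under a given root.

import os
import time

def clean_scratch(root: str, max_age_days: float, dry_run: bool = True) -> None:
    cutoff = time.time() - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    print(("would remove" if dry_run else "removing"), path)
                    if not dry_run:
                        os.remove(path)
            except OSError:
                pass    # file vanished or is unreadable; skip it

if __name__ == "__main__":
    clean_scratch("/srv/scratch/analysis", max_age_days=7, dry_run=True)
```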
Disks from the Perspective of a File System:
Disks lie. And the controllers that run them are partners in crime.
Most applications do not deal with disks directly, instead storing their data in files in a file system, which protects us from those scoundrel disks. After all, a key task of the file system is to ensure that it can always be recovered to a consistent state after an unplanned system crash (for example, a power failure). While a good file system will be able to beat the disks into submission, the required effort can be great and the reduced performance annoying.
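A glimpse of that effort, sketched under simple assumptions: a plain write() promises very little, so careful code flushes its user-space buffer and then asks the kernel to push the data to stable storage. Even then, as the article argues, a drive or controller cache can still lie about whether the bits have truly landed.

```python
# Write data and ask the OS to force it to the device before returning.

import os

def write_durably(path: str, data: bytes) -> None:
    with open(path, "wb") as f:
        f.write(data)        # data may still sit in the user-space buffer
        f.flush()            # push it into the kernel page cache
        os.fsync(f.fileno()) # ask the OS to force it to the device

write_durably("journal.entry", b"committed record\n")
```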
Anatomy of a Solid-state Drive:
While the ubiquitous SSD shares many features with the hard-disk drive, under the surface they are completely different.
Over the past several years, a new type of storage device has entered laptops and data centers, fundamentally changing expectations regarding the power, size, and performance dynamics of storage. The SSD (solid-state drive) is a technology that has been around for more than 30 years but remained too expensive for broad adoption.
Crash Consistency:
Rethinking the Fundamental Abstractions of the File System
The reading and writing of data, one of the most fundamental aspects of any von Neumann computer, is surprisingly subtle and full of nuance. For example, consider access to a shared memory in a system with multiple processors. While a simple and intuitive approach known as strong consistency is easiest for programmers to understand, many weaker models are in widespread use (e.g., x86 total store ordering); such approaches improve system performance, but at the cost of making reasoning about system behavior more complex and error-prone.
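The same tension shows up on disk. One classic idiom applications lean on for crash consistency is sketched below: write the new contents to a temporary file, force them to stable storage, then atomically rename over the old file, so a crash leaves either the old version or the new one and never a half-written mix. This assumes the file system implements rename atomically within a volume, as POSIX requires.

```python
# Crash-consistent file replacement via the write-temp, fsync, rename pattern.

import os

def atomic_replace(path: str, data: bytes) -> None:
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())      # new contents are durable before the switch
    os.rename(tmp, path)          # atomic swap of old for new

atomic_replace("config.json", b'{"version": 2}\n')
```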
Non-volatile Storage:
Implications of the Datacenter’s Shifting Center
For the entire careers of most practicing computer scientists, a fundamental observation has consistently held true: CPUs are significantly more performant and more expensive than I/O devices. The fact that CPUs can process data at extremely high rates, while simultaneously servicing multiple I/O devices, has had a sweeping impact on the design of both hardware and software for systems of all sizes, for pretty much as long as we’ve been building them.
Algorithms Behind Modern Storage Systems:
Different uses for read-optimized B-trees and write-optimized LSM-trees
This article takes a closer look at two storage system design approaches used in a majority of modern databases, read-optimized B-trees and write-optimized LSM-trees (log-structured merge trees), and describes their use cases and tradeoffs.
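The LSM side of that tradeoff can be sketched in a few dozen lines: writes land in a small in-memory table and are periodically flushed as immutable sorted runs, while reads consult the memtable first and then the runs from newest to oldest. Real LSM engines add write-ahead logs, compaction, and Bloom filters; this toy shows only the shape of the algorithm, in contrast to a B-tree’s update-in-place.

```python
# Toy LSM-tree: in-memory memtable, immutable sorted runs, newest-wins reads.

class TinyLSM:
    def __init__(self, memtable_limit: int = 4):
        self.memtable: dict[str, str] = {}
        self.runs: list[list[tuple[str, str]]] = []   # newest run last
        self.limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value                    # cheap, in-memory write
        if len(self.memtable) >= self.limit:
            self._flush()

    def _flush(self) -> None:
        self.runs.append(sorted(self.memtable.items()))  # immutable sorted run
        self.memtable = {}

    def get(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):               # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None

db = TinyLSM()
for i in range(10):
    db.put(f"k{i}", f"v{i}")
print(db.get("k3"), db.get("k9"), db.get("nope"))
```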
Mind Your State for Your State of Mind:
The interactions between storage and applications can be complex and subtle.
Applications have had an interesting evolution as they have moved into the distributed and scalable world. Similarly, storage and its cousin databases have changed side by side with applications. Many times, the semantics, performance, and failure models of storage and applications do a subtle dance as they change in support of changing business requirements and environmental challenges. Adding scale to the mix has really stirred things up. This article looks at some of these issues and their impact on systems.