File Systems

Vol. 7 No. 7 – August 2009

File Systems

Case Study: File Systems and Storage

GFS: Evolution on Fast-forward

A discussion between Kirk McKusick and Sean Quinlan about the origin and evolution of the Google File System

Case Study

GFS: Evolution on Fast-forward

A discussion between Kirk McKusick and Sean Quinlan about the origin and evolution of the Google File System.

During the early stages of development at Google, the initial thinking did not include plans for building a new file system. While work was still being done on one of the earliest versions of the company's crawl and indexing system, however, it became quite clear to the core engineers that they really had no other choice, and GFS (Google File System) was born.

First, given that Google's goal was to build a vast storage network out of inexpensive commodity hardware, it had to be assumed that component failures would be the norm—meaning that constant monitoring, error detection, fault tolerance, and automatic recovery would have to be an integral part of the file system. Also, even by Google's earliest estimates, the system's throughput requirements were going to be daunting by anybody's standards—featuring multi-gigabyte files and data sets containing terabytes of information and millions of objects. Clearly, this meant traditional assumptions about I/O operations and block sizes would have to be revisited. There was also the matter of scalability. This was a file system that would surely need to scale like no other. Of course, back in those earliest days, no one could have possibly imagined just how much scalability would be required. They would learn about that soon enough.

by Marshall Kirk McKusick, Sean Quinlan

Articles

Four Billion Little Brothers?: Privacy, mobile phones, and ubiquitous data collection

Participatory sensing technologies could improve our lives and our communities, but at what cost to our privacy?

Four Billion Little Brothers?

Privacy, mobile phones, and ubiquitous data collection

Participatory sensing technologies could improve our lives and our communities, but at what cost to our privacy?

Katie Shilton, University of California, Los Angeles

They place calls, surf the Internet, and there are close to 4 billion of them in the world. Their built-in microphones, cameras, and location awareness can collect images, sound, and GPS data. Beyond chatting and texting, these features could make phones ubiquitous, familiar tools for quantifying personal patterns and habits. They could also be platforms for thousands to document a neighborhood, gather evidence to make a case, or study mobility and health. This data could help you understand your daily carbon footprint, exposure to air pollution, exercise habits, and frequency of interactions with family and friends.

At the same time, however, this data reveals a lot about your regular locations, habits, and routines. Once such data is captured, acquaintances, friends, or authorities might coerce you to disclose it. Perhaps worse, it could be collected or reused without your knowledge or permission. At the extreme, mobile phones could become the most widespread embedded surveillance tools in history. Imagine carrying a location-aware bug, complete with a camera, accelerometer, and Bluetooth stumbling, everywhere you go. Your phone could document your comings and goings, infer your activities throughout the day, and record whom you pass on the street or who engaged you in conversation. Deployed by governments or compelled by employers, 4 billion "little brothers" could be watching you.

by Katie Shilton

Making Sense of Revision-control Systems

Whether distributed or centralized, all revision-control systems come with complicated sets of tradeoffs. How do you find the best match between tool and team?

Making Sense of Revision-control Systems

Whether distributed or centralized, all revision-control systems come with complicated sets of tradeoffs. How do you find the best match between tool and team?

Bryan O'Sullivan

Modern software is tremendously complicated, and the methods that teams use to manage its development reflect this complexity. Though many organizations use revision-control software to track and manage the complexity of a project as it evolves, the topic of how to make an informed choice of revision-control tools has received scant attention. Until fairly recently, the world of revision control was moribund, so there was simply not much to say on this subject.

The past half-decade, however, has seen an explosion of creativity in revision-control software, and now the leaders of a team are faced with a bewildering array of choices.

by Bryan O'Sullivan

The Meaning of Maintenance

Software maintenance is more than just bug fixes.

The Meaning of Maintenance

Software maintenance is more than just bug fixes.

Dear KV,

Isn't software maintenance a misnomer? I've never heard of anyone reviewing a piece of code every year, just to make sure it was still in good shape. It seems like software maintenance is really just a cover for bug fixing. When I think of maintenance I think of taking my car in for an oil change, not fixing a piece of code. Are there any people who actually review code after it has been running in a production environment?

by George V. Neville-Neil