A discussion with Jesse Robbins, Kripa Krishnan, John Allspaw, and Tom Limoncelli
GAMEDAY EXERCISES CASE STUDY
It’s very nearly the holiday shopping season and something is very wrong at a data center handling transactions for one of the largest online retail operations in the country. Some systems have failed, and no one knows why. Stress levels are off the charts while teams of engineers work around the clock for three days trying to recover.
Automating Software Failure Reporting
Improving Performance on the Internet
Disks lie. And the controllers that run them are partners in crime.
MARSHALL KIRK MCKUSICK
Most applications do not deal with disks directly, instead storing their data in files in a file system, which protects us from those scoundrel disks. After all, a key task of the file system is to ensure that it can always be recovered to a consistent state after an unplanned system crash (for example, a power failure). While a good file system will be able to beat the disks into submission, the required effort can be great and the reduced performance annoying. This article examines the shortcuts that disks take and the hoops that file systems must jump through to get the desired reliability.
Building Systems to Be Shared, Securely
The Five-Minute Rule 20 Years Later: and How Flash Memory Changes the Rules
GFS: Evolution on Fast-forward