
Keeping Bits Safe: How Hard Can It Be?

David S. H. Rosenthal

Originally published in Queue vol. 8, no. 10. See this item in the ACM Digital Library.




Related:

Pat Helland - Mind Your State for Your State of Mind
The interactions between storage and applications can be complex and subtle.


Alex Petrov - Algorithms Behind Modern Storage Systems
Different uses for read-optimized B-trees and write-optimized LSM-trees


Mihir Nanavati, Malte Schwarzkopf, Jake Wires, Andrew Warfield - Non-volatile Storage
Implications of the Datacenter's Shifting Center


Thanumalayan Sankaranarayana Pillai, Vijay Chidambaram, Ramnatthan Alagappan, Samer Al-Kiswany, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau - Crash Consistency
Rethinking the Fundamental Abstractions of the File System



Comments (newest first)

Breno Gomes | Tue, 01 Feb 2011 12:07:34 UTC

The article is excellent. I strongly agree that the correction reinforces the overall case. I would raise a related topic, hopefully leading to further studies: the industry states that a DVD should last 50 to 100 years, a fraction of Sun's claim for the ST5800. Keeping the data safe is a challenge in its own right. Some file formats, such as plain text, are relatively safe from applications' natural evolution. However, there are vast amounts of information saved by audio and video recorders, word processors, spreadsheets, databases, and so on. Most applications have limited backward compatibility. There is a real risk that future generations will inherit data nobody can decode, invalidating all efforts to keep the information safe.


David Rosenthal | Wed, 20 Oct 2010 14:14:47 UTC

Thank you for pointing out that my statistical mistakes are only slightly less serious than the manufacturers'! Fortunately, the correction seems to reinforce my overall case. And I agree that the failure probability looks strangely close to "one in a million".


The Soltan of Afganistan | Wed, 13 Oct 2010 00:31:15 UTC

This is a great article and brings up many good points; however, the "math" in the probabilities section is horribly wrong.

As a statistician, I am always sad to see statements of this nature:

"[If] Sirius watched the entire production of SC5800s ($10^10 worth of storage systems) over their entire service life, the experiment would end 20 years from now after accumulating about 2×10^6 system-years of data. If its claim were correct, Sirius would have about a 17 percent chance of seeing a single data-loss event."

The random variable here is the number of data-loss events among all of the systems in 10 years. First, let's see what the probability is that a single machine fails within 10 years. I am using the (admittedly ill-suited) normal distribution; failure rates are usually modeled with a Poisson distribution.

P(failure of one machine within 10 years) = P(X < 10), where X ~ Normal(mean 2.4e6, SD 0.4e6); that is, Φ(−6) ≈ 9.867e-10

This probability is ridiculously small, about 1 in a billion. (The claimed MTTDL itself corresponds to a per-system-year failure probability on the order of one in a million; it seems like they just made it up from the phrase "one in a million"...)

Now, if we take all 2e5 machines, each one is an independent Bernoulli trial.

So the number X of machines that fail is distributed Binomial(200000, 9.867e-10).

P(no machine fails) = P(0 failures) = C(200000, 0) × (9.867e-10)^0 × (1 − 9.867e-10)^200000 ≈ 0.9998

P(at least one machine fails) = 1 − P(no machine fails) ≈ 0.000197

This is much smaller than 17 percent, but then again, that 17 percent assumed we could simply add up all the machine-years.
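For readers who want to check this arithmetic, here is a short Python sketch reproducing it. The normal model and its parameters are the assumptions made in this comment, not figures taken from the article:

```python
from scipy.stats import norm

N = 200_000    # systems in the entire production run
MEAN = 2.4e6   # claimed MTTDL, in years
SD = 0.4e6     # assumed standard deviation, in years

# P(one machine suffers data loss within its 10-year service life),
# modeling time-to-data-loss as Normal(MEAN, SD):
p_one = norm.cdf(10, loc=MEAN, scale=SD)   # ~9.867e-10, i.e. Phi(-6)

# Treating each machine as an independent Bernoulli trial:
p_none = (1 - p_one) ** N                  # ~0.9998
print(p_one, 1 - p_none)                   # ~9.867e-10, ~1.97e-4
```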


David Rosenthal | Wed, 06 Oct 2010 10:08:12 UTC

I missed an excellent paper from the USENIX HotStorage 2010 workshop, "Mean time to meaningless: MTTDL, Markov models, and storage system reliability" by Kevin Greenan, James Plank and Jay Wylie.

They agree with my point that MTTDL is a meaningless measure of storage reliability, and that bit half-life isn't a great improvement on it. They propose instead NOMDL (NOrmalized Magnitude of Data Loss), i.e. the expected number of bytes that the storage will lose in a specified interval divided by its usable capacity. As they point out, it is possible to compute this using Monte Carlo simulation based on distributions of component failures that experiments have shown to fit the real world. These simulations produce estimates that are relatively credible, especially compared to the ludicrous MTTDL estimates I pillory in the article.

NOMDL is a far better measure than MTTDL. Greenan, Plank and Wylie are to be congratulated for proposing it.
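To make the definition concrete, here is a minimal Monte Carlo sketch in Python of the kind of computation NOMDL calls for. It simulates a single mirrored pair of devices; the Weibull lifetime parameters, rebuild time, and capacity are illustrative assumptions, not figures from the paper, and a real simulation would use empirically fitted failure and repair distributions:

```python
import random

CAPACITY = 1e12           # usable bytes in one mirrored pair (assumed)
MISSION = 10.0            # interval of interest, in years
REPAIR = 3.0 / 365.0      # time to rebuild a failed replica, in years (assumed)
SHAPE, SCALE = 1.2, 30.0  # Weibull disk-lifetime parameters, in years (assumed)

def lifetime():
    # random.weibullvariate takes (scale, shape)
    return random.weibullvariate(SCALE, SHAPE)

def bytes_lost():
    """One trial: data is lost only if the surviving replica also fails
    while the first failure is being rebuilt (a simplification that
    ignores device aging across rebuilds)."""
    t = min(lifetime(), lifetime())        # time of first replica failure
    while t < MISSION:
        if lifetime() < REPAIR:            # mate dies during the rebuild
            return CAPACITY                # the whole pair's data is lost
        t += REPAIR + min(lifetime(), lifetime())  # rebuilt; await next failure
    return 0.0

TRIALS = 100_000
expected_loss = sum(bytes_lost() for _ in range(TRIALS)) / TRIALS
print("NOMDL estimate:", expected_loss / CAPACITY)
```

Even this toy version illustrates the appeal of the measure: it reports an expected fraction of bytes lost over a stated interval, rather than an extrapolated mean time that no one will ever observe.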

