January/February 2018 issue of acmqueue



Originally published in Queue vol. 9, no. 5



Graham Cormode - Data Sketching
The approximate approach is often faster and more efficient.

Heinrich Hartmann - Statistics for Engineers
Applying statistical techniques to operations data

Pat Helland - Immutability Changes Everything
We need it, we can afford it, and the time is now.

R. V. Guha, Dan Brickley, Steve MacBeth - Schema.org: Evolution of Structured Data on the Web
Big data makes common schemas even more necessary.


(newest first)

Jered Floyd | Fri, 01 Jul 2011 02:14:46 UTC

The arguments you make are good to keep in mind when integrating deduplication at the block device rather than at the file system. With Albireo, we have experience with integration at both the block level and the file system level; there are a number of trade-offs to consider when deciding where to add deduplication.

For this specific case, a sensible approach might be a protocol extension that allows a file system (or application) to indicate blocks that are not to be deduplicated. Much as TRIM allows SSDs to operate more efficiently and reliably, a UNIQUE extension would allow critical file system metadata to be preserved as multiple copies.

Jered Floyd, CTO, Permabit Technology Corp.

Bill | Wed, 29 Jun 2011 13:07:36 UTC

I don't see this as an issue, as reliability has moved to the controller level. Also, deduplication is not the only thing going on in an SSD: SandForce controllers also have a RAID-like architecture and very good ECC. I think the file systems you quoted are outdated; it may have been important 20 years ago, but not now.


© 2018 ACM, Inc. All Rights Reserved.