Source blog: Adam Leventhal's Weblog
Leaving Oracle
I joined the Solaris Kernel Group in 2001 at what turned out to be a remarkable place and time for the industry. More by luck and intuition than by premonition, I found myself surrounded by superlative engineers working on revolutionary technologies that were the products of their own experience and imagination rather than managerial fiat. I feel very lucky to have worked with Bryan and Mike on DTrace; it was amazing that just down the hall our colleagues reinvented the operating system with Zones, ZFS, FMA, SMF and other innovations. With Solaris 10 behind us, lauded by customers and pundits, I was looking for that next remarkable place and time, and found it with Fishworks.
Leaving Oracle
I joined the Solaris Kernel Group in 2001 at what turned out to be a remarkable place and time for the industry. More by luck and intuition than by premonition, I found myself surrounded by superlative engineers working on revolutionary technologies that were the products of their own experience and imagination rather than managerial fiat. I feel very lucky to have worked with Bryan and Mike on DTrace; it was amazing that just down the hall our colleagues reinvented the operating system with Zones, ZFS, FMA, SMF and other innovations. With Solaris 10 behind us, lauded by customers and pundits, I was looking for that next remarkable place and time, and found it with Fishworks.
Leaving Oracle
I joined the Solaris Kernel Group in 2001 at what turned out to be a remarkable place and time for the industry. More by luck and intuition than by premonition, I found myself surrounded by superlative engineers working on revolutionary technologies that were the products of their own experience and imagination rather than managerial fiat. I feel very lucky to have worked with Bryan and Mike on DTrace; it was amazing that just down the hall our colleagues reinvented the operating system with Zones, ZFS, FMA, SMF and other innovations. With Solaris 10 behind us, lauded by customers and pundits, I was looking for that next remarkable place and time, and found it with Fishworks.
Fishworks history of SSDs
This year's flash memory summit got me thinking about our use of SSDs over the years at Fishworks. The picture of our left is a visual history of SSD evals in rough chronological order from the oldest at the bottom to the newest at the top (including some that have yet to see the light of day). Early Days When we started Fishworks, we were inspired by the possibilities presented by ZFS and Thumper. Those components would be key building blocks in the enterprise storage solution that became the 7000 series. An immediate deficiency we needed to address was how to deliver competitive performance using 7,200 RPM disks.
Fishworks history of SSDs
This year's flash memory summit got me thinking about our use of SSDs over the years at Fishworks. The picture of our left is a visual history of SSD evals in rough chronological order from the oldest at the bottom to the newest at the top (including some that have yet to see the light of day). Early Days When we started Fishworks, we were inspired by the possibilities presented by ZFS and Thumper. Those components would be key building blocks in the enterprise storage solution that became the 7000 series. An immediate deficiency we needed to address was how to deliver competitive performance using 7,200 RPM disks.
Fishworks history of SSDs
This year's flash memory summit got me thinking about our use of SSDs over the years at Fishworks. The picture of our left is a visual history of SSD evals in rough chronological order from the oldest at the bottom to the newest at the top (including some that have yet to see the light of day). Early Days When we started Fishworks, we were inspired by the possibilities presented by ZFS and Thumper. Those components would be key building blocks in the enterprise storage solution that became the 7000 series. An immediate deficiency we needed to address was how to deliver competitive performance using 7,200 RPM disks.
Farewell to Bryan Cantrill
Bryan Cantrill, VP of Engineering at Joyent, earning $15. I've been expecting this automated mail for a while now, but it was disheartening nonetheless: List: dtrace-discuss Member: [email protected] Action: Subscription disabled. Reason: Excessive or fatal bounces. As one of the moderators of the DTrace discussion list, I see people subscribe and unsubscribe. Bryan has, of course, left Oracle and joined Joyent to be their VP of engineering. Bryan is a terrific engineer, and I count myself lucky to have worked with him for the past nine years first on DTrace and then on Fishworks.
Farewell to Bryan Cantrill
Bryan Cantrill, VP of Engineering at Joyent, earning $15. I've been expecting this automated mail for a while now, but it was disheartening nonetheless: List: dtrace-discuss Member: [email protected] Action: Subscription disabled. Reason: Excessive or fatal bounces. As one of the moderators of the DTrace discussion list, I see people subscribe and unsubscribe. Bryan has, of course, left Oracle and joined Joyent to be their VP of engineering. Bryan is a terrific engineer, and I count myself lucky to have worked with him for the past nine years first on DTrace and then on Fishworks.
Farewell to Bryan Cantrill
Bryan Cantrill, VP of Engineering at Joyent, earning $15. I've been expecting this automated mail for a while now, but it was disheartening nonetheless: List: dtrace-discuss Member: [email protected] Action: Subscription disabled. Reason: Excessive or fatal bounces. As one of the moderators of the DTrace discussion list, I see people subscribe and unsubscribe. Bryan has, of course, left Oracle and joined Joyent to be their VP of engineering. Bryan is a terrific engineer, and I count myself lucky to have worked with him for the past nine years first on DTrace and then on Fishworks.
What is RAID-Z?
The mission of ZFS was to simply storage and to construct an enterprise level of quality from volume components by building smarter software — indeed that notion is at the heart of the 7000 series. An important piece of that puzzle was eliminating the expensive RAID card used in traditional storage and replacing it with high performance, software RAID. To that end, Jeff invented RAID-Z; it's key innovation over other software RAID techniques was to close the "RAID-5 write hole" by using variable width stripes. RAID-Z, however, is definitely not RAID-5 despite that being the most common comparison. RAID levels Last year I wrote about the need for triple-parity RAID, and in that article I summarized the various RAID levels as enumerated by Gibson, Katz, and Patterson, along with Peter Chen, Edward Lee, and myself: RAID-0 Data is striped across devices for maximal write performance.
What is RAID-Z?
The mission of ZFS was to simplify storage and to construct an enterprise level of quality from volume components by building smarter software — indeed that notion is at the heart of the 7000 series. An important piece of that puzzle was eliminating the expensive RAID card used in traditional storage and replacing it with high performance, software RAID. To that end, Jeff invented RAID-Z; it's key innovation over other software RAID techniques was to close the "RAID-5 write hole" by using variable width stripes. RAID-Z, however, is definitely not RAID-5 despite that being the most common comparison. RAID levels Last year I wrote about the need for triple-parity RAID, and in that article I summarized the various RAID levels as enumerated by Gibson, Katz, and Patterson, along with Peter Chen, Edward Lee, and myself: RAID-0 Data is striped across devices for maximal write performance.
What is RAID-Z?
The mission of ZFS was to simplify storage and to construct an enterprise level of quality from volume components by building smarter software — indeed that notion is at the heart of the 7000 series. An important piece of that puzzle was eliminating the expensive RAID card used in traditional storage and replacing it with high performance, software RAID. To that end, Jeff invented RAID-Z; it's key innovation over other software RAID techniques was to close the "RAID-5 write hole" by using variable width stripes. RAID-Z, however, is definitely not RAID-5 despite that being the most common comparison. RAID levels Last year I wrote about the need for triple-parity RAID, and in that article I summarized the various RAID levels as enumerated by Gibson, Katz, and Patterson, along with Peter Chen, Edward Lee, and myself: RAID-0 Data is striped across devices for maximal write performance.
A Logzilla for your ZFS box
A key component of the ZFS Hybrid Storage Pool is Logzilla, a very fast device to accelerate synchronous writes. This component hides the write latency of disks to enable the use of economical, high-capacity drives. In the Sun Storage 7000 series, we use some very fast SAS and SATA SSDs from STEC as our Logzilla &mdash the devices are great and STEC continues to be a terrific partner. The most important attribute of a good Logzilla device is that it have very low latency for sequential, uncached writes. The STEC part gives us about 100μs latency for a 4KB write — much much lower than most SSDs.
A Logzilla for your ZFS box
A key component of the ZFS Hybrid Storage Pool is Logzilla, a very fast device to accelerate synchronous writes. This component hides the write latency of disks to enable the use of economical, high-capacity drives. In the Sun Storage 7000 series, we use some very fast SAS and SATA SSDs from STEC as our Logzilla &mdash the devices are great and STEC continues to be a terrific partner. The most important attribute of a good Logzilla device is that it have very low latency for sequential, uncached writes. The STEC part gives us about 100μs latency for a 4KB write — much much lower than most SSDs.
A Logzilla for your ZFS box
A key component of the ZFS Hybrid Storage Pool is Logzilla, a very fast device to accelerate synchronous writes. This component hides the write latency of disks to enable the use of economical, high-capacity drives. In the Sun Storage 7000 series, we use some very fast SAS and SATA SSDs from STEC as our Logzilla &mdash the devices are great and STEC continues to be a terrific partner. The most important attribute of a good Logzilla device is that it have very low latency for sequential, uncached writes. The STEC part gives us about 100μs latency for a 4KB write — much much lower than most SSDs.
2010.Q1 simulator
On the heels of the 2010.Q1 software release, we've provided a new version of the Sun Storage 7000 simulator that can be found at this new location. As noted previously, the simulator is a terrific way to take the Sun Storage 7000 user interface for a spin; it includes the exact same software as a physical 7000 series system with the same features. The last release of the simulator added support for VirtualBox; this release now removes support for VMware. VMware was tremendously useful for our initial release, but VirtualBox has addressed the problems that initially excluded it, and the maintenance burden of supporting two virtual platforms has led us to drop VMware in this release.
2010.Q1 simulator
On the heels of the 2010.Q1 software release, we've provided a new version of the Sun Storage 7000 simulator that can be found at this new location. As noted previously, the simulator is a terrific way to take the Sun Storage 7000 user interface for a spin; it includes the exact same software as a physical 7000 series system with the same features. The last release of the simulator added support for VirtualBox; this release now removes support for VMware. VMware was tremendously useful for our initial release, but VirtualBox has addressed the problems that initially excluded it, and the maintenance burden of supporting two virtual platforms has led us to drop VMware in this release.
2010.Q1 simulator
On the heels of the 2010.Q1 software release, we've provided a new version of the Sun Storage 7000 simulator that can be found at this new location. As noted previously, the simulator is a terrific way to take the Sun Storage 7000 user interface for a spin; it includes the exact same software as a physical 7000 series system with the same features. The last release of the simulator added support for VirtualBox; this release now removes support for VMware. VMware was tremendously useful for our initial release, but VirtualBox has addressed the problems that initially excluded it, and the maintenance burden of supporting two virtual platforms has led us to drop VMware in this release.
The need for triple-parity RAID
When I first wrote about triple-parity RAID in ZFS and the Sun Storage 7000 series, I alluded a looming requirement for triple-parity RAID due to a growing disparity between disk capacity and throughput. I've written an article in ACM Queue examining this phenomenon in detail, and making the case for triple-parity RAID. Dominic Kay helped me sift through hard drive data for the past ten years to build a model for how long it takes to fully populate a drive. I've reproduced a graph here from the paper than displays the timing data for a few common drive types — the trends are obviously quite clear.
The need for triple-parity RAID
When I first wrote about triple-parity RAID in ZFS and the Sun Storage 7000 series, I alluded a looming requirement for triple-parity RAID due to a growing disparity between disk capacity and throughput. I've written an article in ACM Queue examining this phenomenon in detail, and making the case for triple-parity RAID. Dominic Kay helped me sift through hard drive data for the past ten years to build a model for how long it takes to fully populate a drive. I've reproduced a graph here from the paper than displays the timing data for a few common drive types — the trends are obviously quite clear.
The need for triple-parity RAID
When I first wrote about triple-parity RAID in ZFS and the Sun Storage 7000 series, I alluded a looming requirement for triple-parity RAID due to a growing disparity between disk capacity and throughput. I've written an article in ACM Queue examining this phenomenon in detail, and making the case for triple-parity RAID. Dominic Kay helped me sift through hard drive data for the past ten years to build a model for how long it takes to fully populate a drive. I've reproduced a graph here from the paper than displays the timing data for a few common drive types — the trends are obviously quite clear.
Logzillas: to mirror or stripe?
The Hybrid Storage Pool integrates flash into the storage hierarchy in two specific ways: as a massive read cache and as fast log devices. For read cache devices, Readzillas, there's no need for redundant configurations; it's a clean cache so the data necessarily also resides on disk. For log devices, Logzillas, redundancy is essential, but how that translates to their configuration can be complicated. How to decide whether to stripe or mirror? ZFS intent log devices Logzillas are used as ZFS intent log devices (slogs in ZFS jargon). For certain synchronous write operations, data is written to the Logzilla so the operation can be acknowledged to the client quickly before the data is later streamed out to disk.
Logzillas: to mirror or stripe?
The Hybrid Storage Pool integrates flash into the storage hierarchy in two specific ways: as a massive read cache and as fast log devices. For read cache devices, Readzillas, there's no need for redundant configurations; it's a clean cache so the data necessarily also resides on disk. For log devices, Logzillas, redundancy is essential, but how that translates to their configuration can be complicated. How to decide whether to stripe or mirror? ZFS intent log devices Logzillas are used as ZFS intent log devices (slogs in ZFS jargon). For certain synchronous write operations, data is written to the Logzilla so the operation can be acknowledged to the client quickly before the data is later streamed out to disk.
Logzillas: to mirror or stripe?
The Hybrid Storage Pool integrates flash into the storage hierarchy in two specific ways: as a massive read cache and as fast log devices. For read cache devices, Readzillas, there's no need for redundant configurations; it's a clean cache so the data necessarily also resides on disk. For log devices, Logzillas, redundancy is essential, but how that translates to their configuration can be complicated. How to decide whether to stripe or mirror? ZFS intent log devices Logzillas are used as ZFS intent log devices (slogs in ZFS jargon). For certain synchronous write operations, data is written to the Logzilla so the operation can be acknowledged to the client quickly before the data is later streamed out to disk.
2009.Q3 Storage Configuration
Today we shipped our 2009.Q3 release. Amidst the many great new features, enhancements and bug fixes, we've added new storage profiles for triple-parity RAID and three-way mirroring. Here's an example on a 9 JBOD system of what you'll see in the updated storage configuration screen: Note that the new Triple parity RAID, wide stripes option replaces the old Double parity RAID, wide stripes configuration. With RAID stripes that can easily be more than 40 disks wide, and resilver times that can be quite long as a result, we decided that the additional protection of triple-parity RAID trumped the very small space efficiency of double-parity RAID.
2009.Q3 Storage Configuration
Today we shipped our 2009.Q3 release. Amidst the many great new features, enhancements and bug fixes, we've added new storage profiles for triple-parity RAID and three-way mirroring. Here's an example on a 9 JBOD system of what you'll see in the updated storage configuration screen: Note that the new Triple parity RAID, wide stripes option replaces the old Double parity RAID, wide stripes configuration. With RAID stripes that can easily be more than 40 disks wide, and resilver times that can be quite long as a result, we decided that the additional protection of triple-parity RAID trumped the very small space efficiency of double-parity RAID.
2009.Q3 Storage Configuration
Today we shipped our 2009.Q3 release. Amidst the many great new features, enhancements and bug fixes, we've added new storage profiles for triple-parity RAID and three-way mirroring. Here's an example on a 9 JBOD system of what you'll see in the updated storage configuration screen: Note that the new Triple parity RAID, wide stripes option replaces the old Double parity RAID, wide stripes configuration. With RAID stripes that can easily be more than 40 disks wide, and resilver times that can be quite long as a result, we decided that the additional protection of triple-parity RAID trumped the very small space efficiency of double-parity RAID.
Flash Memory Summit 2009
At the Flash Memory Summit today, Sun's own Michael Cornwell delivered a keynote excoriating the overall direction of NAND flash and SSDs. In particular, he spoke of the "lithography death march" as NAND vendors push to deliver the most cost-efficient solution while making huge sacrifices in reliability and performance. On Wednesday, August 12, I'll be giving two short talks as part of sessions on flash-enabled power savings and data center applications: 8:30 - 9:45 Power Saving Architectures Enabled by Smarter Software 3:15 - 4:30 The Need For Higher-Level Software for Flash In the evening from 7:30 to 9:00, I'll be hosting a table discussion of software as it pertains to flash.
Flash Memory Summit 2009
At the Flash Memory Summit today, Sun's own Michael Cornwell delivered a keynote excoriating the overall direction of NAND flash and SSDs. In particular, he spoke of the "lithography death march" as NAND vendors push to deliver the most cost-efficient solution while making huge sacrifices in reliability and performance. On Wednesday, August 12, I'll be giving two short talks as part of sessions on flash-enabled power savings and data center applications: 8:30 - 9:45 Power Saving Architectures Enabled by Smarter Software 3:15 - 4:30 The Need For Higher-Level Software for Flash In the evening from 7:30 to 9:00, I'll be hosting a table discussion of software as it pertains to flash.
Flash Memory Summit 2009
At the Flash Memory Summit today, Sun's own Michael Cornwell delivered a keynote excoriating the overall direction of NAND flash and SSDs. In particular, he spoke of the "lithography death march" as NAND vendors push to deliver the most cost-efficient solution while making huge sacrifices in reliability and performance. On Wednesday, August 12, I'll be giving two short talks as part of sessions on flash-enabled power savings and data center applications: 8:30 - 9:45 Power Saving Architectures Enabled by Smarter Software 3:15 - 4:30 The Need For Higher-Level Software for Flash In the evening from 7:30 to 9:00, I'll be hosting a table discussion of software as it pertains to flash.
Triple-Parity RAID-Z
Double-parity RAID, or RAID-6, is the de facto industry standard for storage; when I started talking about triple-parity RAID for ZFS earlier this year, the need wasn't always immediately obvious. Double-parity RAID, of course, provides protection from up to two failures (data corruption or the whole drive) within a RAID stripe. The necessity of triple-parity RAID arises from the observation that while hard drive capacity has followed Kryder's law, doubling annually, hard drive throughput has improved far more modestly. Accordingly, the time to populate a replacement drive in a RAID stripe is increasing rapidly. Today, a 1TB SAS drive takes about 4 hours to fill at its theoretical peak throughput; in a real-world environment that number can easily double, and 2TB and 3TB drives expected this year and next won't move data much faster.
Triple-Parity RAID-Z
Double-parity RAID, or RAID-6, is the de facto industry standard for storage; when I started talking about triple-parity RAID for ZFS earlier this year, the need wasn't always immediately obvious. Double-parity RAID, of course, provides protection from up to two failures (data corruption or the whole drive) within a RAID stripe. The necessity of triple-parity RAID arises from the observation that while hard drive capacity has roughly followed Kryder's law, doubling annually, hard drive throughput has improved far more modestly. Accordingly, the time to populate a replacement drive in a RAID stripe is increasing rapidly. Today, a 1TB SAS drive takes about 4 hours to fill at its theoretical peak throughput; in a real-world environment that number can easily double, and 2TB and 3TB drives expected this year and next won't move data much faster.
Triple-Parity RAID-Z
Double-parity RAID, or RAID-6, is the de facto industry standard for storage; when I started talking about triple-parity RAID for ZFS earlier this year, the need wasn't always immediately obvious. Double-parity RAID, of course, provides protection from up to two failures (data corruption or the whole drive) within a RAID stripe. The necessity of triple-parity RAID arises from the observation that while hard drive capacity has roughly followed Kryder's law, doubling annually, hard drive throughput has improved far more modestly. Accordingly, the time to populate a replacement drive in a RAID stripe is increasing rapidly. Today, a 1TB SAS drive takes about 4 hours to fill at its theoretical peak throughput; in a real-world environment that number can easily double, and 2TB and 3TB drives expected this year and next won't move data much faster.
Sun Storage 7310
Today we're introducing a new member to the Sun Unified Storage family: the Sun Storage 7310. The 7310 is a scalable system from 12TB with a single half-populated J4400 JBOD up to 96TB with 4 JBODs. You can combine two 7310 head units to form a cluster. The base configuration includes a single quad-core CPU, 16GB of DRAM, a SAS HBA, and two available PCIe slots for NICs, backup cards, or the Fishworks cluster card. The 7310 can be thought of as a smaller capacity, lower cost version of the Sun Storage 7410. Like the 7410 it uses high density, low power disks as primary storage and can be enhanced with Readzilla and Logzilla flash accelerators for high performance.
Sun Storage 7310
Today we're introducing a new member to the Sun Unified Storage family: the Sun Storage 7310. The 7310 is a scalable system from 12TB with a single half-populated J4400 JBOD up to 96TB with 4 JBODs. You can combine two 7310 head units to form a cluster. The base configuration includes a single quad-core CPU, 16GB of DRAM, a SAS HBA, and two available PCIe slots for NICs, backup cards, or the Fishworks cluster card. The 7310 can be thought of as a smaller capacity, lower cost version of the Sun Storage 7410. Like the 7410 it uses high density, low power disks as primary storage and can be enhanced with Readzilla and Logzilla flash accelerators for high performance.
Mirroring flash SSDs
As flash memory has become more and more prevalent in storage from the consumer to theenterprise people have been charmed by the performance characteristics, but get stuck on the longevity. SSDs based on SLC flash are typically rated at 100,000 to 1,000,000 write/erase cycles while MLC-based SSDs are rated for significantly less. For conventional hard drives, the distinct yet similar increase in failures over time has long been solved by mirroring (or other redundancy techniques). When applying this same solution to SSDs, a common concern is that two identical SSDs with identical firmware storing identical data would run out of write/erase cycles for a given cell at the same moment and thus data reliability would not be increased via mirroring.
Mirroring flash SSDs
As flash memory has become more and more prevalent in storage from the consumer to theenterprise people have been charmed by the performance characteristics, but get stuck on the longevity. SSDs based on SLC flash are typically rated at 100,000 to 1,000,000 write/erase cycles while MLC-based SSDs are rated for significantly less. For conventional hard drives, the distinct yet similar increase in failures over time has long been solved by mirroring (or other redundancy techniques). When applying this same solution to SSDs, a common concern is that two identical SSDs with identical firmware storing identical data would run out of write/erase cycles for a given cell at the same moment and thus data reliability would not be increased via mirroring.
Mirroring flash SSDs
As flash memory has become more and more prevalent in storage from the consumer to theenterprise people have been charmed by the performance characteristics, but get stuck on the longevity. SSDs based on SLC flash are typically rated at 100,000 to 1,000,000 write/erase cycles while MLC-based SSDs are rated for significantly less. For conventional hard drives, the distinct yet similar increase in failures over time has long been solved by mirroring (or other redundancy techniques). When applying this same solution to SSDs, a common concern is that two identical SSDs with identical firmware storing identical data would run out of write/erase cycles for a given cell at the same moment and thus data reliability would not be increased via mirroring.
Sun Storage 7310
Today we're introducing a new member to the Sun Unified Storage family: the Sun Storage 7310. The 7310 is a scalable system from 12TB with a single half-populated J4400 JBOD up to 96TB with 4 JBODs. You can combine two 7310 head units to form a cluster. The base configuration includes a single quad-core CPU, 16GB of DRAM, a SAS HBA, and two available PCIe slots for NICs, backup cards, or the Fishworks cluster card. The 7310 can be thought of as a smaller capacity, lower cost version of the Sun Storage 7410, and like the 7410 it can be enhanced with Readzilla and Logzilla flash accelerators.
SS 7000 simulator update plus VirtualBox
On the heels of the 2009.Q2.0.0 release, we've posted an update to the Sun Storage 7000 simulator. The simulator contains the exact same software as the other members of the 7000 series, but runs inside a VM rather than on actual hardware. It supports all the same features, and has all the same UI components; just remember that an actual 7000 series appliance is going to perform significantly better than a VM running a puny laptop CPU. Download the simulator here. The new version of the simulator contains two enhancements. First, it comes with the 2009.Q2.0.0 release pre-installed. The Q2 release is the first to provide full support for the simulator, and as I wrote here you can simply upgrade your old simulator.
SS 7000 simulator update plus VirtualBox
On the heels of the 2009.Q2.0.0 release, we've posted an update to the Sun Storage 7000 simulator. The simulator contains the exact same software as the other members of the 7000 series, but runs inside a VM rather than on actual hardware. It supports all the same features, and has all the same UI components; just remember that an actual 7000 series appliance is going to perform significantly better than a VM running a puny laptop CPU. Download the simulator here. The new version of the simulator contains two enhancements. First, it comes with the 2009.Q2.0.0 release pre-installed. The Q2 release is the first to provide full support for the simulator, and as I wrote here you can simply upgrade your old simulator.
SS 7000 simulator update plus VirtualBox
On the heels of the 2009.Q2.0.0 release, we've posted an update to the Sun Storage 7000 simulator. The simulator contains the exact same software as the other members of the 7000 series, but runs inside a VM rather than on actual hardware. It supports all the same features, and has all the same UI components; just remember that an actual 7000 series appliance is going to perform significantly better than a VM running a puny laptop CPU. Download the simulator here. The new version of the simulator contains two enhancements. First, it comes with the 2009.Q2.0.0 release pre-installed. The Q2 release is the first to provide full support for the simulator, and as I wrote here you can simply upgrade your old simulator.
Sun Storage 7000 simulator upgrade
Today we released the first major software update for the Sun Storage 7000 series. It includes a bunch of new features, bug fixes, and improvements. Significantly for users of the Sun Storage 7000 simulator, the virtual machine version of the 7000 series, this is the first update that supports the VMs. As with a physical 7000 series appliance, upgrade by navigating to Maintenance > System, and click the + icon next to Available Updates. Remember not to ungzip the update binary — the appliance will do that itself.
Sun Storage 7000 simulator upgrade
Today we released version 2009.Q2.0.0, the first major software update for the Sun Storage 7000 series. It includes a bunch of new features, bug fixes, and improvements. Significantly for users of the Sun Storage 7000 simulator, the virtual machine version of the 7000 series, this is the first update that supports the VMs. As with a physical 7000 series appliance, upgrade by navigating to Maintenance > System, and click the + icon next to Available Updates. Remember not to ungzip the update binary — the appliance will do that itself. We'll be releasing an update VM preinstalled with the new bits so stay tuned.
Sun Storage 7000 simulator upgrade
Today we released version 2009.Q2.0.0, the first major software update for the Sun Storage 7000 series. It includes a bunch of new features, bug fixes, and improvements. Significantly for users of the Sun Storage 7000 simulator, the virtual machine version of the 7000 series, this is the first update that supports the VMs. As with a physical 7000 series appliance, upgrade by navigating to Maintenance > System, and click the + icon next to Available Updates. Remember not to ungzip the update binary — the appliance will do that itself. We'll be releasing an update VM preinstalled with the new bits so stay tuned.
SSDs for HSPs
We're announcing a couple of new things in the flash SSD space. First, support the Intel X25 SSD in a bunch of our servers. This can be used to create a Hybrid Storage Pool like in the Sun Storage 7000 series, or as just a little flash for high performance / low power / tough environmentals. Second, we're introducing a new open standard with the Open Flash Module. This creates a new form factor for SSDs bringing flash even closer to the CPU for higher performance and tighter system integration. SSDs in HDD form factors were a reasonable idea to gain market acceptance in much the same way as you first listened to your iPod over your car stereo with that weird tape adapter.
SSDs for HSPs
We're announcing a couple of new things in the flash SSD space. First, support the Intel X25-E SSD in a bunch of our servers. This can be used to create a Hybrid Storage Pool like in the Sun Storage 7000 series, or as just a little flash for high performance / low power / tough environmentals. Second, we're introducing a new open standard with the Open Flash Module. This creates a new form factor for SSDs bringing flash even closer to the CPU for higher performance and tighter system integration. SSDs in HDD form factors were a reasonable idea to gain market acceptance in much the same way as you first listened to your iPod over your car stereo with that weird tape adapter.
SSDs for HSPs
We're announcing a couple of new things in the flash SSD space. First, support the Intel X25-E SSD in a bunch of our servers. This can be used to create a Hybrid Storage Pool like in the Sun Storage 7000 series, or as just a little flash for high performance / low power / tough environmentals. Second, we're introducing a new open standard with the Open Flash Module. This creates a new form factor for SSDs bringing flash even closer to the CPU for higher performance and tighter system integration. SSDs in HDD form factors were a reasonable idea to gain market acceptance in much the same way as you first listened to your iPod over your car stereo with that weird tape adapter.
Presentation: Hybrid Storage Pools and SSDs
Today at The First Workshop on Integrating Solid-state Memory into the Storage Hierarchy (WISH 2009) I gave a short talk about our experience integrating flash into the storage hierarchy and the interaction with SSDs. In the talk I discussed the recent history of flash SSDs as well as some key areas for future improvements. You can download it here. The workshop was terrific with some great conversations about the state of solid state storage and its future directions; thank you to the organizers and participants.
Presentation: Hybrid Storage Pools and SSDs
Today at The First Workshop on Integrating Solid-state Memory into the Storage Hierarchy (WISH 2009) I gave a short talk about our experience integrating flash into the storage hierarchy and the interaction with SSDs. In the talk I discussed the recent history of flash SSDs as well as some key areas for future improvements. You can download it here. The workshop was terrific with some great conversations about the state of solid state storage and its future directions; thank you to the organizers and participants.
Presentation: Hybrid Storage Pools and SSDs
Today at The First Workshop on Integrating Solid-state Memory into the Storage Hierarchy (WISH 2009) I gave a short talk about our experience integrating flash into the storage hierarchy and the interaction with SSDs. In the talk I discussed the recent history of flash SSDs as well as some key areas for future improvements. You can download it here. The workshop was terrific with some great conversations about the state of solid state storage and its future directions; thank you to the organizers and participants.
Fishworks VM: the 7000 series on your laptop
In May of 2007 I was lined up to give my first customer presentation of what would become the Sun Storage 7000 series. I inherited a well-worn slide deck describing the product, but we had seen the reactions of prospective customers who saw the software live and had a chance to interact with features such as Analytics; no slides would elicit that kind of response. So with some tinkering, I hacked up our installer and shoe-horned the prototype software into a virtual machine. The live demonstration was a hit despite some rocky software interactions. As the months passed, our software became increasingly aware of our hardware platforms; the patches I had used for the virtual machine version fell into disrepair.
Fishworks VM: the 7000 series on your laptop
In May of 2007 I was lined up to give my first customer presentation of what would become the Sun Storage 7000 series. I inherited a well-worn slide deck describing the product, but we had seen the reactions of prospective customers who saw the software live and had a chance to interact with features such as Analytics; no slides would elicit that kind of response. So with some tinkering, I hacked up our installer and shoe-horned the prototype software into a virtual machine. The live demonstration was a hit despite some rocky software interactions. As the months passed, our software became increasingly aware of our hardware platforms; the patches I had used for the virtual machine version fell into disrepair.
Fishworks VM: the 7000 series on your laptop
In May of 2007 I was lined up to give my first customer presentation of what would become the Sun Storage 7000 series. I inherited a well-worn slide deck describing the product, but we had seen the reactions of prospective customers who saw the software live and had a chance to interact with features such as Analytics; no slides would elicit that kind of response. So with some tinkering, I hacked up our installer and shoe-horned the prototype software into a virtual machine. The live demonstration was a hit despite some rocky software interactions. As the months passed, our software became increasingly aware of our hardware platforms; the patches I had used for the virtual machine version fell into disrepair.
More from the storage anarchist
In my last blog post I responded to Barry Burke author of the Storage Anarchist blog. I was under the perhaps naive impression that Barry was an independent voice in the blogosphere. In fact, he's merely Storage Anarchist by night; by day he's the mild-mannered chief strategy officer for EMC's Symmetrix Products Group — a fact notable for its absence from Barry's blog. In my post, I observed that Barry had apparently picked his horse in the flash race and Chris Caldwell commented that "it would appear that not only has he chosen his horse, but that he's planted squarely on its back wearing an EMC jersey."
More from the storage anarchist
In my last blog post I responded to Barry Burke author of the Storage Anarchist blog. I was under the perhaps naive impression that Barry was an independent voice in the blogosphere. In fact, he's merely Storage Anarchist by night; by day he's the mild-mannered chief strategy officer for EMC's Symmetrix Products Group — a fact notable for its absence from Barry's blog. In my post, I observed that Barry had apparently picked his horse in the flash race and Chris Caldwell commented that "it would appear that not only has he chosen his horse, but that he's planted squarely on its back wearing an EMC jersey."
More from the storage anarchist
In my last blog post I responded to Barry Burke author of the Storage Anarchist blog. I was under the perhaps naive impression that Barry was an independent voice in the blogosphere. In fact, he's merely Storage Anarchist by night; by day he's the mild-mannered chief strategy officer for EMC's Symmetrix Products Group — a fact notable for its absence from Barry's blog. In my post, I observed that Barry had apparently picked his horse in the flash race and Chris Caldwell commented that "it would appear that not only has he chosen his horse, but that he's planted squarely on its back wearing an EMC jersey."
Dancing with the Anarchist
Barry Burke, the Storage Anarchist, has written an interesting roundup ("don't miss the amazing vendor flash dance") covering the flash strategies of some players in the server and storage spaces. Sun's position on flash comes out a bit mangled, but Barry can certainly be forgiven for missing the mark since Sun hasn't always communicated its position well. Allow me to clarify our version of the flash dance. Barry's conclusion that Sun sees flash as well-suited for the server isn't wrong — of course it's harder to drive high IOPS and low latency outside a single box. However we've also proven not only that we see a big role for flash in storage, but that we're innovating in that realm with the Hybrid Storage Pool (HSP) an architecture that seamlessly integrates flash into the storage hierarchy.
Dancing with the Anarchist
Barry Burke, the Storage Anarchist, has written an interesting roundup ("don't miss the amazing vendor flash dance") covering the flash strategies of some players in the server and storage spaces. Sun's position on flash comes out a bit mangled, but Barry can certainly be forgiven for missing the mark since Sun hasn't always communicated its position well. Allow me to clarify our version of the flash dance. Barry's conclusion that Sun sees flash as well-suited for the server isn't wrong — of course it's harder to drive high IOPS and low latency outside a single box. However we've also proven not only that we see a big role for flash in storage, but that we're innovating in that realm with the Hybrid Storage Pool (HSP) an architecture that seamlessly integrates flash into the storage hierarchy.
Dancing with the Anarchist
Barry Burke, the Storage Anarchist, has written an interesting roundup ("don't miss the amazing vendor flash dance") covering the flash strategies of some players in the server and storage spaces. Sun's position on flash comes out a bit mangled, but Barry can certainly be forgiven for missing the mark since Sun hasn't always communicated its position well. Allow me to clarify our version of the flash dance. Barry's conclusion that Sun sees flash as well-suited for the server isn't wrong — of course it's harder to drive high IOPS and low latency outside a single box. However we've also proven not only that we see a big role for flash in storage, but that we're innovating in that realm with the Hybrid Storage Pool (HSP) an architecture that seamlessly integrates flash into the storage hierarchy.
HSP talk at the OpenSolaris Storage Summit
The organizers of the OpenSolaris Storage Summit asked me to give a presentation about Hybrid Storage Pools and ZFS. You can download the presentation titled ZFS, Cache, and Flash. In it, I talk about flash as a new caching tier in the storage hierarchy, some of the innovations in ZFS to enable the HSP, and an aside into the how we implement an HSP in the Sun Storage 7410.
HSP talk at the OpenSolaris Storage Summit
The organizers of the OpenSolaris Storage Summit asked me to give a presentation about Hybrid Storage Pools and ZFS. You can download the presentation titled ZFS, Cache, and Flash. In it, I talk about flash as a new caching tier in the storage hierarchy, some of the innovations in ZFS to enable the HSP, and an aside into the how we implement an HSP in the Sun Storage 7410.
HSP talk at the OpenSolaris Storage Summit
The organizers of the OpenSolaris Storage Summit asked me to give a presentation about Hybrid Storage Pools and ZFS. You can download the presentation titled ZFS, Cache, and Flash. In it, I talk about flash as a new caching tier in the storage hierarchy, some of the innovations in ZFS to enable the HSP, and an aside into the how we implement an HSP in the Sun Storage 7410.
Flash workshop at ASPLOS
Before this year's ASPLOS conference, I'll be speaking at the First Workshop on Integrating Solid-state Memory into the Storage Hierarchy (WISH2009). It looks like a great program with some terrific papers on how to use flash effectively and how to combine various solid state technologies to complement conventional storage. I'll be talking about the work we've done at Sun on the Hybrid Storage Pool. In addition I'll discuss some of the new opportunities that flash and other solid state technologies create. The workshop takes place in Washington D.C. on March 7th. Hope to see you there. In semi-related news, along with Eric and Mike I'll be speaking at the OpenSolaris Storage Summit in San Francisco this coming Monday the 23rd.
Flash workshop at ASPLOS
Before this year's ASPLOS conference, I'll be speaking at the First Workshop on Integrating Solid-state Memory into the Storage Hierarchy (WISH2009). It looks like a great program with some terrific papers on how to use flash effectively and how to combine various solid state technologies to complement conventional storage. I'll be talking about the work we've done at Sun on the Hybrid Storage Pool. In addition I'll discuss some of the new opportunities that flash and other solid state technologies create. The workshop takes place in Washington D.C. on March 7th. Hope to see you there.
Flash workshop at ASPLOS
Before this year's ASPLOS conference, I'll be speaking at the First Workshop on Integrating Solid-state Memory into the Storage Hierarchy (WISH2009). It looks like a great program with some terrific papers on how to use flash effectively and how to combine various solid state technologies to complement conventional storage. I'll be talking about the work we've done at Sun on the Hybrid Storage Pool. In addition I'll discuss some of the new opportunities that flash and other solid state technologies create. The workshop takes place in Washington D.C. on March 7th. Hope to see you there. In semi-related news, along with Eric and Mike I'll be speaking at the OpenSolaris Storage Summit in San Francisco this coming Monday the 23rd.
Casting the shadow of the Hybrid Storage Pool
The debate, calmly waged, on the best use of flash in the enterprise can be summarized as whether flash should be a replacement for disk, acting as primary storage, or it should be regarded as a new, and complementary tier in the storage hierarchy, acting as a massive read cache. The market leaders in storage have weighed in the issue, and have declared incontrovertibly that, yes, both are the right answer, but there's some bias underlying that equanimity. Chuck Hollis, EMC's Global Marketing CTO, writes, that "flash as cache will eventually become less interesting as part of the overall discussion... Flash as storage?
Casting the shadow of the Hybrid Storage Pool
The debate, calmly waged, on the best use of flash in the enterprise can be summarized as whether flash should be a replacement for disk, acting as primary storage, or it should be regarded as a new, and complementary tier in the storage hierarchy, acting as a massive read cache. The market leaders in storage have weighed in the issue, and have declared incontrovertibly that, yes, both are the right answer, but there's some bias underlying that equanimity. Chuck Hollis, EMC's Global Marketing CTO, writes, that "flash as cache will eventually become less interesting as part of the overall discussion... Flash as storage?
Casting the shadow of the Hybrid Storage Pool
The debate, calmly waged, on the best use of flash in the enterprise can be summarized as whether flash should be a replacement for disk, acting as primary storage, or it should be regarded as a new, and complementary tier in the storage hierarchy, acting as a massive read cache. The market leaders in storage have weighed in the issue, and have declared incontrovertibly that, yes, both are the right answer, but there's some bias underlying that equanimity. Chuck Hollis, EMC's Global Marketing CTO, writes, that "flash as cache will eventually become less interesting as part of the overall discussion... Flash as storage?
Sun Storage 7410 space calculator
The Sun Storage 7410 is our expandable storage appliance that can be hooked up to anywhere from one and twelve JBODs with 24 1TB disks. With all those disks we provide the several different options for how to arrange them into your storage pool: double-parity RAID-Z, wide-strip double-parity RAID-Z, mirror, striped, and single-parity RAID-Z with narrow stripes. Each of these options has a different mix of availability, performance, and capacity that are described both in the UI and in the installation documentation. With the wide array of supported configurations, it can be hard to know how much usable space each will support.
Sun Storage 7410 space calculator
The Sun Storage 7410 is our expandable storage appliance that can be hooked up to anywhere from one and twelve JBODs with 24 1TB disks. With all those disks we provide the several different options for how to arrange them into your storage pool: double-parity RAID-Z, wide-strip double-parity RAID-Z, mirror, striped, and single-parity RAID-Z with narrow stripes. Each of these options has a different mix of availability, performance, and capacity that are described both in the UI and in the installation documentation. With the wide array of supported configurations, it can be hard to know how much usable space each will support.
Sun Storage 7410 space calculator
The Sun Storage 7410 is our expandable storage appliance that can be hooked up to anywhere from one and twelve JBODs with 24 1TB disks. With all those disks we provide the several different options for how to arrange them into your storage pool: double-parity RAID-Z, wide-strip double-parity RAID-Z, mirror, striped, and single-parity RAID-Z with narrow stripes. Each of these options has a different mix of availability, performance, and capacity that are described both in the UI and in the installation documentation. With the wide array of supported configurations, it can be hard to know how much usable space each will support.
Hybrid Storage Pools in the 7410
The Sun Storage 7000 Series launches today, and with it Sun has the world's first complete product that seamlessly adds flash into the storage hierarchy in what we call the Hybrid Storage Pool. The HSP represents a departure from convention, and a new way of thinking designing a storage system. I've written before about the principles of the HSP, but now that it has been formally announced I can focus on the specifics of the Sun Storage 7000 Series and how it implements the HSP. Sun Storage 7410: The Cadillac of HSPs The best example of the HSP in the 7000 Series is the 7410.
Hybrid Storage Pools in the 7410
The Sun Storage 7000 Series launches today, and with it Sun has the world's first complete product that seamlessly adds flash into the storage hierarchy in what we call the Hybrid Storage Pool. The HSP represents a departure from convention, and a new way of thinking designing a storage system. I've written before about the principles of the HSP, but now that it has been formally announced I can focus on the specifics of the Sun Storage 7000 Series and how it implements the HSP. Sun Storage 7410: The Cadillac of HSPs The best example of the HSP in the 7000 Series is the 7410.
Hybrid Storage Pools in the 7410
The Sun Storage 7000 Series launches today, and with it Sun has the world's first complete product that seamlessly adds flash into the storage hierarchy in what we call the Hybrid Storage Pool. The HSP represents a departure from convention, and a new way of thinking designing a storage system. I've written before about the principles of the HSP, but now that it has been formally announced I can focus on the specifics of the Sun Storage 7000 Series and how it implements the HSP. Sun Storage 7410: The Cadillac of HSPs The best example of the HSP in the 7000 Series is the 7410.
Hybrid Storage Pool goes glossy
I've written about Hybrid Storage Pools (HSPs) here several times as well as in an article that appeared in the ACM's Queue and CACM publications. Now the folks in Sun marketing on the occasion of our joint SSD announcement with Intel have distilled that down to a four page glossy, and they've done a terrific job. I suggest taking a look. The concept behind the HSP is a simple one: combine disk, flash, and DRAM into a single coherent and seamless data store that makes optimal use of each component and its economic niche. The mechanics of how this happens required innovation from the Fishworks and ZFS groups to integrate flash as a new tier in storage hierarchy for use in our forthcoming line of storage products.
Hybrid Storage Pool goes glossy
I've written about Hybrid Storage Pools (HSPs) here several times as well as in an article that appeared in the ACM's Queue and CACM publications. Now the folks in Sun marketing on the occasion of our joint SSD announcement with Intel have distilled that down to a four page glossy, and they've done a terrific job. I suggest taking a look. The concept behind the HSP is a simple one: combine disk, flash, and DRAM into a single coherent and seamless data store that makes optimal use of each component and its economic niche. The mechanics of how this happens required innovation from the Fishworks and ZFS groups to integrate flash as a new tier in storage hierarchy for use in our forthcoming line of storage products.
Hybrid Storage Pool goes glossy
I've written about Hybrid Storage Pools (HSPs) here several times as well as in an article that appeared in the ACM's Queue and CACM publications. Now the folks in Sun marketing on the occasion of our joint SSD announcement with Intel have distilled that down to a four page glossy, and they've done a terrific job. I suggest taking a look. The concept behind the HSP is a simple one: combine disk, flash, and DRAM into a single coherent and seamless data store that makes optimal use of each component and its economic niche. The mechanics of how this happens required innovation from the Fishworks and ZFS groups to integrate flash as a new tier in storage hierarchy for use in our forthcoming line of storage products.
Apple updates DTrace... again
Back in January, I ranted about Apple's ham-handed breakage in their DTrace port. After some injured feelings and teary embraces, Apple cleaned things up a bit, but some nagging issues remained as I wrote: For the Apple folks: I'd argue that revealing the name of otherwise untraceable processes is no more transparent than what Activity Monitor provides — could I have that please? It would be very un-Apple to — you know — communicate future development plans, but in 10.5.5, DTrace has seen another improvement. Previously when using DTrace to observe the system at large, iTunes and other paranoid apps would be hidden; now they're showing up on the radar: # dtrace -n 'profile-1999{ @[execname] = count(); }' dtrace: description 'profile-1999' matched 1 probe \^C loginwindow ...
Apple updates DTrace... again
Back in January, I ranted about Apple's ham-handed breakage in their DTrace port. After some injured feelings and teary embraces, Apple cleaned things up a bit, but some nagging issues remained as I wrote: For the Apple folks: I'd argue that revealing the name of otherwise untraceable processes is no more transparent than what Activity Monitor provides — could I have that please? It would be very un-Apple to — you know — communicate future development plans, but in 10.5.5, DTrace has seen another improvement. Previously when using DTrace to observe the system at large, iTunes and other paranoid apps would be hidden; now they're showing up on the radar: # dtrace -n 'profile-1999{ @[execname] = count(); }' dtrace: description 'profile-1999' matched 1 probe ^C loginwindow ...
Apple updates DTrace... again
Back in January, I ranted about Apple's ham-handed breakage in their DTrace port. After some injured feelings and teary embraces, Apple cleaned things up a bit, but some nagging issues remained as I wrote: For the Apple folks: I'd argue that revealing the name of otherwise untraceable processes is no more transparent than what Activity Monitor provides — could I have that please? It would be very un-Apple to — you know — communicate future development plans, but in 10.5.5, DTrace has seen another improvement. Previously when using DTrace to observe the system at large, iTunes and other paranoid apps would be hidden; now they're showing up on the radar: # dtrace -n 'profile-1999{ @[execname] = count(); }' dtrace: description 'profile-1999' matched 1 probe \^C loginwindow ...
A glimpse into Netapp's flash future
The latest edition of Communications of the ACM includes a panel discussion between "seven world-class storage experts". The primary topic was flash memory and how it impacts the world of storage. The most interesting comment came from Steve Kleiman, Senior Vice President and Chief Scientist at Netapp: My theory is that whether it's flash, phase-change memory, or something else, there is a new place in the memory hierarchy. There was a big blank space for decades that is now filled and a lot of things that need to be rethought. There are many implications to this, and we're just beginning to see the tip of the iceberg.
A glimpse into Netapp's flash future
The latest edition of Communications of the ACM includes a panel discussion between "seven world-class storage experts". The primary topic was flash memory and how it impacts the world of storage. The most interesting comment came from Steve Kleiman, Senior Vice President and Chief Scientist at Netapp: My theory is that whether it's flash, phase-change memory, or something else, there is a new place in the memory hierarchy. There was a big blank space for decades that is now filled and a lot of things that need to be rethought. There are many implications to this, and we're just beginning to see the tip of the iceberg.
A glimpse into Netapp's flash future
The latest edition of Communications of the ACM includes a panel discussion between "seven world-class storage experts". The primary topic was flash memory and how it impacts the world of storage. The most interesting comment came from Steve Kleiman, Senior Vice President and Chief Scientist at Netapp: My theory is that whether its flash, phase-change memory, or something else, there is a new place in the memory hierarchy. There was a big blank space for decades that is now filled and a lot of things that need to be rethought. There are many implications to this, and were just beginning to see the tip of the iceberg.
Hybrid Storage Pools: The L2ARC
I've written recently about the hybrid storage pool (HSP), using ZFS to augment the conventional storage stack with flash memory. The resulting system improve performance, cost, density, capacity, power dissipation — pretty much evey axis of importance. An important component of the HSP is something called the second level adaptive replacement cache (L2ARC). This allows ZFS to use flash as a caching tier that falls between RAM and disk in the storage hierarchy, and permits huge working sets to be serviced with latencies under 100us. My colleague, Brendan Gregg, implemented the L2ARC, and has written a great summary of how the L2ARC works and some concrete results.
Hybrid Storage Pools: The L2ARC
I've written recently about the hybrid storage pool (HSP), using ZFS to augment the conventional storage stack with flash memory. The resulting system improve performance, cost, density, capacity, power dissipation — pretty much evey axis of importance. An important component of the HSP is something called the second level adaptive replacement cache (L2ARC). This allows ZFS to use flash as a caching tier that falls between RAM and disk in the storage hierarchy, and permits huge working sets to be serviced with latencies under 100us. My colleague, Brendan Gregg, implemented the L2ARC, and has written a great summary of how the L2ARC works and some concrete results.
Hybrid Storage Pools: The L2ARC
I've written recently about the hybrid storage pool (HSP), using ZFS to augment the conventional storage stack with flash memory. The resulting system improve performance, cost, density, capacity, power dissipation — pretty much evey axis of importance. An important component of the HSP is something called the second level adaptive replacement cache (L2ARC). This allows ZFS to use flash as a caching tier that falls between RAM and disk in the storage hierarchy, and permits huge working sets to be serviced with latencies under 100us. My colleague, Brendan Gregg, implemented the L2ARC, and has written a great summary of how the L2ARC works and some concrete results.
Hybrid Storage Pools in CACM
As I mentioned in my previous post, I wrote an article about the hybrid storage pool (HSP); that article appears in the recently released July issue of Communications of the ACM. You can find it here. In the article, I talk about a novel way of augmenting the traditional storage stack with flash memory as a new level in the hierarchy between DRAM and disk, as well as the ways in which we've adapted ZFS and optimized it for use with flash. So what's the impact of the HSP? Very simply, the article demonstrates that, considering the axes of cost, throughput, capacity, IOPS and power-efficiency, HSPs can match and exceed what's possible with either drives or flash alone.
Hybrid Storage Pools in CACM
As I mentioned in my previous post, I wrote an article about the hybrid storage pool (HSP); that article appears in the recently released July issue of Communications of the ACM. You can find it here. In the article, I talk about a novel way of augmenting the traditional storage stack with flash memory as a new level in the hierarchy between DRAM and disk, as well as the ways in which we've adapted ZFS and optimized it for use with flash. So what's the impact of the HSP? Very simply, the article demonstrates that, considering the axes of cost, throughput, capacity, IOPS and power-efficiency, HSPs can match and exceed what's possible with either drives or flash alone.
Hybrid Storage Pools in CACM
As I mentioned in my previous post, I wrote an article about the hybrid storage pool (HSP); that article appears in the recently released July issue of Communications of the ACM. You can find it here. In the article, I talk about a novel way of augmenting the traditional storage stack with flash memory as a new level in the hierarchy between DRAM and disk, as well as the ways in which we've adapted ZFS and optimized it for use with flash. So what's the impact of the HSP? Very simply, the article demonstrates that, considering the axes of cost, throughput, capacity, IOPS and power-efficiency, HSPs can match and exceed what's possible with either drives or flash alone.
Flash, Hybrid Pools, and Future Storage
Jonathan had a terrific post yesterday that does an excellent job of presenting Sun's strategy for flash for the next few years. With my colleagues at Fishworks, an advanced product development team, I've spent more than a year working with flash and figuring out ways to integrate flash into ZFS, the storage hierarchy, and our future storage products — a fact to which John Fowler, EVP of storage, alluded recently. Flash opens surprising new vistas; it's exciting to see Sun leading in this field, and it's frankly exciting to be part of it. Jonathan's post sketches out some of the basic ideas on how we're going to be integrating flash into ZFS to create what we call hybrid storage pools that combine flash with conventional (cheap) disks to create an aggregate that's cost-effective, power-efficient, and high-performing by capitalizing on the strengths of the component technologies (not unlike a hybrid car).
Apple updates DTrace
Back in January, I posted about a problem with Apple's port of DTrace to Mac OS X. The heart of the issue is that their port would silently drop data such that certain experiments would be quietly invalid. Unfortunately, most reactions seized on a headline paraphrasing a line of the post — albeit with the critical negation omitted (the subject and language were, perhaps, too baroque to expect the press to read every excruciating word). The good news is that Apple has (quietly) fixed the problem in Mac OS X 10.5.3. One issue was that timer based probes wouldn't fire if certain applications were actively executing (e.g.
dtrace.conf post-post-mortem
This originally was going to be a post-mortem on dtrace.conf, but so much time has passed, that I doubt it qualifies anymore. Back in March, we held the first ever DTrace (un)conference, and I hope I speak for all involved when I declare it a terrific success. And our t-shirts (logo pictured) were, frankly, bomb. Here are some fairly random impressions from the day: Notes on the demographics at dtrace.conf: Macs were the most prevalent laptops by quite a wide margin, and a ton of demos were done under VMware for the Mac. There were a handful of dvorak users who far outnumbered the Esperanto speakers (there were none) despite apparently similarly rationales.
DTrace and JavaOne: The End of the Beginning
It was a good run, but Jarod and I didn't make the cut for JavaOne this year... 2005 In 2005, Jarod came up with what he described as a jacked up way to use DTrace to get inside Java. This became the basis of the Java provider (first dvm for the 1.4.2 and 1.5 JVMs and now the hotspot provider for Java 6). That year, I got to stand up on stage at the keynote with John Loiacono and present DTrace for Java for the first time (to 10,000 people -- I was nervous). John was then the EVP of software at Sun.
Expand-O-Matic RAID-Z
I was having a conversation with an OpenBSD user and developer the other day, and he mentioned some ongoing work in the community to consolidate support for RAID controllers. The problem, he was saying, was that each controller had a different administrative model and utility -- but all I could think was that the real problem was the presence of a RAID controller in the first place! As far as I'm concerned, ZFS and RAID-Z have obviated the need for hardware RAID controllers. ZFS users seem to love RAID-Z, but a frustratingly frequent request is to be able to expand the width of a RAID-Z stripe.
pid2proc for DTrace
The other day, there was an interesting post on the DTrace mailing list asking how to derive a process name from a pid. This really ought to be a built-in feature of D, but it isn't (at least not yet). I hacked up a solution to the user's problem by cribbing the algorithm from mdb's ::pid2proc function whose source code you can find here. The basic idea is that you need to look up the pid in pidhash to get a chain of struct pid that you need to walk until you find the pid in question. This in turn gives you an index into procdir which is an array of pointers to proc structures.
Mac OS X and the missing probes
As has been thoroughly recorded, Apple has included DTrace in Mac OS X. I've been using it as often as I have the opportunity, and it's a joy to be able to use the fruits of our labor on another operating system. But I hit a rather surprising case recently which led me to discover a serious problem with Apple's implementation. A common trick with DTrace is to use a tick probe to report data periodically. For example, the following script reports the ten most frequently accessed files every 10 seconds: io:::start { @[args[2]->fi_pathname] = count(); } tick-10s { trunc(@, 10); printa(@); trunc(@, 0); } This was running fine, but it seemed as though sometimes (particularly with certain apps in the background) it would occasionally skip one of the ten second iterations.
DTrace/Firefox/Leopard
It's been more than a year since I first saw DTrace on Mac OS X, and now it's at last generally available to the public. Not only did Apple port DTrace, but they've also included a bunch of USDT providers. Perl, Python, Ruby -- they all ship in Leopard with built-in DTrace probes that allow developers to observe function calls, object allocation, and other points of interest from the perspective of that dynamic language. Apple did make some odd choices (e.g. no Java provider, spurious modifications to the publicly available providers, a different build process), but on the whole it's very impressive.
What-If Machine: DTrace Port
What if there were a port of DTrace to Linux? What if there were a port of DTrace to Linux: could such a thing be done without violating either the GPL or CDDL? Read on before you jump right to the comments section to add your two cents. In my last post, I discussed an attempt to create a DTrace knockoff in Linux, and suggested that a port might be possible. Naively, I hoped that comments would examine the heart of my argument, bemoan the apparent NIH in the Linux knockoff, regret the misappropriation of slideware, and maybe discuss some technical details -- anything but dwell on licensing issues.
DTrace Knockoffs
Update 8/6/2007: Those of you interested in this entry may also want to check out my next entry on the legality of a hypothetical port of DTrace to Linux. Tools We Wish We Had -- OSCON 7/26/2007 Last week at OSCON someone set up a whiteboard with the heading "Tools We Wish We Had". People added entries (wiki-style); this one in particular caught my eye: dtrace for Linux or something similar (LIKE SYSTEMTAP?) - jdub (NO, LIKE dtrace) - VLAD (like systemtap, but not crap) DTrace So what exactly were they asking for? DTrace is the tool developers and sysadmins have always needed -- whether they knew it or not -- but weren't able to express in words let alone code.
DTrace for Ruby at OSCON 2007
I just got back from OSCON, a conference on Open Source that O'Reilly hosts in Portland annually. The conference offered some interesting content and side-shows with some notable highlights (more on those in the next few days). Brendan and I gave a presentation on how a crew from Sun dropped in on Twitter to help them use DTrace to discover some nasty performance problems. Here's the presentation along with the D scripts and load generators we used for the talk.
DTrace "Scobleized"
Robert Scoble was kind enough to interview us last week for the ScobleShow. Robert pretty much let us riff continuously for half an hour -- we clearly haven't been getting to talk about DTrace enough lately. I thought he would trim it down a bit, but like a scene from Hard Boiled, it's all there. This picture captures my favorite moment (around 16:23) during the interview as the three of us try to formulate a connection between DTrace and green computing... ... and this is as good a time as any to plug the talk Brendan and I will be giving at the end of the month at OSCON.
iSCSI DTrace provider and more to come
People often ask about the future direction of DTrace, and while we have some stuff planned for the core infrastructure, the future is really about extending DTrace's scope into every language, protocol, and application with new providers -- and this development is being done by many different members of the DTrace community. An important goal of this new work is to have consistent providers that work predictably. To that end, Brendan and I have started to sketch out an array of providers so that we can build a consistent model. In that vein, I recently integrated a provider for our iSCSI target into Solaris Nevada (build 69, and it should be in a Solaris 10 update, but don't ask me which one).
DTrace @ JavaOne 2007
This year, Jarod Jenson and I gave an updated version of our DTrace for Java (technology-based applications) talk: The biggest new feature that we demonstrated is the forthcoming Java Statically-Defined Tracing (JSDT) which will allow developers to embed stable probes in their code as we can do today in the kernel with SDT probes and in C and C++ applications with USDT probes. While you can already trace Java applications (and C and C++ applications), static probes let the developer embed stable and semantically rich points of instrumentation that allow the user to examine the application without needing to understand its implementation.
Java/DTrace article
The Texas Ranger himself, Jarod Jenson, has written a nice article about using the new DTrace probes in Java SE 6. If that's up your alley, you should come to the talk Jarod and I will be giving at JavaOne in May. We'll be talking about some of the new features in Java SE 6 and potentially previewing some new features slated for Java SE 7. This will be our third year at JavaOne -- it's great to see how much progress we're making each year. Technorati Tags: DTrace Java
Linux Defection
Ian Murdock has left the Linux Foundation to lead the operating systems strategy here at Sun. The last few years have seen some exciting changes at Sun: releasing Solaris 10 (which includes several truly revolutionary technologies), embracing x86, leading on x64, and taking Solaris open source. That a luminary of the Linux world was enticed by the changes we've made and the technologies we're creating is a huge vote of confidence. From my (admittedly biased) view, OpenSolaris has been breaking away from the pack with technologies like DTrace, ZFS, Zones, SMF, FMA and others.
gzip for ZFS update
The other day I posted about a prototype I had created that adds a gzip compression algorithm to ZFS. ZFS already allows administrators to choose to compress filesystems using the LZJB compression algorithm. This prototype introduced a more effective -- albeit more computationally expensive -- alternative based on zlib. As an arbitrary measure, I used tar(1) to create and expand archives of an ON (Solaris kernel) source tree on ZFS filesystems compressed with lzjb and gzip algorithms as well as on an uncompressed ZFS filesystem for reference: Thanks for the feedback. I was curious if people would find this interesting and they do.
a small ZFS hack
I've been dabbling a bit in ZFS recently, and what's amazing is not just how well it solved the well-understood filesystem problem, but how its design opens the door to novel ways to manage data. Compression is a great example. An almost accidental by-product of the design is that your data can be stored compressed on disk. This is especially interesting in an era when we have CPU cycles to spare, many too few available IOPs, and disk latencies that you can measure with a stop watch (well, not really, but you get the idea). With ZFS can you trade in some of those spare CPU cycles for IOPs by turning on compression, and the additional latency introduced by decompression is dwarfed by the time we spend twiddling our thumbs waiting for the platter to complete another revolution.
It's tested or it's broken
It's amazing how lousy software is. That we as a society have come to accept buggy software as an inevitability is either a testament to our collective tolerance, or -- much more likely -- the near ubiquity of crappy software. So we are guilty of accepting low standards for software, but the smaller we of software writers are guilty of setting those low expectations. And I mean we: all of us. Every programmer has at some time written buggy software (or has never written any software of any real complexity), and while we're absolutely at fault its not from lack of exertion.
DTrace: a history
An unsurprisingly common request on the DTrace discussion forum has been for updated documentation. People have been -- on the whole -- very pleased with the Solaris Dynamic Tracing Guide that we worked hard to produce, but I readily admit that we haven't been nearly as diligent in updating it. OK: we haven't updated it at all. But we have been updating DTrace itself, adding new variables and functions, tacking on new features, adding new providers, and fixing bugs. But unless you've been scraping our putback logs, or reading between the lines on the discussion forum, these features haven't necessarily been obvious.
DTrace is a web developer's best friend
I have this friend who might be most accurately described as a web developer. When DTrace was able to observe php he was interested. Me: "I should give you a demo some time." Him: "Absolutely..." When DTrace ticked Ruby off its list, he was more enthusiastic. Him: "Cool! I loves me the Ruby!" Me: "Let me know when you want that demo". The other day I got an IM from my friend. Him: "DTrace for JavaScript, eh?" Me: "How 'bout that, huh?" Him: "So when can I get that demo?" Last week Brendan Gregg released Helper Monkey -- a DTrace-enabled version of Mozilla's Spider Monkey JavaScript engine.
DTrace user number one
Some people think DTrace was built for developers; others think it was for system administrators; some even think it was a tool designed just for Solaris kernel hackers but was so useful we decided to unleash it on the world. All wrong. The user we always had in mind was Solaris user extraordinaire Jarod Jenson. DTrace let's you explore virtually any element of the system -- it's biggest limitation is the user's own knowledge of the system. Jarod has the most diverse and expansive knowledge of enterprise computing bar none; in his hands DTrace seemingly has no limit. Here's how Jarod works.
Apple's DTrace team
As Bryan wrote, Apple has ported DTrace to Mac OS X. The Apple kernel team invited us to the WWDC today for the (albeit muted) announcement of their DTrace support, and then for a demo and dinner. It was surprisingly fun to play with DTrace on another OS, and it was a true pleasure to talk to the Apple guys who worked on the port. And it's my pleasure to introduce those engineers to the DTrace community at large: James McIlree, Tom Duffy, Steve Peters, Terry Lambert (DTrace on Mac OS X inset) And here's team DTrace with our new friends at Apple: Congratulations to the Apple team and to Mac OS X users.