Source blog: Perspectives
Pat Selinger
The best way to hire great women is to have great women at the top of the company. IBM is a lot stronger for employing Pat Selinger for 29 years. She invented the relational database cost-based optimizer, a technology that sees continued use in relational database management systems today. But more than being a great technologist,...
Seagate HAMR
Yesterday, I visited the Seagate Normandale Minnesota hard disk drive wafer fabrication facility. I'm super excited about HAMR (Heat Assisted Magnetic Recording) and the areal density it supports. Seagate's Dave Anderson first introduced this technology to me nearly 20 years ago and it's wonderful to see it delivered to market and the volumes ramping....
CIDR 2024
I helped kick off CIDR2024 yesterday with the keynote, Constraint Driven Innovation. My core thesis is that constraints force innovation. For example, it was slow hard disks that drove the invention of Write Ahead Logging. But constraints also block innovation. In-memory databases, first described in the 80s, remained largely irrelevant for decades waiting for cost effective...
Amazon Elastic Block Store at 15 Years
Just after joining Amazon Web Services in 2009, I met with Andrew Certain, at that time a senior engineer on the Amazon Elastic Block Store (Amazon EBS) team, to get into the details of how EBS was implemented and what the plans were going forward. Andrew took me through the details of this remote block storage...
Great Product Managers
One of the Amazon Operations teams was hosting a conference for Product Managers in their organization and they asked a few of us to record a 1-minute video of what we each view as important attributes of a Product Manager. My take is below with a link to the video. The best Product Managers push...
HPTS 2022
High Performance Transaction Systems (HPTS) is an invitational conference held once every two years at the Asilomar Conference Center near Monterey, California. My first HPTS was back in 1995 thanks to Pat Selinger. I loved it and attended each one up until 2012 when I started a 10 year around-the-world cruise in a small boat....
2022 NAE Frontiers of Engineering
I introduced the 2022 National Academy of Engineering Frontiers of Engineering conference on September 21st. The National Academy of Engineering was founded in 1964 and is part of The National Academies of Sciences, Engineering, and Medicine. The NAE operates under the same congressional act of incorporation that established the National Academy of Sciences, signed in 1863 by President...
A Short History of AWS Silicon Innovation
Why would a cloud services company design and deploy custom semiconductors? It definitely wasn't where I expected we would end up when I joined AWS in 2009 but it's a decision that has just kept delivering for our customers. It's been 10 years since those early ideas and, in reflecting on what the team has...
Graviton3 & EC2 C7g General Availability
Today we're making the AWS Graviton3 processor generally available in the AWS EC2 C7g Instances. Graviton3 and EC2 C7g Instance Type General Availability Video Graviton3 is the third generation of the AWS Graviton CPUs and it continues to raise the bar on performance. Graviton is one of our 4 semiconductor product lines here at AWS....
Xen-on-Nitro: AWS Nitro for Legacy Instances
On August 25, 2006, we started the public beta of our first ever EC2 instance. Back then, it didn't even have a name yet, but we later dubbed it "m1.small." Our first customers were able to use the equivalent of a 1.7 GHz Xeon processor, 1.75 GB of RAM, 160 GB of local disk and 250...
Happy 15th Birthday EC2!
August 25th, 2021 marks the 15-year anniversary for EC2. Contemplating the anniversary has me thinking back to when I first got involved with cloud-hosted services. It was back in early 2005, about a year before S3 was announced, and I was at a different company working on a technical due diligence project for a corporate...
Tesla Project Dojo Overview
A couple of days back, Ganesh Venugopal, the leader of the Tesla Dojo project, announced the Dojo machine learning training system. It's an unusually network rich, power dense, and memory light design. I've summarized the architecture of the system below but I found three aspects of the system particularly interesting: Massive Network: Each D1 chip...
Reinventing Operational Resiliency
The cloud helps organizations achieve unmatched resiliency at scale. This is a quick write-up I did on the AWS approach to resiliency: Reinventing Operational Resiliency. A talk I did at re:Invent focused on AWS infrastructure: Tuesday Night Live with James Hamilton. Graviton AWS Arm server announcement: M6g, C6g, and R6g EC2 instances powered by Graviton2.
Human Race Priorities?
I was recently in a super interesting discussion mostly focused on energy efficiency and, as part of the discussion, the claim was raised that Nobel Laureate Richard Smalley was right when he said that Energy was the number one challenge facing our planet. I'm a pretty big believer in energy efficiency and the importance of...
How Complex Systems Fail
This is a simple little article that’s worth reading. I don’t agree with every point made but all 18 are interesting and every one of them leads to some introspection on how it compares with the experiences I have come across. It’s nice and concise with an unusually good reading time to value ratio. How Complex...
Anandtech on AWS Graviton2
Yesterday, Anandtech published what is, by far, the most detailed write-up on the AWS Graviton2 processor. In this article the author, Andrei Frumusanu, compared the Graviton2 with the AMD EPYC 7571 and the Intel Platinum 8259CL. Worth reading. From Anandtech: Amazon's Arm-based Graviton2 Against AMD and Intel: Comparing Cloud Compute
AWS Graviton2
In November of last year, AWS announced the first ARM-based AWS instance type (AWS Designed Processor: Graviton). For me this was a very big deal because I've been talking about ARM-based servers for more than a decade, believing that massive client volumes fund the R&D stream that feeds most server-side innovation. In our industry,...
2019 SIGMOD Systems Award
At SIGMOD 2019 in Amsterdam last month it was announced that the Amazon Aurora service has been awarded the 2019 SIGMOD Systems Award. From the awards committee: The SIGMOD Systems Award is awarded to an individual or set of individuals to recognize the development of a software or hardware system whose technical contributions have had...
2019 ACM SIGCOMM Test of Time Award
Back in late 2008 and early 2009, I had a few projects underway. One was investigating the impact of high temperatures on the longevity and fault rates in servers. We know what it costs to keep a data center cool, but what I wanted to know is what it would cost if we didn't keep...
Tesla Full Self Driving ASIC
Tesla hosted Autonomy Day for Analysts on Monday April 22nd beginning at 11am. The video is available at https://www.youtube.com/watch?v=Ucp0TTmvqOE. It's a bit unusual for a corporate video in that it is 3 hours and 52 minutes long but it's also unusual in that there is far more real content in there than is typically shown...
AWS Nitro System
At Tuesday Night Live with James Hamilton at the 2016 AWS re:Invent conference, I introduced the first Amazon Web Services custom silicon. The ASIC I showed formed the foundational core of our second generation custom network interface controllers and, even back in 2016, there was at least one of these ASICs going into every new...
Collision at Sea: USS Fitzgerald
At 1:30:34 AM on June 17, 2017 the USS Fitzgerald and the container ship ACX Crystal came together just south of Yokosuka, Japan. The ACX Crystal is a 730-foot modern containership built in 2008 and capable of carrying 2,858 TEU of containers at a 23-knot service speed. The Fitzgerald is a $1.8B US Navy Arleigh Burke-class destroyer...
AWS Inferentia Machine Learning Processor
On Monday night I described AWS Graviton, the general-purpose AWS-developed server processor with 64-bit Arm that powers the EC2 A1 instance family. The five members of the A1 instance family target scale-out workloads such as web servers, caching fleets, and development workloads. This is the first general-purpose processor that has been designed, developed, and...
AWS Designed Processor: Graviton
This is an exciting day and one I've been looking forward to for more than a decade. As many of you know, the gestation time for a new innovation at AWS can be incredibly short. Some of our most important services went from good ideas to successful, heavily-used services in only months. But, custom silicon is...
Choose Technology Suppliers Carefully
Many years ago, Amazon chose to use Oracle database products to run the business. At the time it was a perfectly rational decision and, back then, many customers made the same choice and some took a different path. I’ve worked on both DB2 and SQL Server over the years so I know well the arguments on...
2017 Turing Award: Dave Patterson & John Hennessy
Earlier this year, Berkeley's Dave Patterson and Stanford's John Hennessy won the 2017 Turing Award, the premier award in Computing. From Pioneers of Computer Architecture Receive ACM A.M. Turing Award: NEW YORK, NY, March 21, 2018: ACM, the Association for Computing Machinery, today named John L. Hennessy, former President of Stanford University, and David A. Patterson, retired Professor...
Will ML training drive massive growth in networking?
This originally came up in an earlier blog comment but it's an interesting question, and not necessarily one restricted to the changes driven by deep learning training and other often GPU-hosted workloads. This trend has been underway for a long time and is more obvious when looking at networking, which was your example as...
MySQL 5.6 to MariaDB 10.2.13
It's hard to believe that a relational database in personal use at home will ever have much of a load when it comes to transaction processing but our home RDBMS is surprisingly busy, with more than a hundred database interactions per second. It's still not even within an order of magnitude of being as busy as many...
Four DB2 Code Bases?
Many years ago I worked on IBM DB2 and so I occasionally get the question, “how the heck could you folks possibly have four relational database management system code bases?” Some go on to argue that a single code base would have been much more efficient. That’s certainly true. And, had we moved to a...
When You Can't Afford Not to Have Power Redundancy
Atlanta Hartsfield-Jackson International Airport suffered a massive power failure yesterday where the entire facility except for emergency lighting and safety equipment was down for nearly 11 hours. The popular press coverage on this power failure is extensive but here are two examples: WSJ: https://www.wsj.com/articles/power-outage-halts-flights-at-atlanta-international-airport-1513543883 (paywall) CNN: http://edition.cnn.com/2017/12/17/us/atlanta-airport-power-outage/index.html For most years since 1998, Atlanta International...
Top 50 Corporate Revenue
This morning, I was thinking about Apple. When I got started in this industry in the early 80s, it was on an Apple II+ writing first in BASIC and later in UCSD Pascal. I thought Apple was simply amazing, so it was tough watching the more than decade of decline before Jobs rejoined. Our industry...
Tensor Processing Unit
For years I've been saying that, as more and more workloads migrate to the cloud, the mass concentration of similar workloads makes hardware acceleration a requirement rather than an interesting option. When twenty servers are working on a given task, it makes absolutely no sense to do specialized hardware acceleration. When one thousand servers are...
How Many Data Centers Needed World-Wide
Last week Fortune asked Mark Hurd, Oracle co-CEO, how Oracle was going to compete in cloud computing when their capital spending came in at $1.7B whereas the aggregate spending of the three cloud players was $31B. Essentially the question was, if you assume the big three are spending roughly equally, how can $1.7B compete with...
Airline Overbooking isn't Evil
Airline overbooking isn't evil. In fact, if done properly, it's good for airlines, good for customers, and good for the environment. Sold seats are clearly good for airlines and their shareholders. High utilization is good for customers because it reduces seat costs for airlines, which normally operate at single digit profit margins. In the consumer...
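The economics sketched above can be illustrated with a toy expected-value model. All numbers here are hypothetical: each filled seat earns a fare, each involuntarily bumped passenger costs a compensation payout, and each ticketed passenger is assumed to show up independently with some fixed probability (real no-show behavior is correlated and varies by route and fare class).

```python
from math import comb

def expected_profit(tickets_sold, seats, fare, bump_cost, show_prob):
    """Expected profit when selling tickets against a fixed seat count.

    Toy model: each ticketed passenger shows up independently with
    probability show_prob; passengers beyond the seat count are bumped
    at a fixed compensation cost.
    """
    profit = 0.0
    for k in range(tickets_sold + 1):  # k passengers actually show up
        p_k = comb(tickets_sold, k) * show_prob**k * (1 - show_prob)**(tickets_sold - k)
        bumped = max(0, k - seats)
        profit += p_k * (fare * min(k, seats) - bump_cost * bumped)
    return profit

# 100 seats, 90% show rate, $200 fares, $800 bump compensation
print(expected_profit(100, 100, 200, 800, 0.9))  # no overbooking
print(expected_profit(105, 100, 200, 800, 0.9))  # modest overbooking
```

With these made-up numbers, selling 105 tickets for 100 seats yields a higher expected profit than selling exactly 100, because the occasional bump payout is far smaller than the revenue recovered from no-shows.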
At Scale, Rare Events aren't Rare
I’m a connoisseur of failure. I love reading about engineering failures of all forms and, unsurprisingly, I’m particularly interested in data center faults. It’s not that I delight in engineering failures. My interest is driven by believing that the more faults we all understand, the more likely we can engineer systems that don’t suffer from...
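The core point, that a per-unit "rare" event stops being rare across a large fleet, follows directly from basic probability: the chance of at least one occurrence across n independent units is 1 - (1 - p)^n. A minimal sketch with hypothetical fleet numbers:

```python
def prob_at_least_one(p_single, n):
    """Probability that an event with per-unit probability p_single
    occurs at least once across n independent units."""
    return 1 - (1 - p_single) ** n

# A one-in-a-million daily fault is negligible for one server, but
# across a hypothetical 100,000-server fleet it occurs with nearly
# 10% probability on any given day (roughly once every 10 days).
print(prob_at_least_one(1e-6, 1))        # ~1e-6
print(prob_at_least_one(1e-6, 100_000))  # ~0.095
```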
CS Responder Trans-Oceanic Cable Layer
Laying fiber optic cables with repeaters along the ocean floor raises super-interesting technical challenges. I recently visited the CS Responder, a trans-ocean cable-laying ship operated by TE Connectivity. TE Connectivity is a $13.3B global technology company that specializes in communication cable, connectors, sensors, and electronic components. Their subsidiary TE SubCom manufactures, lays and maintains undersea cable. TE SubCom has a base...
KVH Industries Tour
As part of my home blog, I often describe visits to plants, factories, and ships in the “Technology Series”. Over the years, we’ve covered mining truck manufacturers, sail boat racing, a trip on a ship-assist tug boat, a tour of Holland America’s Westerdam, a visit to an Antarctic ice breaker, a tour of a Panamax container...
AWS re:Invent 2016
Last week we held the 5th annual AWS re:Invent conference. This conference is my favorite opportunity to get into more detail with customers and partners and to learn more about some of the incredible innovations AWS customers are producing. The first year, I was impressed by the conference scale. Back in 2012 it still felt like...
Advice to Someone Just Entering Our Industry
This morning I’m thinking through what I’m going to say at the AWS re:Invent Conference next week and I suppose I’m in a distractable mood. When an email came in asking “what advice would you give someone entering the information technology industry in 2017?” I gave it some thought. This was my short take: Play...
David Patterson Retires After 40 Years
David Patterson has had a phenomenal impact on computer architecture and computer science over the last 40 years. He's perhaps most notable for the industry impact of the projects he's led over these years. I first got to know his work back when the Berkeley Reduced Instruction Set Computer project started publishing. The RISC project...
A Decade of Innovation
March 14, 2006 was the beginning of a new era in computing. That was the day that Amazon Web Services released the Simple Storage Service (S3). Technically, Simple Queuing Services was released earlier but it was the release of S3 that really lit the fire under cloud computing. I remember that day well. At the...
Everspan Optical Cold Storage
Optical Archive Inc. was a startup founded by Frank Frankovsky and Gio Cogliatore. I first met Frank many years ago when he was Director of the Dell Data Center Solutions team. DCS was part of the massive Dell Computer company but they still ran like a startup. And, with Jimmy Pike leading many of their...
Volkswagen Emissions Fiasco
I'm an avid reader of engineering disasters since one of my primary roles in my day job is to avoid them. And, away from work, we are taking a small boat around the world with only two people on board and that too needs to be done with care where an engineering or operational mistake...
VPC NAT gateways : transactional uniqueness at scale
This is a guest blog post on Perspectives from Colm MacCarthaigh, a senior engineer on the Amazon Web Services team that designed and built the new VPC Network Address Translation Gateway service that just went live yesterday. Over the last 25 years, Network Address Translation (NAT) has become almost ubiquitous in networks of any size. The...
ARM Server Market
Microservers and the motivations for microservers have been around for years. I first blogged about them back in 2008 (Cooperative, Expendable, Microslice, Servers: Low-Cost, Low-Power Servers for Internet-Scale Services) and even Intel has entered the market with Atom but it's the ARM instruction set architecture that has had the majority of server world attention. There...
Data Center Power & Water Consumption
I'm interested in data center resource consumption in general and power is a significant component of overall operating cost and also has an impact on the environment so, naturally, it gets most of the focus when discussing data center resource consumption. As with all real issues, there is always a bit of hyperbole and some outright...
Flash Storage Failure Rates From A Large Population
I love real data. Real data is so much better than speculation and, what I've learned from years of staring at production systems, is that real data from the field is often surprisingly different from popular opinion. Disk failure rates are higher than manufacturer specifications, ECC memory faults happen all the time, and events that...
2014 ACM Turing Award
Academic researchers work on problems they believe to be interesting and then publish their results. Particularly good researchers listen carefully to industry problems to find real problems, produce relevant work and then publish the results. True giants of academia listen carefully to find real problems, produce relevant results, build real systems that actually work, and...
Greenpeace, Renewable Energy, and Data Centers
Greenpeace has focused on many issues of great import over the years. I like whales, don't like shark finning, and it's hard to be a huge fan of testing nuclear weapons on South Pacific islands. Much good work has been done and continues to be done. Over the past three to five years, Greenpeace has...
The Return to the Cloud
Zynga is often in the news because gaming is hot and Zynga has been, and continues to be, a successful gaming company. What’s different here is the story isn’t about gaming nor is it really about Zynga itself. The San Francisco gaming house with a public valuation of $2.5B was an early adopter of cloud...
50 Years of Moore's Law: IEEE Spectrum Report
IEEE Spectrum recently published a special report titled 50 Years of Moore's Law. Spectrum, unlike many purely academic publications, covers a broad set of topics in a way accessible to someone working outside the domain but not so watered down as to become uninteresting. As I read through this set of articles on Moore's law...
Goodbye GoDaddy
perspectives.mvdirona.com Nov 26, 2007 Back in 2005, I maintained a blog accessible only inside of Microsoft where I worked at the time. Having the blog internal to the company allowed confidential topics to be discussed openly, but over time, I found much of what I was writing about might be useful externally. And I knew,...
Why Renewable Energy (Alone) Won't Fully Solve the Problem
Back in 2007, the audacious RE<C project was started. The goal of RE<C was simple: make renewable energy less costly than coal and let economics do the hard work of converting the world's energy producers to renewables. I blogged the project in Solving World Problems With Economic Incentives, summarizing it this way: the core idea is that, if renewable energy sources were cheaper than coal, economic forces would quickly make the right thing happen and we would actually stop burning coal. I love the approach but it is fiendishly difficult. Unfortunately, RE<C really was fiendishly difficult and the project was abandoned in 2011.
AWS re:Invent Conference
In the Amazon Web Services world, this has always been a busy time of the year. Busy, because, although we aim for a fairly even pace of new service announcements and new feature releases all year, invariably, somewhat more happens towards the end of the year than early on. And, busy, because the annual AWS re:Invent conference is in early November and this is an important time to roll out new services or important features. This year is no exception and, more than ever, there is a lot to announce at the conference. It should be fun. I enjoy re:Invent because it's a chance to talk to customers in more detail about what we have been building, learn how they are using it, and what we could do to make the services better.
Recovering Cryogenic Refrigeration Energy
Waste heat reclamation in datacenters has long been viewed as hard because the heat released is low grade. What this means is that rather than having a great concentration of heat, it is instead spread out and, in fact, only warm. The more concentrated the heat, the easier it is to use. In fact, that is exactly how many power plants work. When the heat source is only slightly warmer than the cooling medium, rather than vastly hotter as with burning fuels such as LNG, petroleum, or coal, extracting useful energy becomes challenging. However, data center heat reclamation is clearly a problem well worth solving since just about 100% of the power that enters each facility is released as heat into the environment.
August 21, 2014 Computer History Museum Presentation
Dileep Bhandarkar put together a great presentation for the Computer History Museum a couple of weeks back. I have no idea how he got through the full presentation in under an hour since it covers a lot of material, but it's an interesting walk through history. Over the years, Dileep has worked for Texas Instruments, Intel, Microsoft, and Qualcomm and, as a consequence, he's been near the early days of semiconductors, the rise and fall of the mini-computer, 17 years at Intel, half a decade at Microsoft, and he's now working at Qualcomm.
Data Center Cooling Done Differently
Over the last 10 years, there has been considerable innovation in data center cooling. Large operators are now able to operate at a Power Usage Effectiveness of 1.10 to 1.20. This means that less than 20% of the power delivered to the facility is lost to power distribution and cooling. These days, very nearly all of the power delivered to the facility is delivered to the servers. I would never say there is no more innovation coming, but most of the ideas I've been seeing recently in data center cooling designs are familiar.
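Since PUE is defined as total facility power divided by IT power, the fraction of facility power lost to distribution and cooling is 1 - 1/PUE. A quick sketch of the arithmetic behind the "less than 20%" claim:

```python
def overhead_fraction(pue):
    """Fraction of total facility power lost to power distribution and
    cooling, given Power Usage Effectiveness (total power / IT power)."""
    return 1 - 1 / pue

# At PUE 1.10 about 9% of facility power is overhead; at 1.20 about 17%.
# A legacy facility at PUE 2.0 loses half its power before the servers.
for pue in (1.10, 1.20, 2.0):
    print(f"PUE {pue:.2f}: {overhead_fraction(pue):.1%} overhead")
```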
Challenges in Designing at Scale: Formal Methods in Building Robust Distributed Systems
We all know that when designing and operating applications at scale, it is persistent state management that brings the most difficult challenges. Delivering state-free applications has always been (fairly) easy. But most interesting commercial and consumer applications need to manage persistent state. Advertising needs to be delivered, customer activity needs to be tracked, and products need to be purchased. Interesting applications that run at scale all have difficult persistent state problems. That's why Amazon.com, other AWS customers, and even other AWS services make use of the various AWS database platform services. Delegating the challenge of managing high-performance, transactional, and distributed data management to underlying services makes applications more robust while reducing operational overhead and design complexity.
Network Neutrality and the FCC Proposal to Abandon it
The internet and the availability of content broadly and uniformly to all users has driven the largest wave of innovation ever experienced in our industry. Small startups offering a service of value have the same access to customers as the largest and best funded incumbents. All customers have access to the same array of content regardless of their interests or content preferences. Some customers have faster access than others but, whatever the access speed, all customers have access to all content uniformly. Some countries have done an amazing job of getting high speed access to a very broad swath of the population.
Air Traffic Control System Failure & Complex System Testing
It's difficult to adequately test complex systems. But what's really difficult is keeping a system adequately tested. Creating systems that do what they are designed to do is hard but, even with the complexity of these systems, many life critical systems have the engineering and production testing investment behind them to be reasonably safe when deployed. It's keeping them adequately tested over time, as conditions and the software system change, where we sometimes fail. There are exceptions to the general observation that we can build systems that operate safely when inside expected operating conditions.
Energy Efficiency of Cloud Computing
Most agree that cloud computing is inherently more efficient than on-premises computing in each of several dimensions. Last November, I went after two of the easiest-to-argue gains: utilization and the ability to sell excess capacity (Datacenter Renewable Power Done Right): Cloud computing is a fundamentally more efficient way to operate compute infrastructure. The efficiency increases driven by the cloud are many, but a strong primary driver is increased utilization. All companies have to provision their compute infrastructure for peak usage. But they only monetize the actual usage, which goes up and down over time.
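The utilization argument above can be made concrete with a quick back-of-envelope sketch. The utilization figures below are my illustrative assumptions, not numbers from the post: when you provision for peak but monetize only average usage, your effective cost per used unit-hour scales with the inverse of utilization.

```python
# Back-of-envelope sketch of the peak-provisioning argument. The utilization
# numbers are illustrative assumptions, not figures from the post.
def provisioned_cost_per_used_hour(peak_units, avg_utilization, cost_per_unit_hour=1.0):
    """Effective cost per actually-used unit-hour when provisioned for peak."""
    return cost_per_unit_hour * peak_units / (peak_units * avg_utilization)

# A single company provisioned for its own peak might average ~15% utilization;
# a cloud pooling many uncorrelated workloads might average ~50%.
on_prem = provisioned_cost_per_used_hour(peak_units=100, avg_utilization=0.15)
cloud = provisioned_cost_per_used_hour(peak_units=100, avg_utilization=0.50)
print(round(on_prem / cloud, 2))  # 3.33 -- pooled utilization cuts effective cost >3x
```

Under these assumed numbers, the pooled fleet delivers the same used capacity at roughly a third of the effective cost, which is the shape of the utilization gain the post describes.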
Optical Archival Storage Technology
It's an unusual time in our industry where many of the most interesting server, storage, and networking advancements aren't advertised, don't have a sales team, don't have price lists, and often are never even mentioned in public. The largest cloud providers build their own hardware designs and, since the equipment is not for sale, it's typically not discussed publicly. A notable exception is Facebook. They are big enough that they build some custom gear, but they don't view their hardware investments as differentiating. That may sound a bit strange -- why spend on something if it is not differentiating?
The Cloud: Fastest Industry Transition Ever
It's not often I'm enthused about spending time in Las Vegas, but this year's AWS re:Invent conference was a good reason to be there. It's exciting getting a chance to meet with customers who have committed their business to the cloud or are wrestling with that decision. The pace of growth since last year was startling, but what really caught my attention was the number of companies that had made the transition from testing on the cloud to committing their most valuable workloads to run there. I fully expected this to happen; I've seen these industry-sweeping transitions before.
Datacenter Renewable Power Done Right
Facebook Iowa Data Center: In 2007, the EPA released a study on datacenter power consumption at the request of the US Congress (EPA Report to Congress on Server and Data Center Efficiency). The report estimated that the power consumption of datacenters represented about 1.5% of the US energy budget in 2005 and that this number would double by 2010. In a way, this report was believable in that datacenter usage was clearly on the increase. What the report didn't predict was the pace of innovation in datacenter efficiency during that period. Increased use spurred increased investment, which has led to a near 50% improvement in industry-leading datacenter efficiency. Also difficult to predict at the time of the report was the rapid growth of cloud computing.
Solar at Scale: How Big is a Solar Array of 9MW Average Output?
I frequently get asked why we don't just put solar panels on datacenter roofs and run the facilities on that. The short answer is that datacenter roofs are just way too small. In a previous article (I Love Solar Power But...), I did a quick back-of-envelope calculation and, assuming a conventional single-floor build with current power densities, each square foot of datacenter space would require roughly 362 sq ft of solar panels. The roof would only contribute roughly 1% of the facility requirements. Quite possibly still worth doing, but there is simply no way a rooftop array is going to power an entire datacenter.
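A one-line check of the ratio quoted above. The 362:1 panel-to-floor ratio is from the post; assuming the roof is the same size as the single-story floor it covers is my simplification:

```python
# If each sq ft of datacenter floor needs ~362 sq ft of panels (per the post),
# a rooftop array equal in size to the floor offsets only a tiny fraction of load.
panel_sqft_per_dc_sqft = 362
roof_fraction = 1 / panel_sqft_per_dc_sqft
print(f"{roof_fraction:.2%}")  # 0.28% -- same order of magnitude as the post's "roughly 1%"
```

Either way the conclusion holds: the roof contribution is a rounding error against the facility's total power requirement.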
Counting Servers is Hard
At the Microsoft Worldwide Partner Conference, Microsoft CEO Steve Ballmer announced that "We have something over a million servers in our data center infrastructure. Google is bigger than we are. Amazon is a little bit smaller. You get Yahoo! and Facebook, and then everybody else is 100,000 units probably or less." That's a surprising data point for a variety of reasons. The most surprising is that the data point was released at all. Just about nobody at the top of the server world chooses to boast with the server-count data point, partly because it's not all that useful a number but mostly because a single data point is open to a lot of misinterpretation by even skilled industry observers.
Cumulus Networks: A Sneak Preview of One of My Favorite Startups
Back in 2009, in Datacenter Networks are in my way, I argued that the networking world was stuck in the mainframe business model: everything vertically integrated. In most datacenter networking equipment, the core Application Specific Integrated Circuit (ASIC, the heart of a switch or router), the entire hardware platform for the ASIC including power and physical network connections, and the software stack including all the protocols come from a single vendor, and there is no practical mechanism to make different choices. This is how the server world operated 40 years ago, and we get much the same result. Networking gear is expensive, interoperates poorly, is expensive to manage, and is almost always over-subscribed, constraining the rest of the equipment in the datacenter. Further exacerbating what is already a serious problem, unlike the mainframe server world of 40 years back, networking equipment is also unreliable.
The Power Failure Seen Around the World
In the data center world, there are few events taken more seriously than power failure, and considerable effort is spent to make them rare. When a datacenter experiences a power failure, it's a really big deal for all involved. But a big deal in the infrastructure world still really isn't a big deal on the world stage. The Super Bowl absolutely is a big deal by any measure. On average over the last couple of years, the Super Bowl has attracted 111 million viewers and is the most watched television show in North America, eclipsing the final episode of M*A*S*H. Worldwide, the Super Bowl is only behind the European Cup (UEFA Champions League), which draws 178 million viewers.
Customer Trust
In the cloud, there is nothing more important than customer trust. Without customer trust, a cloud business can't succeed. When you are taking care of someone else's assets, you have to treat those assets as more important than your own. Security has to be rock solid and absolutely unassailable. Data loss or data corruption has to be close to impossible and incredibly rare. And all commitments to customers have to be respected through business changes. These are hard standards to meet but, without success against these standards, a cloud service will always fail. Customers can leave any time and, if they have to leave, they will remember you did this to them.
Microserver Market Heats up: Intel Atom S1200 (Centerton) Announcement
Since 2008, I've been excited by, working on, and writing about microservers. In those early days, some of the workloads I worked with were I/O bound and didn't really need or use high single-thread performance. Replacing the server-class processors that supported these applications with high-volume, low-cost client system CPUs yielded both better price/performance and power/performance. Fortunately, at that time, there were good client processors available with ECC enabled (see You Really DO Need ECC) and most embedded system processors also supported ECC. I wrote up some of the advantages of these early microserver deployments and showed performance results from a production deployment in an internet-scale mail processing application in Cooperative, Expendable, Microslice Servers: Low-Cost, Low-Power Servers for Internet-Scale Services.
Redshift: Data Warehousing at Scale in the Cloud
I've worked in or near the database engine world for more than 25 years. And, ironically, every company I've ever worked at has been working on a massive-scale, parallel, clustered RDBMS system. The earliest variant was IBM DB2 Parallel Edition, released in the mid-90s. It's now called the Database Partitioning Feature. Massive, multi-node parallelism is the only way to scale a relational database system, so these systems can be incredibly important. Very high-scale MapReduce systems are an excellent alternative for many workloads. But some customers and workloads want the flexibility and power of being able to run ad hoc SQL queries against petabyte-sized databases.
AMD Announces Server Targeted ARM Part
I have been interested in, and writing about, microservers since 2007. Microservers can be built using any instruction set architecture, but I'm particularly interested in ARM processors and their application to server-side workloads. Today Advanced Micro Devices announced they are going to build an ARM CPU targeting the server market. This will be a 4-core, 64-bit, greater-than-2GHz part that is expected to sample in 2013 and ship in volume in early 2014. AMD is far from new to the microserver market. In fact, much of my past work on microservers has been AMD-powered.
Google Mechanical Design
When I come across interesting innovations or designs notably different from the norm, I love to dig in and learn the details. More often than not I post them here. Earlier this week, Google posted a number of pictures taken from their datacenters (Google Data Center Tech). The pictures are beautiful and of interest to just about anyone, somewhat more interesting to those working in technology, and worthy of detailed study for those working in datacenter design. My general rule with Google has always been that anything they show publicly is at least one generation old and typically more.
Amazon Event in Palo Alto (10/11@5pm)
The last few weeks have been busy and it has been way too long since I have blogged. I'm currently thinking through the server tax and what's wrong with the current server hardware ecosystem, but I don't have anything ready to go on that just yet. There are a few other things on the go, though. I did a talk at Intel a couple of weeks back and, last week, at the First Round Capital CTO Summit. I've summarized what I covered below with pointers to slides. In addition, I'll be at the Amazon in Palo Alto event this evening and will do a talk there as well.
Glacier: Engineering for Cold Data Storage in the Cloud
Earlier today Amazon Web Services announced Glacier, a low-cost, cloud-hosted, cold storage solution. Cold storage is a class of storage that is discussed infrequently and yet it is by far the largest storage class of them all. Ironically, the storage we usually talk about, and the storage I've worked on for most of my life, is the high-IOPS-rate storage supporting mission-critical databases. These systems today are best hosted on NAND flash, and I've been talking recently about two AWS solutions to address this storage class: I/O Performance (no longer) Sucks in the Cloud and EBS Provisioned IOPS & Storage Optimized EC2 Instance Types. Cold storage is different.
Fun with Energy Consumption Data
Facebook recently released a detailed report on their energy consumption and carbon footprint: Facebook's Carbon and Energy Impact. Facebook has always been super open with the details behind their infrastructure. For example, they invited me to tour the Prineville datacenter just prior to its opening: Open Compute Project, Open Compute Mechanical System Design, Open Compute Server Design, and Open Compute UPS & Power Supply. Reading through the Facebook Carbon and Energy Impact page, we see they consumed 532 million kWh of energy in 2011, of which 509 million kWh went to their datacenters.
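The two figures quoted above make the datacenter share easy to compute: almost all of Facebook's reported 2011 energy consumption went to datacenters.

```python
# Fraction of Facebook's reported 2011 energy that went to datacenters,
# using the two figures quoted in the post.
total_kwh = 532e6  # total 2011 consumption, per the report
dc_kwh = 509e6     # datacenter portion, per the report
print(f"{dc_kwh / total_kwh:.1%}")  # 95.7% -- nearly all of it
```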
EBS Provisioned IOPS & Optimized Instance Types
In I/O Performance (no longer) Sucks in the Cloud, I said "Many workloads have high I/O rate data stores at the core. The success of the entire application is dependent upon a few servers running MySQL, Oracle, SQL Server, MongoDB, Cassandra, or some other central database." Last week a new Amazon Elastic Compute Cloud (EC2) instance type based upon SSDs was announced that delivers 120k reads per second and 10k to 85k writes per second. This instance type with direct-attached SSDs is an incredible I/O machine ideal for database workloads, but most database workloads run on virtual storage today.
I/O Performance (no longer) Sucks in the Cloud
Many workloads have high I/O rate data stores at the core. The success of the entire application is dependent upon a few servers running MySQL, Oracle, SQL Server, MongoDB, Cassandra, or some other central database. The best design pattern for any highly reliable and scalable application, whether on-premises or cloud hosted, is to shard the database. You can't be dependent upon a single server being able to scale sufficiently to hold the entire workload. Theoretically, that's the solution, and all workloads should run well on a sufficiently large fleet even if that fleet has low individual server I/O performance.
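The sharding pattern described above can be sketched in a few lines. The shard names and routing helper below are hypothetical illustrations, not any particular database's API: route each key deterministically to one of N database servers by hash, so no single server has to hold the entire workload.

```python
# Minimal sketch of hash-based sharding (hypothetical shard names and helper,
# for illustration only): each key deterministically maps to one shard.
import hashlib

SHARDS = ["db-0.internal", "db-1.internal", "db-2.internal", "db-3.internal"]

def shard_for(key: str) -> str:
    """Route a key to a shard by hashing; the same key always hits the same shard."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

print(shard_for("customer:1138"))  # always the same shard for this key
```

In practice, production systems layer consistent hashing or a directory service on top so shards can be added without remapping most keys, but the core idea is the same.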
Why are there Datacenters in NY, Hong Kong, and Tokyo?
Why are there so many data centers in New York, Hong Kong, and Tokyo? These urban centers have some of the most expensive real estate in the world. The cost of labor is high. The tax environment is unfavorable. Power costs are high. Construction is difficult to permit and expensive. Urban datacenters are incredibly expensive facilities, and yet a huge percentage of the world's computing is done in expensive urban centers. One of my favorite examples is the 111 8th Ave datacenter in New York. Google bought this datacenter for $1.9B. They already have facilities on the Columbia River where power and land are cheap.
Official Report of the Fukushima Nuclear Accident Independent Investigation Commission Executive
Last night, Tom Kleinpeter sent me The Official Report of the Fukushima Nuclear Accident Independent Investigation Commission Executive Summary. They must have hardy executives in Japan in that the executive summary runs 86 pages in length. Overall, it's an interesting document, but I only managed to read into the first page before starting to feel disappointed. What I was hoping for was a deep dive into why the reactors failed, the root causes of the failures, and what can be done to rectify them. Because of the nature of my job, I've spent considerable time investigating hardware and software system failures, and what I find most difficult and really time consuming is getting to the real details.
Visiting the Hanjin Oslo Container Ship
The NASCAR Sprint Cup stock car series kicks its season off with a bang and, unlike other sports, starts the season with the biggest event of the year rather than closing with it. Daytona Speed Weeks is a multi-week, many-race event, the finale of which is the Daytona 500. The 500 starts with a huge field of 43 cars and is perhaps famous for some of the massive multi-car wrecks. The 17-car pile-up of 2011 made a 43-car field look like the appropriate amount of redundancy just to get a car over the finish line at the end.
Temperature Management in Data Centers
Cooling is the largest single non-IT (overhead) load in a modern datacenter. There are many innovative solutions addressing the power losses in cooling systems. Many of these mechanical system innovations work well and others have great potential, but none are as powerful as simply increasing the server inlet temperatures. Obviously, less cooling is cheaper than more. And the higher the target inlet temperatures, the higher the percentage of time a facility can spend running on outside air (air-side economization) without process-based cooling. The downsides of higher temperatures are: 1) higher semiconductor leakage losses, 2) higher server fan speeds, which increase the losses to air moving, and 3) higher server mortality rates.
Urs Holzle @ Open Networking Summit 2012
Urs Holzle did the keynote talk at the 2012 Open Networking Summit, where he focused on Software Defined Networking in wide area networking. Urs leads the Technical Infrastructure group at Google, where he is Senior VP and Technical Fellow. Software Defined Networking (SDN) is the central management of network routing decisions, rather than depending upon distributed routing algorithms running semi-autonomously on each router. Essentially, what is playing out in the networking world is a replay of what we have seen in the server world across many dimensions.
Amazon Web Services
Most of the time I write about the challenges posed by scaling infrastructure. Today, though, I wanted to mention some upcoming events that have to do with a different sort of scale. In Amazon Web Services we are tackling lots of really hairy challenges as we build out one of the world's largest cloud computing platforms. From data center design, to network architecture, to data persistence, to high-performance computing and beyond, we have a virtually limitless set of problems needing to be solved. Over the coming years AWS will be blazing new trails in virtually every aspect of computing and infrastructure. In order to tackle these opportunities we are searching for innovative technologists to join the AWS team. In other words, we need to scale our engineering staff. AWS has hundreds of open positions throughout the organization. Every single AWS team is hiring, including EC2, S3, EBS, EMR, CloudFront, DynamoDB and even ...
Power Management of Online Data-Intensive Services
I met Google's Wolf-Dietrich Weber at the 2009 CIDR conference, where he presented what is still one of my favorite datacenter power-related papers. I liked the paper because the gain was large, the authors weren't confused or distracted by much of what is incorrectly written on datacenter power consumption, and the technique is actually practical. In Power Provisioning for a Warehouse-sized Computer, the authors argue that we should oversell power, the most valuable resource in a data center. Just as airlines oversell seats, their key revenue-producing asset, datacenter operators should oversell power. Most datacenter operators take the critical power -- the total power available to the data center less power distribution losses and mechanical system cooling loads -- then reduce it by at least 10 to 20% to protect against the risk of overdraw, which can bring penalties or power loss.
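The size of the opportunity in that 10 to 20% safety margin is easy to quantify. The facility size below is my illustrative assumption; the margin range is from the post:

```python
# Illustration of the stranded-power argument (facility size is an assumed
# number; the 10-20% margin is the range quoted in the post). Operators who
# never approach critical power leave paid-for capacity unused -- the capacity
# that power overselling tries to recover.
critical_power_mw = 10.0   # assumed facility critical power
safety_margin = 0.15       # mid-range of the 10-20% margin from the post
provisioned = critical_power_mw * (1 - safety_margin)
stranded = critical_power_mw - provisioned
print(stranded)  # 1.5 MW of capacity paid for but never used
```

Like airline overbooking, overselling that stranded capacity is safe as long as simultaneous peak draw across all customers remains improbable, which is exactly what the paper's measurements show at warehouse scale.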
I Love Solar Power But...
I love solar power, but in reflecting carefully on a couple of high-profile datacenter deployments of solar power, I'm developing serious reservations that this is the path to reducing datacenter environmental impact. I just can't make the math work and find myself wondering if these large solar farms are really somewhere between a bad idea and pure marketing, where the environmental impact is purely optical. Facebook Prineville: The first of my two examples is the high-profile installation of a large solar array at the Facebook Prineville, Oregon facility.
Observations on Errors, Corrections, & Trust of Dependent Systems
Every couple of weeks I get questions along the lines of "should I checksum application files, given that the disk already has error correction?" or "given that TCP/IP has error correction on every communications packet, why do I need application-level network error detection?" Another frequent question is "non-ECC motherboards are much cheaper -- do we really need ECC on memory?" The answer is always yes. At scale, error detection and correction at lower levels fails to correct or even detect some problems. Software stacks above introduce errors. Hardware introduces more errors. Firmware introduces errors. Errors creep in everywhere and absolutely nobody and nothing can be trusted.
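A minimal sketch of the end-to-end argument above: an application-level checksum catches corruption that slipped past every lower layer (disk ECC, TCP checksums, firmware). The record format here is invented for illustration, not any particular system's.

```python
# End-to-end integrity sketch: prepend a SHA-256 digest to each record so the
# application can detect corruption regardless of where it was introduced.
# The record layout is an illustrative invention, not a real system's format.
import hashlib

def write_with_checksum(payload: bytes) -> bytes:
    """Produce a record: 32-byte SHA-256 digest followed by the payload."""
    return hashlib.sha256(payload).digest() + payload

def read_with_checksum(record: bytes) -> bytes:
    """Verify the digest before trusting the payload."""
    digest, payload = record[:32], record[32:]
    if hashlib.sha256(payload).digest() != digest:
        raise IOError("corruption detected below the application layer")
    return payload

record = write_with_checksum(b"ledger entry 42")
assert read_with_checksum(record) == b"ledger entry 42"

# Flip a single bit in the payload and the corruption is caught:
corrupt = record[:32] + bytes([record[32] ^ 0x01]) + record[33:]
try:
    read_with_checksum(corrupt)
except IOError:
    print("caught")
```

The point is not that SHA-256 is required (a CRC is often enough); it is that the check must live at the layer that ultimately consumes the data.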
Communicating Data Beyond the Speed of Light
In the past, I've written about the cost of latency and how reducing latency can drive more customer engagement and increase revenue. Two examples of this are: 1) The Cost of Latency and 2) Economic Incentives Applied to Web Latency. Nowhere is latency reduction more valuable than in high-frequency trading applications. Because these trades can be incredibly valuable, the cost of the infrastructure on which they trade is more or less an afterthought. Good people at the major trading firms work hard to minimize costs but, if the cost of infrastructure were to double tomorrow, high-frequency trading would continue unabated.
Perspectives on the Costa Concordia Incident
Last week I wrote up Studying the Costa Concordia Grounding. Many folks sent me mail with interesting perspectives. Two were sufficiently interesting that I wanted to repeat them here. The first was from someone who was actually on the ship on that final cruise. The second is from a professional captain with over 35 years of experience as a certified Ocean Master. Experiences From a Costa Concordia Passenger: One of the engineers I work with at Amazon was actually on the Costa Concordia when it grounded. Rory Browne works in the Amazon.com Dublin office and he made an excellent and very detailed presentation on what took place on that final trip. He asked me not to post his slides but was OK with me posting my notes from his presentation.
Studying the Costa Concordia Grounding
Don't be a show-off. Never be too proud to turn back. There are old pilots and bold pilots, but no old, bold pilots. I first heard the latter part of this famous quote, made by US Airmail pilot E. Hamilton Lee, back when I raced cars. At that time, one of the better drivers in town, Gordon Monroe, used a variant of that quote (with pilots replaced by racers) when giving me driving advice. Gord's basic message was that it is impossible to win a race if you crash out of it.
Socrata Chief Technical Officer
Ordinarily I focus this blog on areas of computing where I spend most of my time, from high-performance computing to database internals and cloud computing. An area that interests me greatly but that I've seldom written about is entrepreneurship and startups. One of the Seattle-area startups with which I stay in touch is Socrata. They are focused on enabling federal, state, and local governments to improve the reach, usability, and social utility of their public information assets -- essentially making public information available and useful to their constituents. They are used by the World Bank, the United Nations, the World Economic Forum, the US Data.Gov, Health & Human Services, the Centers for Disease Control, most major cities including NYC, Seattle, Chicago, San Francisco, and Austin, and many county and state governments.
Amazon DynamoDB: NoSQL in the Cloud
Finally! I've been dying to talk about DynamoDB since work began on this scalable, low-latency, high-performance NoSQL service at AWS. This morning, AWS announced availability of DynamoDB: Amazon Web Services Launches Amazon DynamoDB, a New NoSQL Database Service Designed for the Scale of the Internet. In a past blog entry, One Size Does Not Fit All, I offered a taxonomy of four different types of structured storage systems, argued that Relational Database Management Systems are not sufficient, and walked through some of the reasons why NoSQL databases have emerged and continue to grow market share quickly.
Innovative Datacenter Design: Ishikari Datacenter
Occasionally I come across a noteworthy datacenter design that is worth covering. Late last year a very interesting Japanese facility was brought to my attention by Mikio Uzawa, an IT consultant who authors the Agile Cat blog. I know Mikio because he occasionally translates Perspectives articles for publication in Japan. Mikio pointed me to the Ishikari Datacenter in Ishikari City, Hokkaido, Japan. Phase I of this facility was just completed in November 2011. This facility is interesting for a variety of reasons, but the design features I found most interesting are: 1) high-voltage direct current power distribution, 2) whole-building ductless cooling, and 3) aggressive free air cooling.
ARM V8 Architecture
Years ago, Dave Patterson remarked that most server innovations were coming from the mobile device world. He's right. Commodity system innovation is driven by volume, and nowhere is there more volume than in the mobile device world. The power management techniques applied fairly successfully over the last 5 years had their genesis in the mobile world. And, as processor power efficiency improves, memory is on track to become the biggest power consumer in the data center. I expect the ideas to rein in memory power consumption will again come from the mobile device world. Just as Eskimos are reported (apparently incorrectly) to have 7 words for snow, mobile memory systems have a large array of low-power states with subtly different power dissipations and recovery times.
Hyder: Transactional Indexed Record Manager for Shared Flash Storage
If you work in the database world, you already know Phil Bernstein. He's the author of Principles of Transaction Processing and has a long track record as a successful and prolific database researcher. Past readers of this blog may remember Phil's guest blog posting on Google Megastore. Over the past few years, Phil has been working on an innovative NoSQL system based upon flash storage. I like the work because it pushes the limit of what can be done on a single server, with transaction rates approaching 400,000 per second, leverages the characteristics of flash storage in a thought-provoking way, and employs interesting techniques such as log-only storage.
High Availability for Cloud Computing Database Systems
While at Microsoft I hosted a weekly talk series called the Enterprise Computing Series (ECS), where I mostly scheduled technical talks on server and high-scale service topics. I say mostly because the series occasionally roamed as far afield as having an ex-member of the Ferrari Formula 1 team present. Client-side topics also occasionally made the list, either because I particularly liked the work or technology behind it or thought it was a broadly relevant topic. The Enterprise Computing Series has an interesting history. It was started by Jim Gray at Tandem. Pat Helland picked up the mantle from Jim and ran it for years before Pat moved to Andy Heller's HaL Computer Systems.
Global Netflix Platform: Large Scale Java PaaS Running on AWS
Netflix is super interesting in that they are running at extraordinary scale, are a leader in the move to the cloud, and Adrian Cockcroft, the Netflix Director of Cloud Architecture, is always interesting in presentations. In this presentation, Adrian covers similar material to his HPTS 2011 talk I saw last month. His slides are up at http://www.slideshare.net/adrianco/global-netflix-platform and my rough notes follow: Netflix has 20 million streaming members, currently in the US, Canada, and Latin America, and soon to be in the UK and Ireland. Netflix is 100% public cloud hosted. Why did Netflix move from their own high-scale facility to a public cloud?
Free Lessons in Industrial Design & Customer Experience
I seldom write consumer product reviews, and this blog is about the furthest thing from a consumer-focused site but, every so often, I come across a notable tidbit that is worthy of mention. A few weeks ago, it was Sprint unilaterally changing the terms of their wireless contracts (Sprint is Giving Free Customer Service Lessons). It just seemed a sufficiently confused decision that it was worthy of mention. Here's one that just nails it on the other side of the equation by obsessing over the customer experience: Roku. I've long known about Roku, but I'm not a huge TV watcher, so I've only been peripherally interested in the product.
42: The Answer to the Ultimate Question of Life, the Universe, and Everything
Yesterday the Top500 Supercomputer Sites list was announced. The Top500 list shows the most powerful commercially available supercomputer systems in the world. This list represents the outer edge of what supercomputer performance is possible when cost is no object. The top placement on the list is always owned by a sovereign-funded laboratory. These are the systems that only government-funded agencies can purchase. But they hold great interest for me because, as the cost of computing continues to fall, these performance levels will become commercially available to companies wanting to run high-scale models and data-intensive computing. In effect, the Top500 predicts the future, so I'm always interested in the systems on the list.
AWS Startup Challenge 2011
Last week I got to participate in one of my favorite days each year, serving on the judging panel for the AWS Startup Challenge. The event is a fairly intense day where our first meeting starts at 7:45am and the event closes at 9pm that evening. But it is an easy day to love in that the entire day is spent with innovative startups who have built their companies on cloud computing. I'm a huge believer in the way cloud computing is changing the computing landscape, and that's all I've worked on for many years now.
Serious Hard Drive Shortage Expected for at Least 2 Quarters
As rescue and relief operations continue in response to the serious flooding in Thailand, the focus has correctly been on human health and safety. Early reports estimated 317 fatalities, with 700,000 homes and 14,000 factories impacted and over 660,000 people unable to work. Good coverage, mostly from the Bangkok Post, is available at Newley.com, authored by a reporter in the region. For example: http://newley.com/2011/11/02/thailand-flooding-update-november-2-2011-front-page-of-todays-bangkok-post/. The floods are far from over and, as we look beyond the immediate problem in country, the impact on the technology world is expected to continue for just over a year even if the floods do recede in 3 to 4 weeks as expected.
Presenting Tomorrow at University of Washington
I'm not sure why it all happens at once, but it often does. Last Monday I kicked off HPTS 2011 in Asilomar, California and then flew to New York City to present at the Open Compute Summit. I love HPTS. It's a once-every-2-years invitational workshop that I've been attending since 1989. The workshop attracts a great set of presenters and attendees: HPTS 2011 agenda. I blogged a couple of the sessions if you are interested: Microsoft COSMOS at HPTS and Storage Infrastructure Behind Facebook Messages. The Open Compute Summit was kicked off by Frank Frankovsky of Facebook, followed by the legendary Andy Bechtolsheim of Arista Networks. I did a talk after Andy which was a subset of the talk I had done earlier in the week at HPTS.
Sprint is Giving Free Customer Service Lessons
Sometimes the most educational lessons are on what not to do rather than what to do. Failure and disaster can be extraordinarily educational as long as the reason behind the failure is well understood. I study large system outages and infrastructure failures, love reading post mortems (when they actually have content), and always watch carefully how companies communicate with their customers during and right after large-scale customer-impacting events. I don't do it because I enjoy failure -- these things all scare me. But, in each, there are lessons to be learned. Sprint advertising from: http://unlimited.sprint.com/?pid=10 (2011/10/29).
Storage Infrastructure Behind Facebook Messages
One of the talks that I particularly enjoyed yesterday at HPTS 2011 was Storage Infrastructure Behind Facebook Messages by Kannan Muthukkaruppan. In this talk, Kannan covered the Facebook store for chats, email, SMS, and messages. This high-scale storage system is based upon HBase and Haystack. HBase is a non-relational, distributed database very similar to Google's Bigtable. Haystack is a simple file system designed by Facebook for efficient photo storage and delivery. More on Haystack at: Facebook Needle in a Haystack. In this Facebook Message store, Haystack is used to store attachments and large messages. HBase is used for message metadata, search indexes, and small messages (avoiding the second I/O to Haystack for small messages like most SMS).
Microsoft COSMOS at HPTS
Rough notes from a talk on COSMOS, Microsoft's internal MapReduce system, from HPTS 2011. This is the service Microsoft uses internally to run MapReduce jobs. Interestingly, Microsoft plans to use Hadoop in the external Azure service even though COSMOS looks quite good: Microsoft Announces Open Source Based Cloud Service.
Software Defined Networking Comes of Age
From the Last Bastion of Mainframe Computing Perspectives post: The networking equipment world looks just like the mainframe computing ecosystem did 40 years ago. A small number of players produce vertically integrated solutions where the ASICs (the central processing unit responsible for high-speed data packet switching), the hardware design, the hardware manufacture, and the entire software stack are single sourced and vertically integrated. Just as you couldn't run IBM MVS on a Burroughs computer, you can't run Cisco IOS on Juniper equipment. When networking gear is purchased, it's packaged as a single-sourced, vertically integrated stack.
EMC's Joe Tucci on the Cloud and Big Data
Last night EMC Chief Executive Joe Tucci laid out his view of where the information processing world is going over the next decade and where EMC will focus. His primary point was cloud computing is the future and big data is the killer app for the cloud. He laid out the history of big transitions in our industry and argued the big discontinuities were always driven by a killer application. He sees the cloud as the next big and important transition for our industry. This talk was presented as part of the University of Washington Distinguished Lecturer Series.
Microsoft Announces Open Source based Cloud Service
We see press releases go by all the time and most of them deserve the yawn they get. But, one caught my interest yesterday. At the PASS Summit conference Microsoft Vice President Ted Kummert announced that Microsoft will be offering a big data solution based upon Hadoop as part of SQL Azure. From the Microsoft press release, Kummert also announced new investments to help customers manage big data, including an Apache Hadoop-based distribution for Windows Server and Windows Azure and a strategic partnership with Hortonworks Inc. Clearly this is a major win for the early startup Hortonworks.
We lost a Giant
Earlier today we lost one of the giants of technology. Steve Jobs was one of the most creative, demanding, brilliant, hard-driving, and innovative leaders in the entire industry. He created new business areas, introduced new business models, brought companies back from the dead, and fundamentally changed how the world as a whole interacts with computers. He was a visionary of staggering proportions with an unusual gift in his ability to communicate a vision and also the drive to seek perfection in the execution of his ideas. We lost a giant today.
Changes in Networking Systems
I've been posting frequently on networking issues with the key point being the market is on the precipice of a massive change. There is a new model emerging. · Datacenter Networks are in my Way · Networking: The Last Bastion of Mainframe Computing We now have merchant silicon providers for the core Application Specific Integrated Circuits (ASICs) that form the heart of network switches and routers, including Broadcom, Fulcrum (recently purchased by Intel), Marvell, and Dune (purchased by Broadcom). We have many competing offerings for the control processor that supports the protocol stack, including Freescale, ARM, and Intel.
Spot Instances, Big Clusters, & the Cloud at Work
If you've read this blog in the past, you'll know I view cloud computing as a game changer (Private Clouds are not the Future) and spot instances as a particularly powerful innovation within cloud computing. Over the years, I've enumerated many of the advantages of cloud computing over private infrastructure deployments. A particularly powerful cloud computing advantage comes from noting that, when combining a large number of non-correlated workloads, overall infrastructure utilization is far higher for most workload combinations. This is partly because the reserve capacity needed to ensure that all workloads can meet their peak demands is a tiny fraction of what is required to provide reserve surge capacity for each job individually.
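The pooling argument above is easy to make concrete with a toy simulation: size capacity for each workload's individual peak, then compare against the capacity needed for the peak of the combined demand. The workload shapes here are synthetic assumptions purely for illustration, not measured data.

```python
# Toy simulation: reserve capacity for N independent workloads, provisioned
# separately vs. pooled. Demand traces are synthetic (uniform random).
import random

random.seed(42)
N_WORKLOADS, N_SAMPLES = 100, 1000

# Each workload: independent random demand between 0 and 1 capacity units.
demands = [[random.random() for _ in range(N_SAMPLES)]
           for _ in range(N_WORKLOADS)]

# Provision each workload separately: capacity must cover its own peak.
separate_capacity = sum(max(d) for d in demands)

# Provision the pool: capacity must cover the peak of the summed demand.
pooled_capacity = max(sum(d[t] for d in demands) for t in range(N_SAMPLES))

print(f"separate: {separate_capacity:.1f}, pooled: {pooled_capacity:.1f}")
assert pooled_capacity < separate_capacity   # pooling needs far less reserve
```

With 100 uncorrelated workloads, the pooled peak lands near the mean of the summed demand rather than near the sum of the individual peaks, which is exactly why aggregate utilization can be so much higher.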
Hortonworks Taking Hadoop to Next Level
I got a chance to chat with Eric Baldeschwieler while he was visiting Seattle a couple of weeks back and catch up on what's happening in the Hadoop world at Yahoo! and beyond. Eric recently started Hortonworks, whose tag line is architecting the future of big data. I've known Eric for years from when he led the Hadoop team at Yahoo!, most recently as VP of Hadoop Engineering. It was Eric's team at Yahoo! that contributed much of the code in Hadoop, Pig, and ZooKeeper. Many of that same group form the core of Hortonworks, whose mission is to revolutionize and commoditize the storage and processing of big data via open source.
SolidFire: Cloud Operators Becomes a Market
It's a clear sign that the cloud computing market is growing fast and the number of cloud providers is expanding quickly when startups begin to target cloud providers as their primary market. It's not unusual for enterprise software companies to target cloud providers as well as their conventional enterprise customers, but I'm now starting to see startups building products aimed exclusively at cloud providers. Years ago, when there were only a handful of cloud services, targeting this market made no sense. There just weren't enough buyers to make it an interesting market. And many of the larger cloud providers are heavily biased to internal development, further reducing the addressable market size.
Consolidation in Networking: Intel Buys Fulcrum Microsystems
Great things are happening in the networking market. We're transitioning from vertically integrated, mainframe-like economics to a model similar to what we have in the server world. In the server ecosystem, we have Intel, AMD, and others competing to provide the CPU. At the layer above, we have ZT Systems, HP, Dell DCS, SGI, IBM, and many others building servers based upon whichever CPU the customer chooses to use. Above that layer, we have a wide variety of open source and proprietary software stacks that run on servers from any of the providers built with any of the CPU providers' silicon. There is competition at all layers in the stack.
SIGMOD 2011 in Athens
Earlier this week, I was in Athens, Greece attending the annual conference of the ACM Special Interest Group on Management of Data. SIGMOD is one of the top two database events held each year, attracting academic researchers and leading practitioners from industry. I kicked off the conference with the plenary keynote. In this talk I started with a short retrospective on the industry over the last 20 years. In my early days as a database developer, things were moving incredibly quickly. Customers were loving our products, the industry was growing fast, and yet the products really weren't all that good.
Amazon Technology Open House
The Amazon Technology Open House was held Tuesday night at the Amazon South Lake Union Campus. I did a short presentation on the following: · Quickening pace of infrastructure innovation · Where does the money go? · Power distribution infrastructure · Mechanical systems · Modular & Advanced Building Designs · Sea Change in Networking The slides are posted at: http://mvdirona.com/jrh/TalksAndPapers/JamesHamilton_AmazonOpenHouse20110607.pdf
Atul Gawande on Performance
Earlier today Alex Mallet reminded me of the excellent writing of Atul Gawande by sending me a pointer to the New Yorker coverage of Gawande's commencement address at the Harvard Medical School: Cowboys and Pit Crews. Four years ago I wrote a couple of blog entries on Gawande's work but, at the time, my blog was company internal so I've not posted these notes here in the past: As a follow-on to the posting I made on professional engineering (also posted externally http://perspectives.mvdirona.com/2007/11/07/ProfessionalEngineering.aspx) Edwin Young sent me a link to the following talk by Atul Gawande: Outcomes are very Personal.
What Went Wrong at Fukushima Dai-1
As a boater, there are times when I know our survival is 100% dependent upon the weather conditions, the boat, and the state of its equipment. As a consequence, I think hard about human or equipment failure modes and how to mitigate them. I love reading the excellent reporting by the UK Marine Accident Investigation Board. This publication covers human and equipment related failures on commercial shipping, fishing, and recreational boats. I read it carefully and I've learned considerably from it. I treat my work in much the same way. At work, human life is not typically at risk but large service failures can be very damaging and require the same care to avoid.
2011 European Data Center Summit
The European Data Center Summit 2011 was held yesterday at SihlCity CinCenter in Zurich. Google Senior VP Urs Hoelzle kicked off the event talking about why data center efficiency is important both economically and socially. He went on to point out that the oft-quoted number that US data centers represent 2% of total energy consumption is usually misunderstood. The actual data point is that 2% of the US energy budget is spent on IT, of which the vast majority is client-side systems. This is unsurprising but a super important clarification. The full breakdown of this data: of the 2% of US power that goes to IT, datacenters consume 14%, telecom 37%, and client devices 50%. The net is that 14% of 2%, or 0.28%, of the US power budget is consumed ...
Guido van Rossum: 21 Years of Python
Guido van Rossum was at Amazon a week back doing a talk. Guido presented 21 Years of Python: From Pet Project to Programming Language of the Year. The slides are linked below and my rough notes follow:
· Significant Python influencers:
o Algol 60, Pascal, C
o ABC
o Modula-2+ and -3
o Lisp and Icon
· ABC was the strongest language influencer of this set
· ABC design goals:
Software Load Balancing using Software Defined Networking
I invited Nikhil Handigol to present at Amazon earlier this week. Nikhil is a PhD candidate at Stanford University working with networking legend Nick McKeown on the Software Defined Networking team. Software defined networking is a concept coined by Nick where the research team is separating the networking control plane from the data plane. The goal is a fast and dumb routing engine with the control plane factored out and supporting an open programming platform. From Nikhil's presentation, we see the control plane hoisted up to a central, replicated network O/S configuring the distributed routing engines in each switch.
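The separation described above can be sketched with a toy model: a central controller that holds the routing policy and pushes simple match-to-port rules down to dumb switches, which do nothing but table lookups. This is only an illustrative sketch; the class names and rule format are my assumptions and bear no resemblance to OpenFlow's actual wire protocol.

```python
class Switch:
    """Dumb data plane: forwards by exact-match table lookup only."""
    def __init__(self, name):
        self.name = name
        self.flow_table = {}

    def install_rule(self, match, out_port):
        # Called only by the controller; the switch has no routing logic.
        self.flow_table[match] = out_port

    def forward(self, packet_dst):
        # Unknown destination -> None (would be punted to the controller).
        return self.flow_table.get(packet_dst)

class Controller:
    """Centralized control plane: computes routes, programs every switch."""
    def __init__(self, switches):
        self.switches = switches

    def set_route(self, dst, hops):
        # hops: list of (switch, out_port) pairs along the computed path.
        for switch, port in hops:
            switch.install_rule(dst, port)

s1, s2 = Switch("s1"), Switch("s2")
ctrl = Controller([s1, s2])
ctrl.set_route("10.0.0.7", [(s1, 3), (s2, 1)])
```

The design point is that all policy lives in one replicated place, while the per-switch data plane stays fast and simple.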
Open Compute UPS & Power Supply
This note looks at the Open Compute Project distributed Uninterruptible Power Supply (UPS) and server Power Supply Unit (PSU). This is the last in a series of notes looking at the Open Compute Project. Previous articles include: · Open Compute Project · Open Compute Server Design · Open Compute Mechanical Design The Open Compute design uses a semi-distributed uninterruptible power supply (UPS) system. Most data centers use central UPS systems where a large UPS is part of the central power distribution system. In this design, the UPS is in the 480VAC 3-phase part of the central power distribution system, prior to the step down to 208VAC.
European Data Center Efficiency Summit
Google cordially invites you to participate in a European Summit on sustainable Data Centres. This event will focus on energy-efficiency best practices that can be applied to multi-MW custom-designed facilities, office closets, and everything in between. Google and other industry leaders will present case studies that highlight easy, cost-effective practices to enhance the energy performance of Data Centres. The summit will also include a dedicated session on cooling. Presenters will detail climate-specific implementations of free cooling as well as novel ways to utilise locally available opportunities. We will also debate climate-independent PUE targets. The agenda includes presentations and panel discussions featuring Amazon, DeepGreen, eBay, Google, IBM, Microsoft, Norman Disney & Young, PlusServer, Telecity Group, The Green Grid, UK's Chartered Institute for IT, UBS and others.
Open Compute Server Design
Last Thursday Facebook announced the Open Compute Project where they released pictures and specifications for their Prineville, Oregon datacenter and the servers and infrastructure that will populate that facility. In my last blog, Open Compute Mechanical System Design, I walked through the mechanical system in some detail. In this posting, we'll have a closer look at the Facebook Freedom Server design. Chassis Design: The first thing you'll notice when looking at the Facebook chassis design is there are only 30 servers per rack. They are challenging one of the most strongly held beliefs in the industry: that density is the primary design goal and more density is good.
Open Compute Mechanical System Design
Last week Facebook announced the Open Compute Project (Perspectives, Facebook). I linked to the detailed specs in my general notes on Perspectives and said I would follow up with more detail on key components and design decisions I thought were particularly noteworthy. In this post we'll go through the mechanical design in detail. As long time readers of this blog will know, PUE has many issues (PUE is still broken and I still use it) and is mercilessly gamed in marketing literature (PUE and tPUE). The Facebook published literature predicts that this center will deliver a PUE of 1.07.
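PUE, for all its flaws noted above, is a simple ratio: total facility power divided by the power delivered to IT equipment, so the quoted 1.07 means only 7% overhead for power distribution and cooling. A minimal sketch of the calculation (the sample wattages below are made up for illustration):

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT load.
    1.0 is the (unreachable) ideal; lower is better."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw

# A facility drawing 1070 kW to deliver 1000 kW to servers has a PUE of 1.07.
assert round(pue(1070, 1000), 2) == 1.07
```

The gaming the post mentions usually comes down to what gets counted in each term, which is why the ratio alone says so little without the measurement methodology.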
Open Compute Project
The pace of innovation in data center design has been rapidly accelerating over the last 5 years driven by the mega-service operators. In fact, I believe we have seen more infrastructure innovation in the last 5 years than we did in the previous 15. Most very large service operators have teams of experts focused on server design, data center power distribution and redundancy, mechanical designs, real estate acquisition, and network hardware and protocols. But, much of this advanced work is unpublished and practiced at a scale that is hard to duplicate in a research setting.
Example of Efficient Mechanical Design
A bit more than a year back, I published Computer Room Evaporative Cooling where I showed an evaporative cooling design from EcoCooling. Periodically, Alan Beresford sends me designs he's working on. This morning he sent me a design they are working on for a 7MW data center in Ireland. I like the design for a couple of reasons: 1) it's a simple, efficient design, and 2) it's a nice example of a few important industry trends. The trends exemplified by this design are: 1) air-side economization, 2) evaporative cooling, 3) hot-aisle containment, and 4) very large plenums with controlled hot-air recycling.
Prioritizing Principles in "On Designing and Deploying Internet-Scale Services"
Brad Porter is Director and Senior Principal Engineer at Amazon. We work in different parts of the company but I have known him for years and he's actually one of the reasons I ended up joining Amazon Web Services. Last week Brad sent me the guest blog post that follows where, on the basis of his operational experience, he prioritizes the most important points in the LISA paper On Designing and Deploying Internet-Scale Services. --jrh Prioritizing the Principles in "On Designing and Deploying Internet-Scale Services" By Brad Porter James published what I consider to be the single best paper to come out of the highly-available systems world in many years.
Intel Atom with ECC in 2012
Back in early 2008, I noticed an interesting phenomenon: some workloads run more cost effectively on low-cost, low-power processors. The key observation behind this phenomenon is that CPU bandwidth consumption is going up faster than memory bandwidth. Essentially, it's a two-part observation: 1) many workloads are memory-bandwidth bound and will not run faster with a faster processor unless the faster processor comes with a much improved memory subsystem, and 2) the number of memory-bound workloads is going up over time. One solution is to improve both the memory bandwidth and the processor speed, and this really does work, but it is expensive.
More Data on Datacenter Air Side Economization
Two of the highest-leverage datacenter efficiency-improving techniques currently sweeping the industry are: 1) operating at higher ambient temperatures (http://perspectives.mvdirona.com/2011/02/27/ExploringTheLimitsOfDatacenterTemprature.aspx) and 2) air-side economization with evaporative cooling (http://perspectives.mvdirona.com/2010/05/15/ComputerRoomEvaporativeCooling.aspx). The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) currently recommends that servers not be operated at inlet temperatures beyond 81F. It's super common to hear that every 10C increase in temperature leads to 2x the failure rate; some statements get repeated so frequently they become “true” and no longer get questioned. See Exploring the Limits of Data Center Temperature for my argument that this rule of thumb doesn't apply over the full range of operating temperatures.
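The rule of thumb quoted above, expressed as a formula, says the failure rate multiplier is 2^(ΔT/10). A quick sketch just to make the claimed (and, as argued above, questionable) relationship concrete:

```python
def failure_rate_multiplier(delta_celsius: float) -> float:
    """Relative failure rate under the '2x per 10C' rule of thumb.
    Note this models the folklore claim, not measured reality."""
    return 2.0 ** (delta_celsius / 10.0)

# Under the rule, running 20C hotter would quadruple the failure rate.
assert failure_rate_multiplier(20) == 4.0
```

If the rule really held across the full operating range, every degree of temperature increase would be expensive; the argument in the linked post is that it does not.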
Yahoo! Compute Coop Design
Chris Page, Director of Climate & Energy Strategy at Yahoo!, spoke at the 2010 Data Center Efficiency Summit where he presented the Yahoo! Compute Coop Design. The primary attributes of the Yahoo! design are: 1) 100% free air cooling (no chillers), 2) slab concrete floor, 3) use of wind power to augment air handling units, and 4) pre-engineered building for construction speed. Chris reports the idea of orienting the building so that the dominant wind strikes the external wall, using the higher pressure to assist the air handling units, was taken from looking at farm buildings in the Buffalo, New York area.
Exploring the Limits of Datacenter Temperature
Datacenter temperature has been ramping up rapidly over the last 5 years. In fact, leading operators have been pushing temperatures up so quickly that the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) recommendations have become a trailing indicator of what is being done rather than current guidance. ASHRAE responded in January of 2009 by raising the recommended limit from 77F to 80.6F (HVAC Group Says Datacenters Can be Warmer). This was a good move but many of us felt it was late and not nearly a big enough increment. Earlier this month, ASHRAE announced they are again planning to take action and raise the recommended limit further but haven't yet announced by how much (ASHRAE: Data Centers Can be Even Warmer). Many datacenters are operating reliably well in excess of even the newest ASHRAE recommended temp of 81F.
Dileep Bhandarkar on Datacenter Energy Efficiency
Dileep Bhandarkar presented the keynote at the Server Design Summit last December. I can never find the time to attend trade shows so I often end up reading slides instead. This one had lots of interesting tidbits so I'm posting a pointer to the talk and my rough notes here.
Challenges and Trade-offs in Building a Web-scale Real-time Analytics System
Ben Black always has interesting things on the go. He's now down in San Francisco working on his startup Fastip which he describes as “an incredible platform for operating, exploring, and optimizing data networks.” A couple of days ago Deepak Singh pointed me to a recent presentation of Ben's I found interesting: Challenges and Trade-offs in Building a Web-scale Real-time Analytics System. The problem described in this talk was “Collect, index, and query trillions of high dimensionality records with seconds of latency for ingestion and response.” What Ben is doing is collecting per-flow networking data with tcp/ip 11-tuples (src_mac, dst_mac, src_IP, dest_IP, …) as the dimension data and, as metrics, he is tracking start usecs, end usecs, packets, octets, and UID.
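The record shape described above maps naturally onto a key of flow dimensions with per-flow metrics as the value. A minimal sketch of that layout follows; the field names track the examples in the talk, the remaining dimensions of the 11-tuple are left as elided placeholders, and the whole structure is my illustrative assumption rather than Fastip's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowKey:
    """High-dimensionality dimension data (partial 11-tuple)."""
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    # ... remaining dimensions of the 11-tuple elided in the talk summary

@dataclass
class FlowMetrics:
    """Per-flow measurements tracked against each key."""
    start_usecs: int
    end_usecs: int
    packets: int
    octets: int
    uid: int

# An in-memory stand-in for the real indexed store.
flows: dict = {}
key = FlowKey("aa:bb", "cc:dd", "10.0.0.1", "10.0.0.2")
flows[key] = FlowMetrics(start_usecs=1, end_usecs=500,
                         packets=12, octets=9000, uid=42)
```

The hard part of the talk's problem is not this layout but indexing trillions of such records for seconds-latency ingestion and query; the sketch only fixes the vocabulary.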
Speeding Up Cloud/Server Applications With Flash Memory
Last week, Sudipta Sengupta of Microsoft Research dropped by the Amazon Lake Union campus to give a talk on the flash memory work that he and the team at Microsoft Research have been doing over the past year. It's super interesting work. You may recall Sudipta as one of the co-authors on the VL2 paper (VL2: A Scalable and Flexible Data Center Network) I mentioned last October. Sudipta's slides for the flash memory talk are posted at Speeding Up Cloud/Server Applications With Flash Memory and my rough notes follow: · The technology has been used in client devices for more than a decade · Server-side usage is more recent and the differences between hard disk drive and flash characteristics bring some challenges that need to be managed in the on-device Flash Translation Layer (FTL) or in ...
NVIDIA Project Denver: ARM Powered Servers
NVIDIA has been an ARM licensee for quite some time now. Back in 2008 they announced Tegra, an embedded client processor including an ARM core and NVIDIA graphics aimed at smartphones and mobile handsets. 10 days ago, they announced Project Denver where they are building high-performance ARM-based CPUs, designed to power systems ranging from “personal computers and servers to workstations and supercomputers”. This is interesting for a variety of reasons: first, they are entering the server CPU market; second, NVIDIA is joining Marvell and Calxeda (previously Smooth-Stone) in taking the ARM architecture and targeting server-side computing. ARM is an interesting company in that they produce designs and these designs get adapted by licensees including Texas Instruments, Samsung, Qualcomm, and even unlikely players such as Microsoft.
Interested in Core Database Engine Development?
If you have experience in database core engine development either professionally, on open source, or at university send me your resume. When I joined the DB world 20 years ago, the industry was young and the improvements were coming ridiculously fast. In a single release we improved DB2 TPC-A performance by a factor of 10x. Things were changing quickly industry-wide. These days single-server DBs are respectably good. It's a fairly well understood space. Each year more features are added and a few percent performance improvement may happen but the code bases are monumentally large, many of the development teams are over 1,000 engineers, and things are happening anything but quickly.
Google Megastore: The Data Engine Behind GAE
Megastore is the data engine supporting the Google App Engine. It's a scalable structured data store providing full ACID semantics within partitions but lower consistency guarantees across partitions. I wrote up some notes on it back in 2008 (Under the Covers of the App Engine Datastore) and posted Phil Bernstein's excellent notes from a 2008 SIGMOD talk: Google Megastore. But there has been remarkably little written about this datastore over the intervening couple of years until this year's CIDR conference papers were posted. CIDR 2011 includes Megastore: Providing Scalable, Highly Available Storage for Interactive Services.
GPGPU Sorting
Years ago I incorrectly believed special purpose hardware was a bad idea. What is a bad idea is high-markup, special purpose devices sold at low volume through expensive channels. Hardware implementations are often the best value measured in work done per dollar and work done per joule. The newest breed of commodity networking parts from Broadcom, Fulcrum, Dune (now Broadcom), and others is a beautiful example of Application Specific Integrated Circuits being the right answer for extremely hot code kernels that change rarely. I've long been interested in highly parallel systems and in heterogeneous processing.
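One reason sorting maps so well onto GPUs is that sorting networks like bitonic sort have a fixed, data-independent compare-exchange pattern, so every lane does identical work with no branching divergence. Below is a minimal CPU sketch of the bitonic network just to show the structure (it assumes the input length is a power of two); it is not GPU code and not drawn from any specific GPGPU sorting paper.

```python
def _compare_exchange(a, lo, n, ascending):
    # One layer of compare-exchange pairs spanning the first and second half.
    half = n // 2
    for i in range(lo, lo + half):
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]

def _bitonic_merge(a, lo, n, ascending):
    # Merge a bitonic sequence of length n into sorted order.
    if n > 1:
        _compare_exchange(a, lo, n, ascending)
        _bitonic_merge(a, lo, n // 2, ascending)
        _bitonic_merge(a, lo + n // 2, n // 2, ascending)

def bitonic_sort(a, lo=0, n=None, ascending=True):
    """Sort list a in place; len(a) must be a power of two."""
    if n is None:
        n = len(a)
    if n > 1:
        bitonic_sort(a, lo, n // 2, True)            # ascending half
        bitonic_sort(a, lo + n // 2, n // 2, False)  # descending half
        _bitonic_merge(a, lo, n, ascending)          # merge bitonic sequence
    return a
```

Each `_compare_exchange` layer is embarrassingly parallel, which is exactly the property that makes this family of algorithms a good fit for wide SIMD hardware despite doing more comparisons than quicksort.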
Amazon Route 53 DNS Service
Even working in Amazon Web Services, I'm finding the frequency of new product announcements and updates a bit dizzying. It's amazing how fast the cloud is taking shape and the feature set is filling out. Utility computing has really been on fire over the last 9 months. I've never seen an entire new industry created and come fully to life this fast. Fun times. Before joining AWS, I used to say that I had an inside line on what AWS was working upon and what new features were coming in the near future. My trick?
Availability in Globally Distributed Storage Systems
I love high-scale systems and, more than anything, I love data from real systems. I've learned over the years that no environment is crueler, less forgiving, or harder to satisfy than real production workloads. Synthetic tests at scale are instructive but nothing catches my attention like data from real, high-scale, production systems. Consequently, I really liked the disk population studies from Google and CMU at FAST2007 (Failure Trends in a Large Disk Population, Disk Failures in the Real World: What does a MTBF of 100,000 hours mean to you). These two papers presented actual results from independent production disk populations of 100,000 each.
46MW with Water Cooling at a PUE of 1.10
Achieving a PUE of 1.10 is a challenge under any circumstances but the vast majority of facilities that do approach this mark are using air-side economization: essentially, using outside air to cool the facility. Air-side economization brings some complexities such as requiring particulate filters and being less effective in climates that are both hot and humid. Nonetheless, even with the challenges, air-side economization is one of the best techniques, if not the best, of improving datacenter efficiency. As a heat transport, water is both effective and efficient. The challenges of using water in open-circuit datacenter cooling designs are largely social and regulatory.
Very Low-Cost, Low-Power Servers
I'm interested in low-cost, low-power servers and have been watching the emerging market for these systems since 2008 when I wrote CEMS: Low-Cost, Low-Power Servers for Internet Scale Services (paper, talk). ZT Systems just announced the R1081e, a new ARM-based server with the following specs: · STMicroelectronics SPEAr 1310 with dual ARM® Cortex-A9 cores · 1 GB of 1333MHz DDR3 ECC memory embedded · 1 GB of NAND Flash · Ethernet connectivity · USB ·
Wed, 17 Nov 2010 14:15:32 UTC
Earlier this week Clay Magouyrk sent me a pointer to some very interesting work: A Couple More Nails in the Coffin of the Private Compute Cluster: Benchmarks for the Brand New Cluster GPU Instance on Amazon EC2. This article has detailed benchmark results from runs on the new Cluster GPU Instance type and leads in with: During the past few years it has been no secret that EC2 has been the best cloud provider for massive scale, but loosely connected scientific computing environments. Thankfully, many workflows we have encountered have performed well within the EC2 boundaries.
GPU Clusters in 10 Minutes
HPC in the Cloud with GPGPUs
A year and a half ago, I did a blog post titled Heterogeneous Computing using GPGPUs and FPGAs. In that note I defined heterogeneous processing as the application of processors with different instruction set architectures (ISAs) under direct application programmer control and pointed out that this really isn't all that new a concept. We have had multiple ISAs in systems for years. IBM mainframes had I/O processors (Channel I/O Processors) with a different ISA than the general CPUs, many client systems have dedicated graphics coprocessors, and floating point units used to be independent from the CPU instruction set before that functionality was pulled up onto the chip.
AWS Compute Cluster #231 on Top 500
The Top 500 Super Computer Sites list was just updated and AWS Compute Cluster is now officially in the top ½ of the list.
Datacenter Power Efficiency
Kushagra Vaid presented Datacenter Power Efficiency at HotPower '10. The URL to the slides is below and my notes follow.
Datacenter Networks are in my Way
I did a talk earlier this week on the sea change currently taking place in datacenter networks. In Datacenter Networks are in my Way I start with an overview of where the costs are in a high-scale datacenter. With that backdrop, we note that networks are fairly low power consumers relative to total facility consumption and not even close to the dominant cost. Are they actually a problem? The rest of the talk argues that networks are actually a huge problem across the board, including cost and power. Overall, networking gear lags behind the rest of the high-scale infrastructure world, blocks many key innovations, and is actually both a cost and a power problem when we look deeper.
AWS Free Tier: 750 hours of EC2 for free
What happens when you really, really focus on efficient infrastructure and driving down costs while delivering a highly available, high performance service? Well, it works. Costs really do fall and the savings can be passed on to customers. AWS prices have been falling for years but this is different. It's now possible to offer a small EC2 instance for free. You can now have 750 hours of EC2 usage per month without charge.
Amazon Web Services Book
Long time Amazon Web Services Alum Jeff Barr has written a book on AWS. Jeff's been with AWS since the very early days and he knows the services well. The new book Host Your Web Site in the Cloud: Amazon Web Services Made Easy, covers each of the major AWS services, how to write code against them, with code examples in PHP. It covers S3, EC2, SQS, EC2 Monitoring, Auto Scaling, Elastic Load Balancing, and SimpleDB. The table of contents: Recommended if you are interested in Cloud Computing and AWS: http://www.amazon.com/Host-Your-Web-Site-Cloud/dp/0980576830.
Netflix Migration to the Cloud
This morning I came across an article written by Sid Anand, an architect at Netflix, that is super interesting. I liked it for two reasons: 1) it talks about the move of substantial portions of a high-scale web site to the cloud, some of how it was done, and why it was done, and 2) it gives best practices on AWS SimpleDB usage. I love articles about how high-scale systems work.
Scaling AWS Relational Database Service
Hosting multiple MySQL engines with MySQL Replication between them is a common design pattern for scaling read-heavy MySQL workloads. As with all scaling techniques, there are workloads for which it works very well but there are also potential issues that need to be understood. In this case, all write traffic is directed to the primary server and, consequently, is not scaled, which is why this technique works best for workloads heavily skewed towards reads. But, for those fairly common read-heavy workloads, the technique works very well and allows scaling the read workload across a fleet of MySQL instances. Of course, as with any asynchronous replication scheme, the read replicas are not transactionally updated.
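The pattern above can be sketched as a tiny connection router: writes always go to the primary, reads fan out round-robin over the replicas. The class and method names are illustrative assumptions; a real deployment would layer this over a MySQL client library and deal explicitly with replication lag, since a just-written row may not yet be visible on a replica.

```python
# Sketch of read/write splitting for a primary + read-replica MySQL fleet.
import itertools

class ReadWriteRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas spreads the read load across the fleet.
        self._replica_cycle = itertools.cycle(replicas)

    def connection_for(self, sql: str):
        """Pick a backend for this statement by its leading verb."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in ("SELECT", "SHOW"):
            return next(self._replica_cycle)   # reads scale with replica count
        return self.primary                    # writes are NOT scaled

router = ReadWriteRouter("primary", ["replica1", "replica2", "replica3"])
```

Note how the write path makes the limitation in the post concrete: adding replicas grows read capacity linearly while write capacity stays fixed at the single primary.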
Overall Data Center Costs
A couple of years ago, I did a detailed look at where the costs are in a modern, high-scale data center. The primary motivation behind bringing all the costs together was to understand where the problems are and find those easiest to address. Predictably, when I first brought these numbers together, a few data points just leapt off the page: 1) at scale, servers dominate overall costs, and 2) mechanical system cost and power consumption seem unreasonably high. Both of these areas have proven to be important technology areas to focus upon and there has been considerable industry-wide innovation, particularly in cooling efficiency, over the last couple of years.
DataCloud 2011: Workshop on Data Intensive Computing in the Clouds Call for Papers
For those of you writing about your work on high-scale cloud computing (and for those interested in a great excuse to visit Anchorage, Alaska), consider submitting a paper to the Workshop on Data Intensive Computing in the Clouds (DataCloud 2011). The call for papers is below. --jrh ------------------------------------------------------------------------------------------- *** Call for Papers *** WORKSHOP ON DATA INTENSIVE COMPUTING IN THE CLOUDS (DATACLOUD 2011) In conjunction with IPDPS 2011, May 16, Anchorage, Alaska http://www.cct.lsu.edu/~kosar/datacloud2011 ------------------------------------------------------------------------------------------- The First International Workshop on Data Intensive Computing in the Clouds (DataCloud2011) will be held in conjunction with the 25th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2011), in Anchorage, Alaska.
Web-Scale Database
I'm dragging myself off the floor as I write this having watched this short video: MongoDB is Web Scale. It won't improve your datacenter PUE, your servers won't get more efficient, and it won't help you scale your databases but, still, you just have to check out that video. Thanks to Andrew Certain of Amazon for sending it my way. --jrh James Hamilton e: [email protected] w: http://www.mvdirona.com
Amazon EC2 for under $15/Month
You can now run an EC2 instance 24x7 for under $15/month with the announcement last night of the Micro instance type. The Micro includes 613 MB of RAM and can burst for short periods up to 2 EC2 Compute Units (one EC2 Compute Unit equals a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor). They are available in all EC2 supported regions and in 32-bit or 64-bit flavors. The design point for Micro instances is to offer a small amount of consistent, always available CPU resources while supporting short bursts above this level.
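As a sanity check on the headline price, the arithmetic is simple. The roughly $0.02/hour Linux on-demand rate below is my assumption about the launch pricing, not a figure from the post:

```python
# Back-of-envelope check that a Micro instance stays under $15/month.
# The ~$0.02/hour Linux on-demand price is an assumed figure.
hourly_rate = 0.02           # USD/hour, assumed
hours_per_month = 730        # average month (8,760 hours / 12)
monthly_cost = hourly_rate * hours_per_month
assert abs(monthly_cost - 14.60) < 0.01
assert monthly_cost < 15.0   # under the $15/month headline
```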
Gone Boating
I'm taking some time off and probably won't blog again until the first week of September. Jennifer and I are taking the boat north to Alaska. Most summers we spend a bit of time between the northern tip of Vancouver Island and the Alaska border. This year is a little different for two reasons. First, we're heading further north than in the past and will spend some time in Glacier Bay National Park & Preserve. The second thing that makes this trip a bit different is that, weather permitting, we'll be making the nearly thousand-mile one-way trip as an offshore crossing.
Energy Proportional Datacenter Networks
A couple of weeks back Greg Linden sent me an interesting paper called Energy Proportional Datacenter Networks. The principle of energy proportionality was first coined by Luiz Barroso and Urs Hölzle in an excellent paper titled The Case for Energy-Proportional Computing. The core principle behind energy proportionality is that computing equipment should consume power in proportion to its utilization level. For example, a computing component that consumes N watts at full load should consume X/100*N watts when running at X% load. This may seem like an obviously important concept but, when the idea was first proposed back in 2007, it was not uncommon for a server running at 0% load to be consuming 80% of full load power.
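The X/100*N formula and the 80%-at-idle observation can be made concrete; the 500 W server below is an illustrative figure, not from either paper:

```python
def proportional_power(full_load_watts, load_pct):
    """Ideal energy-proportional draw: X/100 * N watts at X% load."""
    return load_pct / 100 * full_load_watts

def typical_2007_power(full_load_watts, load_pct, idle_fraction=0.8):
    """Sketch of a circa-2007 server: ~80% of peak power even when idle,
    scaling linearly from there up to full-load power."""
    idle = idle_fraction * full_load_watts
    return idle + (full_load_watts - idle) * load_pct / 100

# A hypothetical 500 W server at 0% load:
assert proportional_power(500, 0) == 0     # ideal: no work, no power
assert typical_2007_power(500, 0) == 400   # reality: 80% of peak, doing nothing
assert proportional_power(500, 50) == 250  # ideal: half load, half power
```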
High Performance Computing Hits the Cloud
High Performance Computing (HPC) is defined by Wikipedia as: High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. Today, computer systems approaching the teraflops-region are counted as HPC-computers. The term is most commonly associated with computing used for scientific research or computational science. A related term, high-performance technical computing (HPTC), generally refers to the engineering applications of cluster-based computing (such as computational fluid dynamics and the building and testing of virtual prototypes). Recently, HPC has come to be applied to business uses of cluster-based supercomputers, such as data warehouses, line-of-business (LOB) applications, and transaction processing.
Long tailed workloads and the return of hierarchical storage management
Hierarchical storage management (HSM), also called tiered storage management, is back, but in a different form. HSM exploits the access-pattern skew across data sets by placing cold, seldom-accessed data on slow, cheap media and frequently accessed data on fast, near media. In the old days, HSM typically referred to systems mixing robotically managed tape libraries with hard disk drive staging areas. HSM actually never went away; it's just a very old technique to exploit data access pattern skew to reduce storage costs. Here's an old unit from FermiLab. Hot data, or data currently being accessed, is stored on disk, and old data that has not been recently accessed is stored on tape.
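The placement decision at the heart of HSM can be sketched as a tiny policy function. The 30-day threshold is an invented example, not FermiLab's actual policy:

```python
import time

def pick_tier(last_access_ts, now=None, hot_window_days=30):
    """Toy HSM placement policy: data touched within the hot window stays
    on disk; anything colder migrates to tape. Thresholds are illustrative."""
    now = time.time() if now is None else now
    age_days = (now - last_access_ts) / 86400
    return "disk" if age_days <= hot_window_days else "tape"

now = 1_000_000_000
assert pick_tier(now - 5 * 86400, now=now) == "disk"    # touched 5 days ago
assert pick_tier(now - 400 * 86400, now=now) == "tape"  # cold for over a year
```

Real HSM systems also track access frequency and recall cost, not just age, but the skew-exploiting idea is the same.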
Hadoop Summit 2010
I didn't attend the Hadoop Summit this year or last but was at the inaugural event back in 2008, and it was excellent. This year, Hadoop Summit 2010 was held June 29, again in Santa Clara. The agenda for the 2010 event is at: Hadoop Summit 2010 Agenda. Since I wasn't able to be there, Adam Gray of the Amazon AWS team was kind enough to pass on his notes and let me use them here. Key takeaways:
· Yahoo and Facebook operate the world's largest Hadoop clusters, at 4,000 and 2,300 nodes with 70 and 40 petabytes respectively. They run full cluster replicas to assure availability and data durability.
· Yahoo released Hadoop security features with Kerberos integration, which is most useful for long-running multitenant Hadoop clusters.
· Cloudera released a paid enterprise version of Hadoop with cluster management tools and several DB connectors and announced ...
Velocity 2010
I did a talk at Velocity 2010 last week. The slides are posted at Datacenter Infrastructure Innovation and the video is available at Velocity 2010 Keynote. Urs Hölzle, Google Senior VP of infrastructure, also did a Velocity keynote. It was an excellent talk and is posted at Urs Hölzle at Velocity 2010. Jonathan Heiliger, Facebook VP of Technical Operations, spoke at Velocity as well. A talk summary is up at: Managing Epic Growth in Real Time. Tim O'Reilly did a talk: O'Reilly Radar. Velocity really is a great conference. Last week I posted two quick notes on Facebook: Facebook Software Use and 60,000 Servers at Facebook.
60,000 servers at Facebook
Last week, I estimated that Facebook now had 50,000 servers in Facebook Software Use. Rich Miller of Datacenter Knowledge actually managed to sleuth out the accurate server count in: Facebook Server Count: 60,000 or more. He took Tom Cook's Velocity 2010 talk from last week, which showed growth without absolute numbers. But Rich noticed it did have dates, and Facebook had previously released a server count of 30k at a known date. With the curve and the previous calibration point, we have the number: 60,000. Not really that large, but the growth rate is amazing.
Amazon SimpleDB Developer Guide
The NoSQL movement continues to gain momentum. I don't see these systems as replacing relational systems for all applications, but it is also crystal clear that relational systems are a poor choice for some workloads. See One Size Does Not Fit All for my take on the different types of systems that make up the structured storage market. The Amazon Web Services entrant in the NoSQL market segment is SimpleDB. I've posted on SimpleDB in the past, starting back in 2007 with Amazon SimpleDB Announced and more recently in I Love Eventual Consistency But... I recently came across a book by Prabhakar Chaganti and Rich Helms on SimpleDB.
Facebook Software Use
This morning I came across Exploring the software behind Facebook, the World's Largest Site. The article doesn't introduce new data not previously reported, but it's a good summary of the software used by Facebook and the current scale of the social networking site:
· 570 billion page views monthly
· 3 billion photo uploads monthly
· 1.2 million photos served per second
· 30k servers
The latter metric, the 30k server count, is pretty old (Facebook has 30,000 servers). I would expect the number to be closer to 50k now, based only upon external usage growth.
SeaMicro Releases Innovative Intel Atom Server
I've been talking about the application of low-power, low-cost processors to server workloads for years, starting with The Case for Low-Cost, Low-Power Servers. Subsequent articles get into more detail: Microslice Servers, Low-Power Amdahl Blades for Data Intensive Computing, and Successfully Challenging the Server Tax. Single-dimensional measures of servers like "performance," without regard to server cost or power dissipation, are seriously flawed. The right way to measure server performance is work done per dollar and work done per joule. If you adopt these measures of workload performance, we find that cold storage workloads and highly partitionable workloads run very well on low-cost, low-power servers.
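A sketch of the two metrics, with entirely hypothetical server numbers, shows how the ranking can flip relative to raw performance:

```python
def work_per_dollar(requests_per_sec, server_cost):
    """Work/$: throughput normalized by purchase price."""
    return requests_per_sec / server_cost

def work_per_joule(requests_per_sec, watts):
    """Work/J: watts are joules/second, so this is requests per joule."""
    return requests_per_sec / watts

# Hypothetical comparison: a big scale-up server vs. a low-power node.
big = {"rps": 10_000, "cost": 8_000, "watts": 400}
small = {"rps": 2_000, "cost": 1_000, "watts": 50}

# The big server wins on raw performance...
assert big["rps"] > small["rps"]
# ...but on a partitionable workload the small node wins on both
# normalized metrics (2.0 vs 1.25 rps/$, 40 vs 25 rps/J).
assert work_per_dollar(small["rps"], small["cost"]) > work_per_dollar(big["rps"], big["cost"])
assert work_per_joule(small["rps"], small["watts"]) > work_per_joule(big["rps"], big["watts"])
```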
The Drive-by Download Problem
A couple of days ago I came across an interesting article by Microsoft Fellow Mark Russinovich. In this article, Mark hunts a random Internet Explorer crash with his usual tools: The Case of the Random IE Crash. He chases down the IE issue to a Yahoo! Toolbar. This caught my interest for two reasons: 1) the debug technique used to chase it down was interesting, and 2) it's a two-week-old computer with no toolbars ever installed. From Mark's blog: This came as a surprise because the system on which the crash occurred was my home gaming system, a computer that I'd only had for a few weeks.
Cloud Data Freedom
One of the most important attributes needed in a cloud solution is what I call cloud data freedom. Having the ability to move data out of the cloud quickly, efficiently, cheaply, and without restriction is, in my opinion, a mandatory prerequisite to trusting a cloud. In fact, you need the ability to move the data both ways. Moving in cheaply, efficiently, and quickly is often required just to get the work done. And the ability to move out cheaply, efficiently, quickly, and without restriction is the only way to avoid lock-in. Data movement freedom is the defining attribute of an open cloud.
Economic Incentives Applied to Web Latency
Last month I wrote about Solving World Problems with Economic Incentives. In that post I talked about the power of economic incentives when compared to regulatory body intervention. I'm not really against laws and regulations; the EPA, for example, has done some good work, and much of what they do has improved the situation. But nine times out of ten, good regulation is first blocked and/or watered down by lobby groups; what finally gets enacted is often not fully thought through and brings unintended consequences; it is often overly prescriptive (see Right Problem but Wrong Approach); and regulations are enacted at the speed of government (think continental drift: there is movement, but it's often hard to detect).
The New World Order
Industry trends come and go. The ones that stay with us and have lasting impact are those that fundamentally change the cost equation. Public clouds clearly pass this test. The potential savings approach 10x and, in cost sensitive industries, those that move to the cloud fastest will have a substantial cost advantage over those that don't. And, as much as I like saving money, the much more important game changer is speed of execution. Those companies depending upon public clouds will be noticeably more nimble. Project approval to delivery times fall dramatically when there is no capital expense to be approved.
Netflix on AWS
I did a talk at the Usenix Tech conference last year, Where Does the Power Go in High Scale Data Centers. After the talk I got into a more detailed discussion with many folks from Netflix and Canada's Research in Motion, the maker of the BlackBerry. The discussion ended up in a long lunch over a big table with folks from both teams. The common theme of the discussion was, predictably given the companies and folks involved, innovation in high-scale services and how to deal with incredible growth rates. Both RIM and Netflix are very successful and, until you have experienced and attempted to manage internet growth rates, you really just don't know.
PUE is Still Broken and I still use it
PUE is still broken and I still use it. For more on why PUE has definite flaws, see: PUE and Total Power Usage Efficiency. However, I still use it because it's an easy-to-compute summary of data center efficiency. It can be gamed endlessly, but it's easy to compute and it does provide some value. Improvements are underway in locking down the most egregious abuses of PUE. Three were recently summarized in Technical Scribblings RE Harmonizing Global Metrics for Data Center Energy Efficiency. In this report from John Stanley, the following were presented: · Total energy to include all forms of energy whether electric or otherwise (e.g.
State of Public Sector Cloud Computing
Federal and state governments are prodigious information technology users. Federal Chief Information Officer Vivek Kundra reports that the United States government is spending $76B annually on 10,000 different systems. In a recently released report, State of Public Sector Cloud Computing, Vivek Kundra summarizes the benefits of cloud computing: There was a time when every household, town, farm or village had its own water well. Today, shared public utilities give us access to clean water by simply turning on the tap; cloud computing works in a similar fashion. Just like the water from the tap in your kitchen, cloud computing services can be turned on or off quickly as needed. Like at the water company, there is a team of dedicated professionals making sure the service provided is safe and available on a 24/7 basis. Best of all, when ...
Solving World Problems With Economic Incentives
Economic forces are more powerful than politics. Political change is slow. Changing laws takes time. Lobbyists water down the intended legislation. Companies find loopholes. The population as a whole lacks the strength of conviction to make the tough decisions and stick with them. Economic forces are far more powerful and certainly more responsive than political forces. Effectively, what I'm observing is that great good can be done if there is a business model and profit encouraging it. Here are my two favorite examples, partly because they are both doing great things and partly because they are so different in their approach, yet still have the common thread of using the free market to improve the world.
When Very Low-Power, Low-Cost Servers Don't Make Sense
I am excited by very low power, very low cost servers and the impact they will have on our industry. There are many workloads where CPU is in excess and lower power and lower cost servers are the right answer. These are workloads that don't fully exploit the capabilities of the underlying server. For these workloads, the server is out of balance with excess CPU capability (both power and cost). There are workloads where less is more. But, with technology shifts, it's easy to get excited and try to apply the new solution too broadly. We can see parallels in the Flash memory world.
Computer Room Evaporative Cooling
I recently came across a nice data center cooling design by Alan Beresford of EcoCooling Ltd. In this approach, EcoCooling replaces the CRAC units with a combined air mover, damper assembly, and evaporative cooler. I've been interested in evaporative coolers and their application to data center cooling for years, and they are becoming more common in modern data center deployments (e.g. Data Center Efficiency Summit). An evaporative cooler is a simple device that cools air by taking water through a state change from liquid to vapor. They are incredibly cheap to run and particularly efficient in locales with low humidity.
Inter-Datacenter Replication & Geo-Redundancy
Wide area network costs and bandwidth shortage are the most common reasons why many enterprise applications run in a single data center. Single data center failure modes are common. There are many external threats to single data center deployments including utility power loss, tornado strikes, facility fire, network connectivity loss, earthquake, break-in, and many others I've not yet been "lucky" enough to have seen. And, inside a single facility, there are simply too many ways to shoot one's own foot. All it takes is one well intentioned networking engineer to black hole the entire facility's network traffic. Even very high quality power distribution systems can have redundant paths taken out by fires in central switch gear or cascading failure modes. And, even with very highly redundant systems, if the redundant paths aren't tested often, they won't work. Even with incredible redundancy, just having the redundant components in the same ...
Clustrix Database Appliance
Patterson on Cloud Computing
Dave Patterson did a keynote at Cloud Futures 2010. I wasn't able to attend but I've heard it was a great talk, so I asked Dave to send the slides my way. He presented Cloud Computing and the Reliable Adaptive Systems Lab. The Berkeley RAD Lab principal investigators include: Armando Fox, Randy Katz & Dave Patterson (systems/networks), Michael Jordan (machine learning), Ion Stoica (networks & P2P), Anthony Joseph (systems/security), Michael Franklin (databases), and Scott Shenker (networks), in addition to 30 PhD students, 10 undergrads, and 2 postdocs. The talk starts by arguing that cloud computing actually is a new approach, drawing material from the Above the Clouds paper that I mentioned early last year: Berkeley Above the Clouds.
Yahoo! Computing Coop
Rich Miller of Datacenter Knowledge covered this last week and it caught my interest. I'm super interested in modular data centers (Architecture for Modular Datacenters) and highly efficient infrastructure (Data Center Efficiency Best Practices) so the Yahoo! Computing Coop caught my interest. As much as I like the cost, strength, and availability of ISO standard shipping containers, 8' is an inconvenient width. It's not quite wide enough for two rows of standard racks and there are cost and design advantages in having at least two rows in a container. With two rows, air can be pulled in each side with a single hot aisle in the middle with large central exhaust fans.
Facebook Flashcache
Facebook released Flashcache yesterday: Releasing Flashcache. The authors of Flashcache, Paul Saab and Mohan Srinivasan, describe it as "a simple write back persistent block cache designed to accelerate reads and writes from slower rotational media by caching data in SSD's." There are commercial variants of flash-based write caches available as well. For example, LSI has a caching controller that operates at the logical volume layer. See LSI and Seagate take on Fusion-IO with Flash. The way these systems work is, for a given logical volume, page access rates are tracked. Hot pages are stored on SSD while cold pages reside on spinning media.
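The mechanism described above -- absorb writes on flash, keep hot blocks there, and write dirty blocks back to disk only on eviction -- can be sketched as a toy LRU write-back cache. This is an illustration of the general technique, not Flashcache's actual implementation:

```python
from collections import OrderedDict

class WriteBackCache:
    """Toy write-back block cache: hot blocks live on "SSD" (the cache);
    dirty blocks are flushed to "disk" only when evicted."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.ssd = OrderedDict()   # block -> (data, dirty flag), LRU order
        self.disk = disk           # backing store: block -> data

    def write(self, block, data):
        self.ssd[block] = (data, True)    # absorb write on SSD, mark dirty
        self.ssd.move_to_end(block)
        self._evict_if_needed()

    def read(self, block):
        if block in self.ssd:             # cache hit: no disk I/O
            self.ssd.move_to_end(block)
            return self.ssd[block][0]
        data = self.disk[block]           # miss: promote from disk
        self.ssd[block] = (data, False)
        self._evict_if_needed()
        return data

    def _evict_if_needed(self):
        while len(self.ssd) > self.capacity:
            block, (data, dirty) = self.ssd.popitem(last=False)  # LRU victim
            if dirty:
                self.disk[block] = data   # write back only modified blocks

disk = {1: "a", 2: "b", 3: "c"}
cache = WriteBackCache(capacity=2, disk=disk)
cache.write(1, "A")         # dirty block sits on SSD; disk copy is stale
assert disk[1] == "a"
cache.read(2)
cache.read(3)               # evicts block 1, forcing the write-back
assert disk[1] == "A"
```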
One Browser To Rule Them All
There have been times in past years when it really looked like our industry was on track to supporting only a single relevant web browser. Clearly that's not the case today. In a discussion with a co-worker today on the importance of "other" browsers, I wanted to put some data on the table, so I looked up the browser stats for this web site (http://mvdirona.com). I hadn't looked for a while and found the distribution truly interesting. Admittedly, those that visit this site clearly don't represent the broader population well. Nonetheless, the numbers are super interesting. Firefox eclipsing Internet Explorer, and by such a wide margin, was surprising to me.
VPN over WiMAX
We live on a boat which has lots of upside, but broadband connectivity isn't one of them. As it turns out, our marina has WiFi, but it is sufficiently unreliable that we needed another solution. I wish there were a Starbucks hotspot across the street; actually, there is one within a block, but we can't quite pick up the signal even with an external antenna (Syrens). WiFi would have been a nice solution but didn't work, so we decided to go with WiMAX. We have used ClearWire for over a year on the boat and, generally, it has worked acceptably well.
Right Problem but Wrong Approach
Standards and benchmarks have driven considerable innovation. The most effective metrics are performance-based. Rather than state how to solve the problem, they say what needs to be achieved and leave the innovation open. I'm an ex-auto mechanic and was working as a wrench in a Chevrolet dealership in the early 80s. I hated the emission controls that were coming into force at that time because they caused the cars to run so badly. A 1980 Chevrolet 305 CID with a 4-barrel carburetor would barely idle in perfect tune. It was a mess.
High Scale Network Research
High scale network research is hard. Running a workload over a couple of hundred servers says little about how it will run over thousands or tens of thousands of servers. But having tens of thousands of nodes dedicated to a test cluster is unaffordable. For systems research the answer is easy: use Amazon EC2. It's an ideal cloud computing application. Huge scale is needed during some parts of the research project, but the servers aren't needed 24 hours/day and certainly won't be needed for the three-year amortization life of the servers. However, for high-scale network research, finding the solution is considerably more difficult.
Stonebraker on CAP Theorem and Databases
Mike Stonebraker published an excellent blog posting yesterday at the CACM site: Errors in Database Systems, Eventual Consistency, and the CAP Theorem. In this article, Mike challenges the application of Eric Brewer's CAP Theorem by the NoSQL database community. Many of the high-scale NoSQL system implementers have argued that the CAP theorem forces them to go with an eventually consistent model. Mike challenges this assertion, pointing out that some common database errors are not avoided by eventual consistency and that CAP really doesn't apply in these cases. If you have an application error, administrative error, or database implementation bug that loses data, then it is simply gone unless you have an offline copy.
Using a Market Economy
Every so often, I come across a paper that just nails it, and this one is pretty good. Using a Market Economy to Provision Resources across Planet-wide Clusters doesn't fully investigate the space, but it's great to see progress in this important area, and the paper is a strong step in the right direction. I spend much of my time working on driving down infrastructure costs. There is lots of great work that can be done in datacenter infrastructure, networking, and server design. It's both a fun and important area. But an even bigger issue is utilization.
I love eventual consistency but...
I love eventual consistency, but there are some applications that are much easier to implement with strong consistency. Many like eventual consistency because it allows us to scale out nearly without bound, but it does come with a cost in programming-model complexity. For example, assume your program needs to assign work order numbers uniquely and without gaps in the sequence. Eventual consistency makes this type of application difficult to write. Applications built upon eventually consistent stores have to be prepared to deal with update anomalies like lost updates. For example, assume there is an update at time T1 where a given attribute is set to 2.
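The lost-update anomaly described above is easy to demonstrate. The versioned-store class here is a toy illustration of the general blind-write vs. conditional-write distinction, not any particular store's API:

```python
class VersionedStore:
    """Toy store showing why apps on weakly consistent stores must guard
    against lost updates: blind writes race, conditional writes detect it."""

    def __init__(self, value, version=0):
        self.value, self.version = value, version

    def blind_put(self, value):
        self.value = value                      # last writer wins

    def conditional_put(self, value, expected_version):
        if self.version != expected_version:    # someone updated in between
            return False                        # caller must re-read and retry
        self.value, self.version = value, self.version + 1
        return True

store = VersionedStore(value=2)

# Two clients read value=2 and both try to increment it.
a_read, b_read = store.value, store.value
store.blind_put(a_read + 1)
store.blind_put(b_read + 1)
assert store.value == 3        # lost update: one increment vanished

# With conditional writes, the stale second write is rejected.
store = VersionedStore(value=2)
v = store.version
assert store.conditional_put(3, v) is True
assert store.conditional_put(3, v) is False
```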
Scaling at MySpace
MySpace makes the odd technology choice that I don't fully understand. And, from a distance, there are times when I think I see opportunities to drop costs substantially. But let's ignore that and tip our hat to MySpace for the incredible scale they are driving. It's a great social networking site and you just can't argue with the scale. Their traffic is monstrous and, consequently, it's a very interesting site to understand in more detail. Lubor Kollar of SQL Server just sent me this super interesting overview of the MySpace service.
Scaling FarmVille
Last week, I posted Scaling Second Life. Royans sent me a great set of scaling stories: Scaling Web Architectures and Vijay Rao of AMD pointed out How FarmVille Scales to Harvest 75 Million Players a Month. I find the Farmville example particularly interesting in that it's “only” a casual game. Having spent most of my life (under a rock) working on high-scale servers and services, I naively would never have guessed that casual gaming was big business. But it is. Really big business. To put a scale point on what "big" means in this context, Zynga, the company responsible for Farmville, is estimated to have a valuation of between $1.5B and $3B (Zynga Raising $180M on Astounding Valuation) with annual revenues of roughly $250M (Zynga Revenues Closer to $250).
Scaling Second Life
As many of you know, I collect high-scale scaling war stories; I've appended many of them below. Last week Ars Technica published a detailed article on scaling Second Life: What Second Life can Teach your Datacenter About Scaling Web Apps. The article is by Ian Wilkes, who worked at Second Life from 2001 to 2009, where he was director of operations. My rough notes follow:
· Understand the scale required:
o Billing system serving US and EU where each user interacts annually and the system has 10% penetration: 2 to 3 events/second
o Chat system serving US and EU where each user sends 10 messages/day during the workday: 20k messages/second
· Does the system have to be available 24x7? Understand the impact of downtime (beware of over-investing in less important dimensions at the expense of those more important)
· Understand ...
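The capacity estimates in those notes can be sanity-checked with back-of-envelope arithmetic. The population and workday-length figures below are my assumptions for illustration, not numbers from the talk:

```python
# Back-of-envelope check of the talk's capacity estimates.
# Assumptions: ~800M combined US+EU population, a 10-hour chat workday.
users = 800e6 * 0.10                    # 10% penetration -> ~80M users

# Billing: one event per user per year.
billing_events_per_sec = users / (365 * 86400)
assert 2 <= billing_events_per_sec <= 3          # matches "2 to 3 events/second"

# Chat: 10 messages per user per day, concentrated in the workday.
workday_seconds = 10 * 3600
chat_msgs_per_sec = users * 10 / workday_seconds
assert 20_000 <= chat_msgs_per_sec <= 25_000     # ballpark of "20k messages/second"
```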
Private Clouds Are Not The Future
Cloud computing is an opportunity to substantially improve the economics of enterprise IT. We really can do more with less. I firmly believe that enterprise IT is a competitive weapon and, in all industries, the leaders are going to be those that invest deeply in information processing. The best companies in each market segment are going to be information processing experts and, because of this investment, are going to know their customers better, will choose their suppliers better, will have deep knowledge and control of their supply chains, and will have an incredibly efficient distribution system.
Very Low Power Servers Progress
There is a growing gap between memory bandwidth and CPU power and this growing gap makes low power servers both more practical and more efficient than current designs. Per-socket processor performance continues to increase much more rapidly than memory bandwidth and this trend applies across the application spectrum from mobile devices, through client, to servers. Essentially we are getting more compute than we have memory bandwidth to feed. We can attempt to address this problem two ways: 1) more memory bandwidth and 2) less fast processors. The former solution will be used and Intel Nehalem is a good example of this but costs increase non-linearly so the effectiveness of this technique will be bounded.
MapReduce in CACM
In this month's Communications of the Association for Computing Machinery, a rematch of the MapReduce debate was staged. In the original debate, Dave DeWitt and Michael Stonebraker, both giants of the database community, complained that:
1. MapReduce is a step backwards in database access
2. MapReduce is a poor implementation
3. MapReduce is not novel
Networking: The Last Bastion of Mainframe Computing
The networking world remains one of the last bastions of the mainframe computing design point. Back in 1987, Garth Gibson, Dave Patterson, and Randy Katz showed we could aggregate low-cost, low-quality commodity disks into storage subsystems far more reliable and much less expensive than the best purpose-built storage subsystems (Redundant Array of Inexpensive Disks). The lesson played out yet again when we learned that large aggregations of low-cost, low-quality commodity servers are far more reliable and less expensive than the best purpose-built scale-up servers. However, this logic has not yet played out in the networking world. The networking equipment world looks just like the mainframe computing ecosystem did 40 years ago.
ACM Science Cloud 2010 Call For Papers
I'm on the technical program committee for ACM Science Cloud 2010. You should consider both submitting a paper and attending the conference. The conference will be held in Chicago on June 21st, 2010, colocated with ACM HPDC 2010 (High Performance Distributed Computing). Abstracts are due Feb 22, with final papers due March 1st: http://dsl.cs.uchicago.edu/ScienceCloud2010/ Workshop Overview: The advent of computation can be compared, in terms of the breadth and depth of its impact on research and scholarship, to the invention of writing and the development of modern mathematics. Scientific Computing has already begun to change how science is done, enabling scientific breakthroughs through new kinds of experiments that would have been impossible only a decade ago.
Big Week at Amazon Web Services
There were three big announcements this week at Amazon Web Services. All three announcements are important but the first is the one I'm most excited about in that it is a fundamental innovation in how computation is sold. The original EC2 pricing model was on-demand pricing. This is the now familiar pay-as-you-go and pay-as-you-grow pricing model that has driven much of the success of EC2. Subsequently reserved instances were introduced. In the reserved instance pricing model, customers have the option of paying an up-front charge to reserve a server. There is still no obligation to use that instance but it is guaranteed to be available if needed by the customer.
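The reserved-instance trade-off reduces to a simple break-even calculation. The prices below are made up for illustration and are not actual EC2 rates:

```python
def breakeven_hours(on_demand_rate, upfront, reserved_rate):
    """Hours of usage above which paying the upfront reservation charge
    is cheaper than staying on-demand. Rates are illustrative."""
    return upfront / (on_demand_rate - reserved_rate)

# Hypothetical 1-year term: $227.50 upfront cuts the hourly rate
# from $0.10 to $0.03.
hours = breakeven_hours(0.10, 227.50, 0.03)
assert abs(hours - 3250) < 1   # run more than ~3,250 hours/year and reserving wins
assert hours < 8760            # well under the 8,760 hours in a year
```

The intuition: the steadier your baseline load, the more of it belongs on reserved capacity, with on-demand absorbing the variable remainder.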
AWS Wants You!
Want to join a startup team within Amazon Web Services? I'm deeply involved and excited about this project and another couple of talented engineers could really make a difference. We are looking for: User Interface Software Development Engineer We are looking for an experienced engineer with a proven track record of building high quality, AJAX enabled websites. HTML, JavaScript, AJAX, and CSS experience is critical, along with Java and Tomcat. Experience with languages such as PHP, Perl, Ruby, Python, etc. is also useful. You must have significant experience in designing highly reliable and scalable distributed systems, including building front end website facing applications.
Data Center Waste Heat Reclaimation
For several years I've been interested in PUE<1.0 as a rallying cry for the industry around increased efficiency. From PUE and Total Power Usage Efficiency (tPUE), where I talked about PUE<1.0: In the Green Grid document [Green Grid Data Center Power Efficiency Metrics: PUE and DCiE], it says that "the PUE can range from 1.0 to infinity" and goes on to say "... a PUE value approaching 1.0 would indicate 100% efficiency (i.e. all power used by IT equipment only)." In practice, this is approximately true. But PUEs better than 1.0 are absolutely possible and even a good idea. Let's use an example to better understand this. I'll use a 1.2 PUE facility in this case.
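The arithmetic behind a sub-1.0 effective PUE is straightforward. The 1 MW IT load and 0.3 MW of reclaimed heat below are assumed figures for illustration:

```python
# A 1.2 PUE facility whose waste heat displaces energy someone would
# otherwise have to purchase (e.g. district or greenhouse heating).
it_load_mw = 1.0
total_power_mw = 1.2 * it_load_mw        # PUE 1.2 facility
reclaimed_heat_mw = 0.3                  # assumed useful heat recovered

# Net facility consumption, credited for the displaced energy:
effective_pue = (total_power_mw - reclaimed_heat_mw) / it_load_mw
assert effective_pue < 1.0               # beats a "perfect" conventional 1.0
assert round(effective_pue, 2) == 0.90
```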
2010 the Year of MicroSlice Servers
Very low-power scale-out servers: it's an idea whose time has come. A few weeks ago Intel announced it was doing Microslice servers: Intel Seeks New 'Microserver' Standard. Rackable Systems (I may never manage to start calling them 'SGI'; remember the old MIPS-based workstation company?) was on this idea even earlier: Microslice Servers. The Dell Data Center Solutions team has been on a similar path: Server under 30W. Rackable has been talking about very low-power servers as physicalization: When Less is More: the Basics of Physicalization. Essentially they are arguing that rather than buying more-expensive scale-up servers and then virtualizing the workload onto those fewer servers, buy lots of smaller servers.
Is Sandia National Lab's Red Sky Really Able to Deliver a PUE of 1.035?
Sometime back I whined that Power Usage Efficiency (PUE) is a seriously abused term: PUE and Total Power Usage Efficiency. But I continue to use it because it gives us a rough way to compare the efficiency of different data centers. It's a simple metric that takes the total power delivered to a facility (total power) and divides it by the amount of power delivered to the servers (critical power or IT load). A PUE of 1.35 is very good today. Some datacenter owners have claimed to be as good as 1.2. Conventionally designed data centers operated conservatively are in the 1.6 to 1.7 range. Unfortunately most of the industry has a PUE of over 2.0, some are as bad as 3.0, and the EPA reports the industry average is 2.0 (Report to Congress on Server Data Center Efficiency).
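Part of PUE's appeal is that it's trivial to compute. A quick sketch using the figures quoted above (the Red Sky claim from the title, the "very good" 1.35, and the EPA's 2.0 industry average); the 1,000 kW IT load is just an assumed denominator:

```python
def pue(total_facility_kw, it_load_kw):
    """PUE = total power delivered to the facility / power delivered to IT gear."""
    return total_facility_kw / it_load_kw

# The ranges discussed above, expressed as example facilities with 1,000 kW IT load:
print(pue(1035, 1000))  # 1.035: the Red Sky claim
print(pue(1350, 1000))  # 1.35: very good today
print(pue(2000, 1000))  # 2.0: roughly the industry average per the EPA report
```

The simplicity is also the problem: everything hangs on what gets counted as "total" and what gets counted as "IT load", which is exactly where the metric gets abused.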
ACM Symposium on Cloud Computing
I'm on the program committee for the ACM Symposium on Cloud Computing. The conference will be held June 10th and 11th 2010 in Indianapolis Indiana. SOCC brings together database and operating systems researchers and practitioners interested in cloud computing. It is jointly sponsored by the ACM Special Interest Group on Management of Data (SIGMOD) and the ACM Special Interest Group on Operating Systems (SIGOPS). The conference will be held in conjunction with ACM SIGMOD in 2010 and with SOSP in 2011 continuing to alternate between SIGMOD and SOSP in subsequent years. Joe Hellerstein is the SOCC General Chair and Surajit Chaudhuri and Mendel Rosenblum are Program Chairs.
Randy Shoup & John Ousterhout at HPTS 2009
HPTS has always been one of my favorite workshops over the years. Margo Seltzer was the program chair this year and she and the program committee brought together one of the best programs ever. Earlier I posted my notes from Andy Bechtolsheim's session Andy Bechtolsheim at HPTS 2009 and his slides Technologies for Data Intensive Computing. Two other sessions were particularly interesting and worth summarizing here. The first is a great talk on high-scale services lessons learned from Randy Shoup and a talk by John Ousterhout on RAMCloud, a research project to completely eliminate the storage hierarchy and store everything in DRAM.
Technologies for Data Intensive Computing
In an earlier post Andy Bechtolsheim at HPTS 2009 I put my notes up on Andy Bechtolsheim's excellent talk at HPTS 2009. His slides from that talk are now available: Technologies for Data Intensive Computing. Strongly recommended.
Conversation with Butler Lampson at SOSP 2009
Just about exactly one year ago, I posted a summary and the slides from an excellent Butler Lampson talk: The Uses of Computers: What's Past is Merely Prologue. It's time for another installment. Butler was at SOSP 2009 a few weeks back and Marvin Theimer caught up with him for a wide-ranging discussion on distributed systems. With Butler's permission, what follows are Marvin's notes from the discussion. Contrast cars with airplanes: when the fancy electronic systems fail, you can (most of the time) pull a car over to the side of the road and safely get out, whereas an airplane will crash and burn.
One Size Does Not Fit All
Last week AWS announced the Amazon Relational Database Service (Amazon RDS) and I blogged that it was big step forward for the cloud storage world: Amazon RDS, More Memory, and Lower Prices. This really is an important step forward in that a huge percentage of commercial applications are written to depend upon Relational Databases. But, I was a bit surprised to get a couple of notes asking about the status of Simple DB and whether the new service was a replacement. These questions were perhaps best characterized by the forum thread The End is Nigh for SimpleDB.
The Cost of Latency
Recently I came across Steve Souders' Velocity 2009 presentation: High Performance Web Sites: 14 Rules for Faster Loading Pages. Steve is an excellent speaker and the author of two important web performance books: High Performance Web Sites and Even Faster Web Sites. The reason this presentation caught my interest is that it focused on 1) why web sites are slow, 2) what to do about it, and 3) the economics of why you should care. Looking first at the economic argument for faster web sites: many companies are obsessed with site performance but few publish data on the economic impact of decreasing web site latency.
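The economic argument is easy to sketch. The sensitivity below (1% of revenue lost per additional 100 ms of page load) is an assumed figure for illustration, not a number from Steve's talk:

```python
# Back-of-envelope latency economics. The "1% per 100 ms" sensitivity and the
# revenue figure are illustrative assumptions, not published data.

def annual_revenue_loss(base_revenue, added_latency_ms, loss_per_100ms=0.01):
    return base_revenue * (added_latency_ms / 100.0) * loss_per_100ms

# A $100M/year site running 500 ms slower than it could:
print(annual_revenue_loss(100_000_000, 500))  # ~5,000,000 dollars/year
```

Even at modest sensitivities, latency reductions pay for a lot of engineering, which is why the companies that do publish data obsess over it.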
Relational Database Service, More Memory, and Lower Prices
I've worked on or around relational database systems for more than 20 years. And, although I freely admit that perfectly good applications can be, and often are, written without using a relational database system, it's simply amazing how many of the world's commercial applications depend upon them. Relational database offerings continue to be the dominant storage choice for applications with a need for structured storage. There are many alternatives, some of which are very good: ISAMs like Berkeley DB, simple key-value stores, distributed hash tables. For many workloads, they are very good choices. There is even a movement called NoSQL aimed at advancing non-relational choices.
Andy Bechtolsheim at HPTS 2009
I've attached below my rough notes from Andy Bechtolsheim's talk this morning at High Performance Transactions Systems 2009. The agenda for HPTS 2009 is up at: http://www.hpts.ws/agenda.html. Andy is a founder of Sun Microsystems and of Arista Networks and is an incredibly gifted hardware designer. He's consistently able to innovate and push the design limits while delivering reliable systems that really do work. It was an excellent talk. Unfortunately, he's not planning to post the slides so you'll have to make do with my rough notes below.
Stanford Clean Slate CTO Summit
I attended the Stanford Clean Slate CTO Summit last week. It was a great event organized by Guru Parulkar. Here's the agenda: 12:00: State of Clean Slate -- Nick McKeown, Stanford; 12:30: Software defined data center networking -- Martin Casado, Nicira; 1:00: Role of OpenFlow in data center networking -- Stephen Stuart, Google; 2:30: Data center networks are in my way -- James Hamilton, Amazon; 3:00: Virtualization and Data Center Networking -- Simon Crosby, Citrix; 3:30: RAMCloud: Scalable Datacenter Storage Entirely in DRAM -- John Ousterhout, Stanford; 4:00: L2.5: Scalable and reliable packet delivery in data centers -- Balaji Prabhakar, Stanford; 4:45: Panel: Challenges of Future Data Center Networking -- Panelists: James Hamilton, Stephen Stuart, Andrew Lambeth (VMWare), Marc Kwiatkowski (Facebook) I ...
Jeff Dean: Design Lessons and Advice from Building Large Scale Distributed Systems
Jeff Dean of Google did an excellent keynote talk at LADIS 2009. Jeff's talk is up at: http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf and my notes follow. A data center wide storage hierarchy: Server: DRAM 16GB, 100ns, 20GB/s; Disk: 2TB, 10ms, 200MB/s. Rack: DRAM 1TB, 300us, 100MB/s; Disk: 160TB, 11ms, 100MB/s
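Those hierarchy numbers make for easy back-of-envelope math. A sketch using the figures above to estimate a full sequential scan of each tier (data size divided by bandwidth; per-access latency is negligible for large scans):

```python
# Jeff Dean's storage-hierarchy numbers, used to estimate sequential scan times.

hierarchy = {
    # tier: (capacity_bytes, access_latency_s, bandwidth_bytes_per_s)
    "server DRAM": (16e9,   100e-9, 20e9),
    "server disk": (2e12,   10e-3,  200e6),
    "rack DRAM":   (1e12,   300e-6, 100e6),
    "rack disk":   (160e12, 11e-3,  100e6),
}

for tier, (capacity, latency, bandwidth) in hierarchy.items():
    scan_s = capacity / bandwidth
    print(f"{tier:12s}: full scan ~{scan_s:,.0f} s")
```

The striking part is that rack-level DRAM over the network scans no faster than local disk (both bandwidth-limited around 100-200 MB/s here), which is exactly why network bandwidth shows up as a first-class constraint in data center design.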
Replacing ALL disk with SSD?
I love Solid State Disks and have written about them extensively: SSD versus Enterprise SATA and SAS disks Where SSDs Don't Make Sense in Server Applications Intel's Solid State Drives When SSDs Make Sense in Client Applications When SSDs Make Sense in Server Applications 1,000,000 IOPS Laptop SSD Performance Degradation Problems Flash SSD in 38% of Laptops by 2011 100,000 IOPS Flash Memory SSD Performance
You really DO need ECC Memory
In past posts such as Web Search Using Small Cores I've said “Atom is a wonderful processor but current memory managers on Atom boards don't support Error Correcting Codes (ECC) nor greater than 4 gigabytes of memory. I would love to use Atom in server designs but all the data I've gathered argues strongly that no server workload should be run without ECC.” And, in Linux/Apache on ARM Processors I said “unlike Intel Atom based servers, this ARM-based solution has the full ECC Memory support we want in server applications (actually you really want ECC in all applications from embedded through client to servers).” An excellent paper was just released that puts hard data behind this point and shows conclusively that ECC is absolutely needed.
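For readers who haven't looked at how ECC works: server DIMMs typically use SECDED codes over 64-bit words, but the idea is easiest to see in the classic Hamming(7,4) code, a member of the same family. This toy sketch corrects any single flipped bit in a 4-bit word:

```python
# Minimal Hamming(7,4): 4 data bits protected by 3 parity bits, correcting any
# single-bit error. Toy illustration of the SECDED family used in ECC DIMMs.

def hamming74_encode(d):             # d: list of 4 data bits
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                # parity over codeword positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4                # parity over positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4                # parity over positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):            # c: 7-bit codeword, at most 1 bit corrupted
    c = c[:]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if clean
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]  # recover the 4 data bits

word = [1, 0, 1, 1]
code = hamming74_encode(word)
code[4] ^= 1                         # simulate a single-bit memory fault
assert hamming74_correct(code) == word
```

Without the parity bits, that flipped bit would simply be silently wrong data, which is precisely the failure mode the paper measures at scale.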
VL2: A Scalable and Flexible Data Center Network
Data center networks are nowhere close to the biggest cost or even the most significant power consumer in a data center (Cost of Power in Large Scale Data Centers), and yet substantial networking constraints loom large just below the surface. There are many reasons why we need innovation in data center networks but let's look at a couple I find particularly interesting and look at the solution we offered in a recent SIGCOMM paper, VL2: A Scalable and Flexible Data Center Network. Server Utilization: By far the biggest infrastructure cost in a high-scale service is the servers themselves.
Microsoft Chicago is Live
Microsoft's Chicago data center was just reported to be online as of July 20th. Data Center Knowledge published an interesting and fairly detailed report in: Microsoft Unveils Its Container-Powered Cloud. Early industry rumors were that Rackable Systems (now SGI but mark me down as confused on how that brand change is ever going to help the company) had won the container contract for the lower floor of Chicago. It appears that the Dell Data Center Solutions team now has the business and 10 of the containers are from DCS. The facility is reported to be a ½ billion dollar facility of 700,000 square feet.
Web Search Using Small Cores
I recently came across an interesting paper that is currently under review for ASPLOS. I liked it for two unrelated reasons: 1) the paper covers the Microsoft Bing Search engine architecture in more detail than I've seen previously released, and 2) it covers the problems with scaling workloads down to low-powered commodity cores clearly. I particularly like the combination of using important, real production workloads rather than workload models or simulations and using that base to investigate an important problem: when can we scale workloads down to low power processors and what are the limiting factors?
Chillerless Data Center at 95F
This is 100% the right answer: Microsoft's Chiller-less Data Center. The Microsoft Dublin data center has three design features I love: 1) they are running evaporative cooling, 2) they are using free-air cooling (air-side economization), and 3) they run up to 95F and avoid the use of chillers entirely. All three of these techniques were covered in the best practices talk I gave at the Google Data Center Efficiency Conference (presentation, video). Other blog entries on high temperature data center operation: · Next Point of Server Differentiation: Efficiency at Very High Temperature · Costs of Higher Temperature Data Centers?
Heres Another Innovative Application of Commodity Hardware
Here's another innovative application of commodity hardware and innovative software to the high-scale storage problem. MaxiScale focuses on 1) scalable storage, 2) distributed namespace, and 3) commodity hardware. Today's announcement: http://www.maxiscale.com/news/newsrelease/092109. They sell software designed to run on commodity servers with direct attached storage. They run N-way redundancy with a default of 3-way across storage servers to be able to survive disk and server failure. The storage can be accessed via HTTP or via Linux or Windows (2003 and XP) file system calls. The latter approach requires a kernel-installed device driver and uses a proprietary protocol to communicate back with the filer cluster but has the advantage of directly supporting local O/S read/write operations.
ARM Cortex-A9 SMP Design Announced
ARM just announced a couple of 2-core SMP designs based upon the Cortex-A9 application processor, one optimized for performance and the other for power consumption (http://www.arm.com/news/25922.html). Although the optimization points are different, both are incredibly low power consumers by server standards, with the performance-optimized part dissipating only 1.9W at 2GHz on the TSMC 40G process (40nm). This design is aimed at server applications and should be able to run many server workloads comfortably. In Linux/Apache on ARM Processors I described an 8-server cluster of web servers running the Marvell MV78100.
Code Splitting for Fast Javascript Application Startup
AJAX applications are wonderful because they allow richer web applications with much of the data being brought down asynchronously. The rich and responsive user interfaces of applications like Google Maps and Google Docs are excellent but JavaScript developers need to walk a fine line. The more code they download, the richer the UI they can support and the less synchronous server interactions they need. But, the more code they download, the slower the application can be to start. This is particularly noticeable when the client cache is cold and in mobile applications with restricted bandwidth back to the server.
Linux/Apache on ARM Processors
In The Case for Low-Cost, Low-Power Servers, I made the argument that the right measures of server efficiency are work done per dollar and work done per joule. Purchasing servers on single-dimensional metrics like performance or power or even cost alone makes no sense at all. Single-dimensional purchasing leads to micro-optimizations that push one dimension to the detriment of others. Blade servers have been one of my favorite examples of optimizing the wrong metric (Why Blade Servers Aren't the Answer to All Questions). Blades often trade increased cost to achieve server density.
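The two-metric comparison is simple to sketch. All the throughput, price, and power numbers below are illustrative assumptions; the point is only that a server that loses on raw performance can still win on both efficiency metrics:

```python
# Comparing servers on work done per dollar and work done per joule rather
# than any single dimension. All figures are illustrative assumptions.

def efficiency(requests_per_s, price_usd, watts):
    return {
        "work/$": requests_per_s / price_usd,
        "work/joule": requests_per_s / watts,  # req/s per watt == requests per joule
    }

big_server = efficiency(requests_per_s=10_000, price_usd=8_000, watts=400)
small_server = efficiency(requests_per_s=1_500, price_usd=800, watts=30)
print(big_server)    # {'work/$': 1.25, 'work/joule': 25.0}
print(small_server)  # {'work/$': 1.875, 'work/joule': 50.0}
```

In this (assumed) example the small server delivers a fraction of the absolute performance but 50% more work per dollar and double the work per joule, which is the whole scale-out argument.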
Successfully Challenging the Server Tax
The server tax is what I call the mark-up applied to servers, enterprise storage, and high scale networking gear. Client equipment is sold in much higher volumes with more competition and, as a consequence, is priced far more competitively. Server gear, even when using many of the same components as client systems, comes at a significantly higher price. Volumes are lower, competition is less, and there are often many lock-in features that help maintain the server tax. For example, server memory subsystems support Error Correcting Code (ECC) whereas most client systems do not. Ironically both are subject to many of the same memory faults and the cost of data corruption in a client before the data is sent to a server isn't obviously less than the cost of that same data element being corrupted on the server.
Changes in the Cloud Computing World
We got back from China last Saturday night and, predictably, I'm swamped catching up on three weeks' worth of queued work. The trip was wonderful (China Trip) but it's actually good to be back at work. Things are changing incredibly quickly industry-wide and it's a fun time to be part of AWS. An AWS feature I've been particularly looking forward to seeing announced is Virtual Private Cloud (VPC). It went into private beta two nights back. VPC allows customers to extend their private networks to the cloud through a virtual private network (VPN), accessing their Amazon Elastic Compute Cloud (EC2) instances with the security they are used to having on their corporate networks.
Brief Blogging Hiatus
I'll be taking a brief hiatus from blogging during the first three weeks of August. Tomorrow we leave for China. You might wonder why we would go to China during the hottest time of the year. For example, our first stop, Xiamen, is expected to hit 95F today, which is fairly typical weather for this time of year (actually it's comparable to the unusual weather we've been having in Seattle over the last week). The timing of the trip is driven by a boat we're buying nearing completion in a Xiamen, China boat yard: Boat Progress.
Search Gets Interesting
Search is a market driven by massive network effects and economies of scale. The big get better, the big get cheaper, and the big just keep getting bigger. Google has 65% of the search market and continues to grow. In a deal announced yesterday, Microsoft will supply search to Yahoo! and now has a combined share of 28%. For the first time ever, Microsoft has enough market share to justify continuing large investments. And, more importantly, they now have enough market to get good data on usage to tune the ranking engine to drive better quality search. And, although Microsoft and Yahoo!
HadoopDB: MapReduce over Relational Data
MapReduce has created some excitement in the relational database community. Dave DeWitt and Michael Stonebraker's MapReduce: A Major Step Backwards is perhaps the best example. In that posting they argued that MapReduce is a poor structured storage technology, the execution engine doesn't include many of the advances found in modern parallel RDBMS execution engines, it's not novel, and it's missing features. In MapReduce: A Minor Step Forward I argued that MapReduce is an execution model rather than a storage engine. It is true that it is typically run over a file system like GFS or HDFS or a simple structured storage system like BigTable or HBase.
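The execution-model point is worth making concrete. The three phases below are the model itself; the records could just as easily be rows pulled from a relational database (the HadoopDB approach) as lines from HDFS. A minimal in-process word count as a sketch:

```python
# MapReduce as an execution model, independent of the storage engine:
# explicit map, shuffle, and reduce phases over an in-memory record source.

from collections import defaultdict

def map_phase(records):
    for record in records:
        for word in record.split():
            yield (word, 1)          # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)       # group all values by key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

docs = ["map reduce over relational data", "map reduce over hdfs"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["map"], counts["reduce"], counts["over"])  # 2 2 2
```

Swap `docs` for a cursor over relational tables and nothing in the model changes, which is the heart of the HadoopDB argument.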
SIGMETRICS/Performance 2009 & USENIX 2009 Keynotes
I presented Where does the Power Go in High Scale Data Centers, the opening keynote at SIGMETRICS/Performance 2009 last month. The video of the talk was just posted: SIGMETRICS 2009 Keynote. The talk starts after the conference kick-off at 12:20. The video appears to be incompatible with at least some versions of Firefox. I was only able to stay for the morning of the conference but I met lots of interesting people and got to catch up with some old friends. Thanks to Albert Greenberg and John Douceur for inviting me. I also did the keynote talk at this year's USENIX Technical Conference 2009 in San Diego.
Why I Enjoy Reading about Engineering Accidents, Failures, & Disasters
I'm a boater and I view reading about boating accidents as important. The best source that I've come across is the UK's Marine Accident Investigation Branch (MAIB). I'm an engineer and again, I view it as important to read about engineering failures and disasters. One of the best sources I know of is Peter G. Neumann's RISKS Digest. There is no question that firsthand experience is a powerful teacher but few of us have time (or enough lives) to make every possible mistake. There are just too many ways to screw up. Clearly, it's worth learning from others when trying to make our own systems safer or more reliable.
Pictures from the Fisher Plaza Data Center Fire
There have been many reports of the Fisher Plaza data center fire. An early one was the Data Center Knowledge article: Major Outage at Seattle Data Center. Data center fires aren't as rare as any of us would like but this one is a bit unusual in that fires normally happen in the electrical equipment or switchgear whereas this one appears to have been a bus duct fire. The bus duct fire triggered the sprinkler system. Several sprinkler heads were triggered and considerable water was sprayed, making it more difficult to get the facility back online quickly. Several good pictures showing the fire damage were recently published in Tech Flash Photos: Inside the Fisher Fire.
Barbara Liskov 2008 Turing Award Winner
MIT's Barbara Liskov was awarded the 2008 Association for Computing Machinery Turing Award. The Turing Award is the highest distinction in computer science and is often referred to as the Nobel Prize of computing. Past award winners are listed at: http://en.wikipedia.org/wiki/Turing_Award. The full award citation: Barbara Liskov has led important developments in computing by creating and implementing programming languages, operating systems, and innovative systems designs that have advanced the state of the art of data abstraction, modularity, fault tolerance, persistence, and distributed computing systems. The Venus operating system was an early example of principled operating system design.
Services Change Everything
Our industry has always moved quickly but the internet and high-scale services have substantially quickened the pace. Search is an amazingly powerful productivity tool, available effectively free to all. The internet makes nearly all information available to anyone who can obtain time on an internet connection. Social networks and interest-area specific discussion groups are bringing together individuals of like interest from all over the globe. The cost of computing is falling rapidly and new services are released daily. The startup community has stayed viable through one of the most severe economic downturns since the Great Depression. Infrastructure as a service offerings allow new businesses to be built with very little seed investment.
Microsoft Bringing 35 Megawatts on-line
Microsoft announced yesterday that it was planning to bring both Chicago and Dublin online next month. Chicago is initially to be a 30MW critical load facility with a plan to build out to a booming 60MW. 2/3 of the facility is high-scale containerized infrastructure. It's great to see the world's second modular data center going online (see http://perspectives.mvdirona.com/2009/04/01/RoughNotesDataCenterEfficiencySummitPosting3.aspx for details on an earlier Google facility). The containers in Chicago will hold 1,800 to 2,500 servers each. Assuming 200W per server, that's 1/2 MW for each container, with 80 containers on the first floor and a 40MW container critical load.
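The container arithmetic above checks out; here it is worked through using the high end of the quoted 1,800-2,500 server range and the assumed 200W per server:

```python
# Container power math: servers per container x watts per server, then scaled
# to the 80 first-floor containers. Figures are those quoted above.

servers_per_container = 2_500
watts_per_server = 200        # assumed per-server draw
containers = 80

container_kw = servers_per_container * watts_per_server / 1_000
total_mw = container_kw * containers / 1_000
print(container_kw)  # 500.0 kW, i.e. ~1/2 MW per container
print(total_mw)      # 40.0 MW of container critical load
```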
ISCA 2009 Keynote II: Internet-Scale Service Infrastructure Efficiency
I presented the keynote at the International Symposium on Computer Architecture 2009 yesterday. Kathy Yelick kicked off the conference with the other keynote on Monday: How to Waste a Parallel Computer. Thanks to ISCA Program Chair Luiz Barroso for the invitation and for organizing an amazingly successful conference. I'm just sorry I had to leave a day early to attend a customer event this morning. My slides: Internet-Scale Service Infrastructure Efficiency. Abstract: High-scale cloud services provide economies of scale of five to ten over small-scale deployments, and are becoming a large part of both enterprise information processing and consumer services.
ISCA 2009 Keynote I: How to Waste a Parallel Computer -- Kathy Yelick
Title: Ten Ways to Waste a Parallel Computer. Speaker: Katherine Yelick. An excellent keynote talk at ISCA 2009 in Austin this morning. My rough notes follow: Moore's law continues, with frequency growth replaced by core count growth. HPC has been working on this for more than a decade, but HPC is concerned as well. The new world order: performance through parallelism; power is the overriding hardware concern; ...
PUE and Total Power Usage Efficiency (tPUE)
I like Power Usage Effectiveness as a coarse measure of infrastructure efficiency. It gives us a way of speaking about the efficiency of the data center power distribution and mechanical equipment without having to qualify the discussion on the basis of the servers and storage used, utilization levels, or other issues not directly related to data center design. But there are clear problems with the PUE metric. Any single metric that attempts to reduce a complex system to a single number is going to fail to model important details and is going to be easy to game.
Erasure Coding and Cold Storage
Erasure coding provides redundancy for greater-than-single-disk failure without 3x or higher redundancy. I still like full mirroring for hot data, but the vast majority of the world's data is cold and much of it never gets referenced after writing: Measurement and Analysis of Large-Scale Network File System Workloads. For less-than-hot workloads, erasure coding is an excellent solution. Companies such as EMC, Data Domain, Maidsafe, Allmydata, Cleversafe, and Panasas are all building products based upon erasure coding. At FAST 2009 in late February, A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries For Storage will be presented.
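The overhead win is easy to quantify, and the simplest erasure code of all (single XOR parity, the m=1 case) fits in a few lines. The 10+4 configuration below is an assumed example, not any particular vendor's layout:

```python
# Storage overhead: 3-way mirroring vs a k+m erasure code, plus a tiny XOR
# parity demo (the m=1 degenerate case, RAID-5-style recovery).

def overhead(data_blocks, total_blocks):
    return total_blocks / data_blocks

print(overhead(1, 3))    # 3.0x for triple replication
print(overhead(10, 14))  # 1.4x for an (assumed) 10+4 code tolerating 4 lost blocks

# XOR parity: any single lost block is the XOR of all the survivors.
blocks = [0b1011, 0b0110, 0b1110]
parity = blocks[0] ^ blocks[1] ^ blocks[2]
lost = blocks[1]                            # pretend block 1's disk failed
recovered = blocks[0] ^ blocks[2] ^ parity  # XOR of survivors plus parity
assert recovered == lost
```

Real systems use Reed-Solomon or similar codes to tolerate multiple failures, but the economics are the same: far more failures survived per byte of redundancy than mirroring, at the cost of reconstruction work on read after failure, which is exactly why it fits cold data.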
The SmugMug Tale
Don MacAskill did one of his usual excellent talks at MySQL Conf 09. My rough notes follow. Speaker: Don MacAskill. Video at: http://mysqlconf.blip.tv/file/2037101. SmugMug: bootstrapped in '02 and still operating without external funding; profitable and without debt; a top-400 website; doubling yearly.
Thu, 28 May 2009 12:45:07 UTC
I've brought together links to select past postings and posted them to: http://mvdirona.com/jrh/AboutPerspectives/. It's linked to the blog front page off the “about” link. I'll add to this list over time. If there is a Perspectives article not included that you think should be, add a comment or send me email.
Select Past Perspectives Postings
Server under 30W
Two years ago I met with the leaders of the newly formed Dell Data Center Solutions team and they explained they were going to invest deeply in R&D to meet the needs of very high scale data center solutions. Essentially Dell was going to invest in R&D for a fairly narrow market segment. “Yeah, right” was my first thought but I've been increasingly impressed since then. Dell is doing very good work and the announcement of Fortuna this week is worthy of mention. Fortuna, the Dell XS11-VX8, is an innovative server design. I actually like the name as proof that the DCS team is an engineering group rather than a marketing team.
Amazon Web Services Import/Export
Cloud services provide excellent value but it's easy to underestimate the challenge of getting large quantities of data to the cloud. When moving very large quantities of data, even the fastest networks are surprisingly slow. And, many companies have incredibly slow internet connections. Back in 1996, Minix author and networking expert Andrew Tanenbaum said “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway”. For large data transfers, it's faster (and often cheaper) to write to local media and ship the media via courier. This morning the beta release of Amazon Web Services Import/Export was announced.
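Tanenbaum's station wagon is easy to verify with arithmetic. The link speeds, 80% link utilization, and 2-day courier time below are all assumptions for illustration:

```python
# When is shipping media faster than the network? A sketch comparing transfer
# times; link speeds, utilization, and courier time are assumptions.

def network_days(data_tb, link_mbps, utilization=0.8):
    bits = data_tb * 1e12 * 8
    return bits / (link_mbps * 1e6 * utilization) / 86_400

print(round(network_days(10, 10), 1))    # 115.7 days: 10 TB over a 10 Mbps line
print(round(network_days(10, 1000), 2))  # 1.16 days: same 10 TB over 1 Gbps
# A courier can deliver that 10 TB in roughly 2 days regardless of link speed,
# so for slow links the shipped disk wins by nearly two orders of magnitude.
```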
High-Scale Service Server Counts
From an interesting article in Data Center Knowledge, Who Has the Most Web Servers: 1&1 Internet: 55,000 servers (company); OVH: 55,000 servers (company); Rackspace: 50,038 servers (company); The Planet: 48,500 servers (company); Akamai Technologies: 48,000 servers (company); SBC Communications: 29,193 servers (Netcraft)
1999 Mitsubishi 3000 VR4 For Sale
Our 1999 Mitsubishi 3000 VR4 is for sale. Black-on-black with 80,000 miles. $12,500 OBO. Fewer than 300 1999 VR-4s were produced for North America, and only 101 in black-on-black. We love this car and hate to sell it, but we're living in downtown Seattle and no longer need a car. It's a beautiful machine, 320 HP, and handles incredibly well. We're often stopped on the street and asked if we would sell it, and now we are. Details and pictures at: http://www.mvdirona.com/somerset/vr4.html. Our house in Bellevue is for sale as well: 4509 Somerset Pl SE, Bellevue, WA.
AWS Ships Monitoring, Auto Scaling, & Elastic Load Balancing
Earlier this morning Amazon Web Services announced the public beta of Amazon CloudWatch, Auto Scaling, and Elastic Load Balancing. Amazon CloudWatch is a web service for monitoring AWS resources. Auto Scaling automatically grows and shrinks Elastic Compute Cloud resources based upon demand. Elastic Load Balancing distributes workload over a fleet of EC2 servers. Amazon CloudWatch: a web service that provides monitoring for AWS cloud resources, starting with Amazon EC2. It provides you with visibility into resource utilization, operational performance, and overall demand patterns, including metrics such as CPU utilization, disk reads and writes, and network traffic.
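For readers new to the idea, the core of auto scaling is just a feedback loop over a utilization metric. This is a sketch of the concept only, with assumed thresholds and step sizes; it is not the AWS API:

```python
# Threshold-based auto scaling, sketched: grow the fleet when average CPU is
# high, shrink when low. Thresholds and 10% step size are assumptions.

def autoscale(fleet_size, avg_cpu, high=0.70, low=0.30, min_size=2, max_size=100):
    if avg_cpu > high:
        # scale out by ~10% (at least one instance), capped at max_size
        fleet_size = min(max_size, fleet_size + max(1, fleet_size // 10))
    elif avg_cpu < low:
        # scale in by ~10%, never below min_size
        fleet_size = max(min_size, fleet_size - max(1, fleet_size // 10))
    return fleet_size

print(autoscale(10, 0.85))  # 11: scale out under load
print(autoscale(10, 0.10))  # 9: scale in when idle
print(autoscale(10, 0.50))  # 10: hold steady in the comfort band
```

The comfort band between the thresholds is what prevents oscillation; the real service layers cooldown periods and multiple metrics on top of the same basic loop.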
The Datacenter as a Computer
A couple of weeks back, a mini-book by Luiz André Barroso and Urs Hölzle of the Google infrastructure team was released. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines is just over 100 pages long but an excellent introduction into very high scale computing and the issues important at scale. From the Abstract: As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers.
Next Point of Server Differentiation: Efficiency at Very High Temperature
High data center temperatures are the next frontier for server competition (see pages 16 through 22 of my Data Center Efficiency Best Practices talk: http://mvdirona.com/jrh/TalksAndPapers/JamesHamilton_Google2009.pdf and 32C (90F) in the Data Center). At higher temperatures the difference between good and sloppy mechanical designs is much more pronounced and needs to be a purchasing criterion. The infrastructure efficiency gains of running at higher temperatures are obvious. In a typical data center, 1/3 of the power arriving at the property line is consumed by cooling systems. Large operational expenses can be avoided by raising the temperature set point. In most climates, raising data center set points to the 95F range will allow a facility to move to a pure air-side economizer configuration, eliminating 10% to 15% of the overall capital expense, with the latter number being the more typical.
Bio-IT World Keynote
Chris Dagdigian of BioTeam presented the keynote at this year's Bio-IT World Conference. I found this presentation interesting for at least two reasons: 1) it's a very broad and well reasoned look at many of the issues in computational science and, 2) an innovative example of cloud computing is presented where BioTeam and Pfizer implement protein docking using Amazon AWS.
Google Dalles Air Side Economizer Picture
In the Randy Katz on High Scale Data Centers posting, the article brought up Google Dalles. The article reported that Dalles used air-side economization but I'd not seen the large intakes or louvers I would expect from a facility of that scale. Cary Roberts, ex-TellMe Networks and all-around smart guy, produced a picture of Google Dalles that clearly shows air-side economization (thanks Cary).
Costs of Higher Temperature Data Centers?
Earlier this week I got a thought-provoking comment from Rick Cockrell in response to the posting: 32C (90F) in the Data Center. I found the points raised interesting and worthy of more general discussion, so I pulled the thread out from the comments into a separate blog entry. Rick posted: Guys, to be honest I am in the HVAC industry. Now, what the Intel study told us is that yes, this way of cooling could cut energy use, but what it also said is that there was more than a 100% increase in server component failure in 8 months (2.45% to 4.46%) over the control study with cooling...
Randy Katz on High Scale Data Centers
This IEEE Spectrum article was published in February but I've been busy and haven't had a chance to blog it. The author, Randy Katz, is a UC Berkeley researcher and member of the Reliable Available Distributed Systems Lab. Katz was a coauthor on the recently published RAD Lab article on Cloud Computing: Berkeley Above the Clouds. The IEEE Spectrum article focuses on data center infrastructure: Tech Titans Building Boom. In this article, Katz looks at the Google, Microsoft, Amazon, and Yahoo data center building boom. Some highlights from my read: · Microsoft Quincy is 48MW total load with 48,600 sq m of space, 4.8 km of chiller pipe, 965 km of electrical wire, 92,900 m2 of drywall, and 1.5 metric tons of backup batteries.
McKinsey Speculates that Cloud Computing May Be More Expensive than Internal IT
I'm always interested in research on cloud service efficiency, and last week, at the Uptime Institute IT Symposium in New York City, management consultancy McKinsey published a report entitled Clearing the air on Cloud Computing. McKinsey is a well respected professional services company that describes itself as “a management consulting firm advising leading companies on organization, technology, and operations”. Over the first 22 years of my career in server-side computing at Microsoft and IBM, I've met McKinsey consultants frequently, although they were typically working on management issues and organizational design rather than technology. This particular report focuses more on technology, where the authors investigate the economics of very high scale data centers and cloud computing.
SSD versus Enterprise SATA and SAS disks
In Where SSDs Don't Make Sense in Server Applications, we looked at the results of an HDD to SSD comparison test done by the Microsoft Cambridge Research team. Vijay Rao of AMD recently sent me a pointer to an excellent comparison test done by AnandTech. In SSD versus Enterprise SAS and SATA disks, AnandTech compares one of my favorite SSDs, the Intel X25-E SLC 64GB, with a couple of good HDDs. The Intel SSD can deliver 7,000 random IOPS and the 64GB component is priced in the $800 range. The full AnandTech comparison is worth reading but I found the pricing combined with the sequential and random I/O performance data particularly interesting.
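The economics behind that comparison can be sketched in a few lines. The SSD figures come from the excerpt above (Intel X25-E: roughly 7,000 random IOPS at about $800); the HDD figures are my own rough assumptions for a 15K RPM SAS drive, not numbers from the AnandTech article.

```python
# Random-I/O cost comparison in the spirit of the AnandTech test.
# SSD figures are from the post; HDD figures are assumed for a
# typical 15K SAS drive and are illustrative only.

def iops_per_dollar(iops, price):
    return iops / price

ssd = iops_per_dollar(7000, 800)   # Intel X25-E, per the post
hdd = iops_per_dollar(180, 200)    # assumed 15K SAS figures
print(f"SSD advantage on random I/O: {ssd / hdd:.1f}x per dollar")
```

On sequential bandwidth per dollar the same arithmetic tends to favor disk, which is why the denominator matters so much in these debates.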
Under the Covers of Google App Engine Datastore
My notes from an older talk done by Ryan Barrett on the Google App Engine Datastore at Google IO last year (5/28/2008). Ryan is a co-founder of the App Engine team. The App Engine Datastore is built on Bigtable. It is scalable structured storage, but it is not a sharded database, not an RDBMS (MySQL, Oracle, etc.), and not a Distributed Hash Table (DHT). It IS a sharded sorted array.
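A minimal sketch of what "a sharded sorted array" means in practice: entities are kept in key order, so a contiguous key range (and hence a range scan) is cheap and touches only the shards that overlap the range. The class and key names below are illustrative, not the actual Bigtable or App Engine internals.

```python
# Sketch of one shard of a sorted array: keys kept in order,
# point lookups and range scans via binary search.
import bisect

class SortedShard:
    def __init__(self):
        self.keys, self.values = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value            # overwrite existing key
        else:
            self.keys.insert(i, key)          # insert, preserving order
            self.values.insert(i, value)

    def scan(self, start, end):
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_right(self.keys, end)
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))

shard = SortedShard()
shard.put("user:alice", {"age": 30})
shard.put("user:bob", {"age": 25})
print(shard.scan("user:a", "user:b~"))  # ordered range scan, no full scan
```

A full datastore splits the key space into many such shards and routes each range scan to only the shards whose key ranges intersect it.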
Where SSDs Don't Make Sense in Server Applications
All new technologies go through an early phase when everyone is initially convinced the technology can't work. Then those that actually do solve interesting problems get adopted in some workloads and head into the next phase. In the next phase, people see the technology actually works well for some workloads and they generalize this outcome to a wider class of workloads. They get convinced the new technology is the solution for all problems. Solid State Disks (SSDs) are now clearly in this next phase. Well-intentioned people are arguing emphatically that SSDs are great because they are “fast”. For the most part, SSDs actually are faster than disks in random reads, random writes, and sequential I/O.
Data Center Efficiency Summit Videos Posted
Last week I attended the Data Center Efficiency Summit hosted by Google. You'll find four postings on various aspects of the summit at: http://perspectives.mvdirona.com/2009/04/05/DataCenterEfficiencySummitPosting4.aspx. Two of the most interesting videos: · Modular Data Center Tour: http://www.youtube.com/watch?v=zRwPSFpLX8I&feature=channel · Data Center Water Treatment Plant: http://www.youtube.com/watch?v=nPjZvFuUKN8&feature=channel A Cnet article with links to all the videos: http://news.cnet.com/8301-1001_3-10215392-92.html?tag=newsEditorsPicksArea.0. The presentation I did on Data Center Efficiency Best Practices is up at: http://www.youtube.com/watch?v=m03vdyCuWS0
32C (90F) in the Data Center
In the talk I gave at the Efficient Data Center Summit, I note that the hottest place on earth over recorded history was Al Aziziyah, Libya in 1922, where 136F (58C) was indicated (see Data Center Efficiency Summit (Posting #4)). What's important about this observation from a data center perspective is that this most extreme temperature event ever is still less than the specified maximum temperatures for processors, disks, and memory. What that means is that, with sufficient air flow, outside air without chillers could be used to cool all components in the system. Essentially, it's a mechanical design problem. Admittedly this example is extreme but it forces us to realize that 100% free air cooling is possible. Once we understand that it's a mechanical design problem, then we can trade off the huge savings of higher temperatures against the increased power consumption (semiconductor ...
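The argument reduces to a simple comparison. The record temperature is the one cited above; the component spec limits below are typical published maximums I've assumed for illustration, not figures from the talk.

```python
# The free-air-cooling argument in one check: the hottest temperature
# ever recorded is still below typical component maximums, so with
# enough airflow, outside air alone can cool the hardware.
record_high_c = 58  # Al Aziziyah, Libya, 1922 (136F)
# Assumed typical spec maximums, for illustration only:
component_max_c = {"processor": 67, "disk": 60, "memory": 85}
print(all(record_high_c < limit for limit in component_max_c.values()))
```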
Data Center Efficiency Summit (Posting #4)
Last week, Google hosted the Data Center Efficiency Summit. While there, I posted a couple of short blog entries with my rough notes: · Data Center Efficiency Summit · Rough Notes: Data Center Efficiency Summit · Rough Notes: Data Center Efficiency Summit (posting #3) In what follows, I summarize the session I presented and go into more depth on some of what I saw in sessions over the course of the day. I presented Data Center Efficiency Best Practices at the 1pm session. My basic point was that PUEs in the 1.35 range are possible and attainable without substantial complexity and without innovation. Good solid design, using current techniques, with careful execution is sufficient to achieve this level of efficiency.
HotPower '09 Call for Papers
The HotPower '09 workshop will be held on October 10th at the same venue and right before the Symposium on Operating Systems Principles (SOSP 2009) at Big Sky Resort Montana. Hotpower recognizes that power is becoming a central issue in the design of all systems from embedded systems to servers for high-scale data centers. From http://hotpower09.stanford.edu/: Power is increasingly becoming a central issue in designing systems, from embedded systems to data centers. We do not understand energy and its tradeoff with performance and other metrics very well. This limits our ability to further extend the performance envelope without violating physical constraints related to batteries, power, heat generation, or cooling.
Rough Notes: Data Center Efficiency Summit (posting #3)
Previous “rough notes” posting: Rough Notes: Data Center Efficiency Summit. Containers Based Data Center · Speaker: Jimmy Clidaras · 45 containers (222KW each; 250KW max, or 780W/sq ft) · Showed pictures of containerized data centers · 300x250' of container hangar · 10MW facility · Water side economizer
Rough Notes: Data Center Efficiency Summit
My rough notes from the first two sessions at the Data Center Efficiency Summit at Google Mountain View earlier today: Data Center Energy Going Forward · Speaker: John Tuccillo, APC · Green Grid: o Data Collection & Analysis o Data Center Technology & Strategy o Data Center Operations o Data Center Metrics & Measurements
Efficient Data Center Summit
Google is hosting the Efficient Data Center Summit today at their Mountain View facility. It looks like it's going to be a great event and I fully expect we'll see more detail than ever on how high-scale operators run their facilities. But, in addition, one of the goals of the event is to talk about what the industry as a whole can do to increase data center efficiency.
Grand Challenges in Database Self-Management
I participated in the Self Managing Database Systems closing panel titled Grand Challenges in Database Self-Management.
Cloud Computing Economies of Scale
Today, I'm at the Self Managing Database Systems workshop, which is part of the International Conference on Data Engineering in Shanghai. At last year's ICDE, I participated in a panel: International Conference on Data Engineering 2008. Earlier today, I did the SMDB keynote where I presented: Cloud Computing Economies of Scale. The key points I attempted to make were: · Utility (Cloud) computing will be a big part of the future of server-side systems. This is a lasting and fast-growing economy with clear economic gains. These workloads are already substantial and growing incredibly fast.
Microsoft Live Search is Kumo
There has been lots of speculation about the new name for Microsoft Search. The most prevalent speculation is that Live.com will be branded Kumo: Microsoft to Rebrand Search. Will it be Kumo? Confirming that the Kumo brand is definitely the name that is being tested internally at Microsoft, I've noticed over the last week that the Search Engine Referral URL www.kumo.com has been showing up frequently as the source for searches that find this blog. I suppose the brand could be changed yet again as the Microsoft internal bits are released externally.
Erlang Productivity and Performance
Over the last couple of years, I've been getting more interested in Erlang, a high-scale services implementation language originally designed at Ericsson. Back in May of last year I posted: Erlang and High-Scale System Software. The Erlang model of spawning many lightweight threads that communicate via message passing is typically less efficient than the more common shared memory and locks approach, but it is much easier to get a correct implementation using this model. Erlang also encourages a “fail fast” programming model. Years ago I became convinced that this design pattern is one of the best ways to get high-scale systems software correct (Designing and Deploying Internet-Scale Services). Chris Newcombe of Amazon recently presented an excellent talk on Erlang ...
40C (104F) in the Data Center
From Data Center Knowledge yesterday: Rackable Turns up the Heat, we see the beginnings of the next class of server innovations. This one is going to be important and have lasting impact. The industry will save millions of dollars and megawatts of power, even ignoring the capital expense reductions possible. Hats off to Rackable Systems for being the first to deliver. Yesterday they announced the CloudRack C2. CloudRack is very similar to the MicroSlice offering I mentioned in the Microslice Servers posting. These are very low-cost, high-efficiency, and high-density server offerings targeting high-scale services.
Workshop on Hot Topics in Cloud Computing (HotCloud '09)
HotCloud '09 is a workshop that will be held at the same time as USENIX '09 (June 14 through 19, 2009). The CFP: Join us in San Diego, CA, June 15, 2009, for the Workshop on Hot Topics in Cloud Computing. HotCloud '09 seeks to discuss challenges in the Cloud Computing paradigm including the design, implementation, and deployment of virtualized clouds. The workshop provides a forum for academics as well as practitioners in the field to share their experience, leverage each other's perspectives, and identify new and emerging "hot" trends in this area.
Heterogeneous Computing using GPGPUs: AMD/ATI RV770
This is the third posting in the series on heterogeneous computing. The first two were: 1. Heterogeneous Computing using GPGPUs and FPGAs 2. Heterogeneous Computing using GPGPUs: NVidia GT200 This post looks more deeply at the AMD/ATI RV770. The latest GPU from AMD/ATI is the RV770 architecture. The processor contains 10 SIMD cores, each with 16 streaming processor (SP) units. The SIMD cores are similar to NVidia's Texture Processor Cluster (TPC) units (the NVidia GT200 also has 10 of these), and the 10*16 = 160 SPs are “execution thread granularity” similar to NVidia's SP units (GT200 has 240 of these). Unlike NVidia's design which executes 1 instruction per thread, each SP on the RV770 executes packed 5-wide VLIW-style instructions. For graphics and visualization workloads, floating point intensity is high enough to average about 4.2 useful operations per ...
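The ALU count implied by these numbers works out as follows. The 10 cores, 16 SPs, and 5-wide VLIW figures are from the excerpt; the 750MHz clock and 2 flops per ALU per cycle (fused multiply-add) are my assumptions about the shipping part, for illustration.

```python
# Peak single-precision throughput arithmetic for the RV770 as
# described in the post: 10 SIMD cores x 16 SPs x 5-wide VLIW.
simd_cores, sps_per_core, vliw_width = 10, 16, 5
alus = simd_cores * sps_per_core * vliw_width   # 800 scalar ALUs
clock_ghz = 0.75                                # assumed core clock
peak_gflops = alus * 2 * clock_ghz              # 2 flops/ALU/cycle (FMA)
print(alus, peak_gflops)
```

The 4.2 useful operations per 5-wide instruction cited in the excerpt is what determines how much of that peak a real workload sees.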
Virtualizing GPUs
In the last posting, Heterogeneous Computing using GPGPUs: NVidia GT200 I promised the next post would be a follow-on look at the AMD/ATI RV770. However, over the weekend, Niraj Tolia of HP Labs sent this my way as a follow-up on the set of articles on GPGPU Programming. Prior to reading this note, I hadn't really been interested in virtualizing GPUs but the paper caught my interest and, I'm posting my notes on it just ahead of the RV770 architectural review that I'll get up later in the week. The paper GViM: GPU-accelerated Virtual Machines tackles the problem of implementing GPGPU programming in a virtual machine environment.
Heterogeneous Computing using GPGPUs: NVidia GT200
In Heterogeneous Computing using GPGPUs and FPGAs I looked at heterogeneous computing, the application of multiple instruction set architectures within a single application program under direct programmer control. Heterogeneous computing has been around for years but usage has been restricted to fairly small niches. I'm predicting that we're going to see abrupt and steep growth over the next couple of years. The combination of delivering results for many workloads cheaper, faster, and more power efficiently coupled with improved programming tools is going to vault GPGPU programming into being a much more common technique available to everyone.
Heterogeneous Computing using GPGPUs and FPGAs
It's not at all uncommon to have several different instruction sets employed in a single computer. Decades ago IBM mainframes had I/O processing systems (channel processors). Most client systems have dedicated graphics processors. Many networking cards off-load the transport stack (TCP/IP off load). These are all examples of special purpose processors used to support general computation. The application programmer doesn't directly write code for them. I define Heterogeneous computing as the application of processors with different instruction set architectures (ISA) under direct application programmer control. Even heterogeneous processing has been around for years in that application programs have long had access to dedicated floating point coprocessors with instructions not found on the main CPU.
Navigating the Linux Kernel
Google Maps is a wonderfully useful tool for finding locations around town or around the globe. Microsoft Live Labs Seadragon is a developer tool-kit for navigating wall-sized or larger displays using pan and zoom. Here's the same basic tiled picture display technique (different implementation) applied to navigating the Linux kernel: Linux Kernel Map. The kernel map has a component-by-component breakdown of the entire Linux kernel from hardware interfaces up to user space system calls and most of what is in between. And it's all navigable using zoom and pan. I'm not sure what I would actually use the kernel map for, but it's kind of cool. If you could graphically zoom from the map to the source, it might actually be a useful day-to-day tool rather than a one-time thing.
Seattle Cloud Camp
February 28th, Cloud Camp Seattle was held at an Amazon facility in Seattle. Cloud Camp is described by its organizers as an unconference where early adopters of Cloud Computing technologies exchange ideas. With the rapid change occurring in the industry, we need a place we can meet to share our experiences, challenges and solutions. At CloudCamp, you are encouraged to share your thoughts in several open discussions, as we strive for the advancement of Cloud Computing. End users, IT professionals and vendors are all encouraged to participate. The Cloud Camp schedule is at: http://www.cloudcamp.com/.
It's not (just) about performance
Whenever I see a huge performance number without the denominator, I shake my head. It's easy to get a big performance number on almost any dimension but what is far more difficult is getting great work done per dollar. Performance alone is not interesting. I'm super interested in flash SSDs and see great potential for SSDs in both client and server-side systems. But, our industry is somewhat hype driven. When I first started working with SSDs and their application to server workloads, many thought it was a crazy idea, pointing out that the write rates were poor and they would wear out in days. The former has been fixed in Gen 2 devices and the latter was never true. Now SSDs are climbing up the hype meter and I find myself arguing on the other side: they don't solve all problems.
Cost of a Cloud: Research Problems in Data Center Networks
In the current ACM SIGCOMM Computer Communications Review, there is an article on data center networking, Cost of a Cloud: Research Problems in Data Center Networks by Albert Greenberg, David Maltz, Parveen Patel, and myself. Abstract: The data centers used to create cloud services represent a significant investment in capital outlay and ongoing costs. Accordingly, we first examine the costs of cloud service data centers today. The cost breakdown reveals the importance of optimizing work completed per dollar invested. Unfortunately, the resources inside the data centers often operate at low utilization due to resource stranding and fragmentation.
Changes in the Amazon and Microsoft Clouds Yesterday
Yesterday Amazon Web Services announced availability of Windows and SQL Server under Elastic Compute Cloud (EC2) in the European region. Running in the EU is important for workloads that need to be near customers in that region or workloads that operate on data that needs to stay in region. The AWS Management Console has been extended to support EC2 in the EU region. The management console supports administration of Linux, Unix, and Windows systems under Elastic Compute Cloud as well as management of Elastic Block Store and Elastic IP. More details up at: http://aws.amazon.com/about-aws/whats-new/2009/03/03/amazon-ec2-running-windows-in-eu-region/. Also yesterday, Microsoft confirmed Windows Azure Cloud Software Set for Release This Year.
WTIA: Scaling into the Cloud with Amazon Web Services
Earlier this evening I attended the Washington Technology Industry Association event Scaling into the Cloud with Amazon Web Services. Adam Selipsky, VP of Amazon Web Services gave an overview of AWS and was followed by two AWS customers each of which talked about their services and how they use AWS. My rough notes follow. Adam Selipsky, VP Amazon Web Services · 490k registered developers · Amazon is primarily a technology company.
FriendFeed use of MySQL
I collect postings on high-scale service architectures, scaling war stories, and interesting implementation techniques. For past postings see Scaling Web Sites and Scaling LinkedIn. Last week Bret Taylor posted an interesting description of the FriendFeed backend storage architecture: How FriendFeed uses MySQL to store Schema-less Data. FriendFeed faces a subset of what I've called the hairball problem. The short description of this issue is that social networking sites need to be able to access per-user information both by user and also by searching the data to find the users. For example, group membership.
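The FriendFeed pattern can be sketched in miniature: store each entity as an opaque JSON blob keyed by id, and maintain separate index tables that map a queryable attribute back to entity ids. The sketch below uses sqlite for brevity where the real system used sharded MySQL; the table and column names are invented for illustration.

```python
# Schema-less storage over a relational database, FriendFeed-style:
# one opaque-blob entity table plus narrow index tables.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT)")
db.execute("CREATE TABLE index_user (user_id TEXT, entity_id TEXT)")

def put(entity):
    # Store the full entity as JSON; index only the queryable attribute.
    db.execute("INSERT INTO entities VALUES (?, ?)",
               (entity["id"], json.dumps(entity)))
    db.execute("INSERT INTO index_user VALUES (?, ?)",
               (entity["user_id"], entity["id"]))

def by_user(user_id):
    # Index lookup, then fetch blobs by primary key -- no table scan.
    rows = db.execute("SELECT e.body FROM index_user i "
                      "JOIN entities e ON e.id = i.entity_id "
                      "WHERE i.user_id = ?", (user_id,))
    return [json.loads(r[0]) for r in rows]

put({"id": "e1", "user_id": "u1", "text": "hello"})
print(by_user("u1"))
```

Because indexes live in their own tables, new indexes can be added and backfilled without altering the entity table's schema, which is the point of the approach.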
Service Design Best Practices
Yesterday I presented Service Design Best Practices at an internal Amazon talk series called Principals of Amazon.
Google App Engine
Google has announced that the App Engine free quota resources will be reduced and pricing has been announced for greater-than-free tier usage. The reduction in free tier will be effective 90 days after the February 24th announcement and reduces CPU and bandwidth allocations by the following amounts: · CPU time free tier reduced to 6.4 hours/day from 46 hours/day · Bandwidth free tier reduced to 1 GB/day from 10 GB/day Also announced February 24th is the charge structure for usage beyond the free-tier: $0.10 per CPU core hour.
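For a sense of scale, here's what the CPU free-tier reduction costs an app that previously fit exactly inside the old allocation, at the announced $0.10 per CPU core hour. Other resource rates aren't in the excerpt, so only CPU is computed.

```python
# Daily cost impact of the App Engine CPU free-tier reduction,
# using only the figures announced in the post.
old_free_cpu_hours = 46.0   # per day, before the change
new_free_cpu_hours = 6.4    # per day, after the change
rate = 0.10                 # $ per CPU core hour

# An app that consumed exactly the old free tier now pays for the gap:
overage = old_free_cpu_hours - new_free_cpu_hours
daily_cost = overage * rate
print(f"${daily_cost:.2f}/day")
```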
FAST 2009 Keynote: AWS S3
Building Scalable Web Apps with Google App Engine
Key Value Stores
High Performance Transaction Systems 2009
Back in the early 90's I attended High Performance Transactions Systems for the first time. I loved it. It's on the ocean just south of Monterey and some of the best in both industry and academia show up to attend the small, single tracked conference. It's invitational and kept small so it can be interactive. There are lots of discussions during the sessions, everyone eats together, and debates & discussions rage into the night. It's great. The conference was originally created by Jim Gray and friends with a goal to break the 1,000 transaction/second barrier.
AFCOM Western Washington Chapter Meeting
Earlier today I presented Where Does the Power Go and What to do About it at the Western Washington Chapter of AFCOM. I basically presented the work I wrote up in the CIDR paper: The Case for Low-Cost, Low-Power Servers. The slides are at: JamesHamilton_AFCOM2009.pdf (1.22 MB). The general thesis of the talk is that improving data center efficiency by a factor of 4 to 5 is well within reach without substantial innovation or design risk. --jrh
Service Billing is Hard
Service billing is hard. It's hard to get invoicing and settlement overheads low. And billing is often one of the last and least thought of components of a for-fee online service system. Billing at low overhead and high scale takes engineering, and this often doesn't get attention until after the service beta period. During a service beta period, you don't want to be working out only the service kinks. If you have a for-fee service or up-sell, then you should be beta testing the billing system and the business model at the same time as you beta test the service itself.
Berkeley Above the Clouds
Another Step Forward for Utility Computing
Yesterday, IBM announced it is offering access to IBM Software in the Amazon Web Services Cloud. IBM products now offered for use in the Amazon EC2 environment include: DB2 Express-C 9.5 Informix Dynamic Server Developer Edition 11.5 WebSphere Portal Server and Lotus Web Content Management Standard Edition WebSphere sMash The IBM approach to utility computing offers considerable licensing flexibility with three models: 1) Development AMIs (Amazon Machine Image), 2) Production AMIs, and 3) Bring your own license. Development AMIs are available today for testing, education, development, demonstration, and other non-commercial uses. Development AMIs are available from IBM today at no cost beyond the standard Amazon EC2 charges.
Most DoS Attacks are Friendly Fire
Over the years, I've noticed that most DoS attacks are actually friendly fire. Many times I've gotten calls from our Ops Manager saying the X data center is under heavy attack and we're rerouting traffic to the Y DC, only later to learn that the “attack” was actually a mistake on our end. There is no question that there are bad guys out there sourcing attacks, but internal sources of network overrun are far more common. Yesterday, kdawson posted a wonderful example on Slashdot from SourceForge Chief Network Engineer Uriah Welcome titled “from the disturbances in the fabric department”: http://news.slashdot.org/article.pl?sid=09/02/10/044221.
Microsoft Delays Chicago, Dublin, and Des Moines Data Centers
Microsoft announced the delay of Chicago and Dublin earlier this week (Microsoft will open Dublin and Chicago Data Centers as Customer Demand Warrants). A few weeks ago the Des Moines data center delay was announced (http://www.canadianbusiness.com/markets/market_news/article.jsp?content=D95T2TRG0). Arne Josefsberg and Mike Manos announced these delays in their Building a Better Mousetrap, a.k.a. Optimizing for Maximum Efficiency in an Economic Downturn blog posting. This is a good, fiscally responsible decision given the current tough economic conditions. It's the right time to be slowing down infrastructure investments. But what surprises me is the breadth of the spread between planned expansion and the currently expected Microsoft internal demand. That's at least surprising and bordering on amazing.
Facebook Cassandra Architecture and Design
Last July, Facebook released Cassandra to open source under the Apache license: Facebook Releases Cassandra as Open Source. Facebook uses Cassandra as its email search system where, as of last summer, they had 25TB and over 100m mailboxes. This video gets into more detail on the architecture and design: http://www.new.facebook.com/video/video.php?v=540974400803#/video/video.php?v=540974400803. My notes are below if you don't feel like watching the video.
Slides from Conference on Innovative Data Systems Research
I did the final day keynote at the Conference on Innovative Data Systems Research earlier this month. The slide deck is based upon the CEMS paper: The Case for Low-Cost, Low-Power Servers but it also included a couple of techniques I've talked about before that I think are super useful: · Power Load Management: The basic idea is to oversell power, the most valuable resource in a data center, just as airlines oversell seats, their revenue-producing asset. Rather than taking the data center critical power (total power less power distribution losses and mechanical loads) and then derating it by 10 to 20% to play it safe since utility over-draw brings high cost.
Storage at 2TB for $250
Wow, 2TB for $250 from Western Digital: http://www.engadget.com/2009/01/26/western-digitals-2tb-caviar-green-hdd-on-sale-in-australia/.
Low Power Amdahl Blades for Data Intensive Computing
In Microslice Servers and the Case for Low-Cost, Low-Power Servers, I observed that CPU bandwidth is outstripping memory bandwidth. Server designers can address this by: 1) designing better memory subsystems or 2) reducing the CPU per-server. Optimizing for work done per dollar and work done per joule argues strongly for the second approach for many workloads. In Low Power Amdahl Blades for Data Intensive Computing (Amdahl Blades-V3.pdf (84.25 KB)), Alex Szalay makes a related observation and arrives at a similar point. He argues that server I/O requirements for data intensive computing clusters grow in proportion to CPU performance. As per-server CPU performance continues to increase, we need to add additional I/O capability to each server. We can add more disks but this drives up both power and cost as more disks require more I/O channels.
Microslice Servers
In The Case For Low-Power Servers I reviewed the Cooperative, Expendable, Micro-slice Servers project. CEMS is a project I had been doing in my spare time investigating the use of low-power, low-cost servers to run internet-scale workloads. The core premises of the CEMS project: 1) servers are out of balance, 2) client and embedded volumes, and 3) performance is the wrong metric. Out-of-Balance Servers: The key point is that CPU bandwidth is increasing far faster than memory bandwidth (see page 7 of Internet-Scale Service Efficiency). CPU performance continues to improve at roughly historic rates. Core count increases have replaced the previous reliance on frequency increases but performance improvements continue unabated. As a consequence, CPU performance is outstripping memory bandwidth with the result that more and more cycles are spent in pipeline stalls.
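The "performance is the wrong metric" point can be made concrete by ranking server configurations on work done per dollar and per joule rather than raw throughput. All the numbers below are invented purely to illustrate the comparison; they are not CEMS measurements.

```python
# Work-done-per-dollar and per-watt comparison in the CEMS spirit.
# All figures are hypothetical, for illustration only.
servers = {
    # name: (requests/sec, price in $, watts)
    "big-iron":   (9000, 6000, 450),
    "microslice": (1800,  700,  60),
}

for name, (rps, price, watts) in servers.items():
    print(f"{name:10s} {rps / price:5.2f} rps/$  {rps / watts:5.1f} rps/W")
```

With these illustrative numbers, the smaller server loses badly on raw throughput yet wins on both efficiency metrics, which is exactly the CEMS argument: for scale-out workloads, buy more of the efficient box.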
Recardo Hermann's Snippets on Software
The Case For Low-Cost, Low-Power Servers
The Conference on Innovative Data Systems Research was held last week at Asilomar, California. It's a biennial systems conference. At the last CIDR, two years ago, I wrote up Architecture for Modular Data Centers where I argued that containerized data centers are an excellent way to increase the pace of innovation in data center power and mechanical systems and are also a good way to grow data centers more cost effectively with a smaller increment of growth. Containers have both supporters and detractors and it's probably fair to say that the jury is still out. I'm not stuck on containers as the only solution but any approach that supports smooth, incremental data center expansion is interesting to me.
Amazon Web Services & Windows Live Mesh at Crunchies
Last night, TechCrunch hosted The Crunchies and two of my favorite services got awards. Ray Ozzie and David Treadwell accepted Best Technology Innovation/Achievement for Windows Live Mesh. Amazon CTO Werner Vogels accepted Best Enterprise Startup for Amazon Web Services.
Joel Spolsky: 12 Steps to Better Code
Google's Will Power and Data Center Efficiency
Earlier in the week, there was an EE Times posting, Server Makers get Googled, and a follow-up post from Gigaom, How Google Is Influencing Server Design. I've long been an advocate of making industry leading server designs more available to smaller data center operators since, in aggregate, they are bigger power consumers and have more leverage as a group. The key design aspects brought up in these two articles: · Higher data center temperatures · 12V-only power supplies · Two servers on a board An early article from The Register back in October, Google Demanding Intel's Hottest Chips, sourced an ex-Google employee that clearly wasn't involved with Google's data center or server design teams.
CACM Interview with Pat Selinger
In a previous posting, Pat Selinger IBM Ph.D. Fellowships, I mentioned Pat Selinger as one of the greats of the relational database world. Working with Pat was one of the reasons why leaving IBM back in the mid-90's was a tough decision for me. In the December 2008 edition of the Communications of the ACM, an interview I did with Pat back in 2005 is published: Database Dialogue with Pat Selinger. It originally went out as an ACM Queue article. If you haven't checked out the CACM recently, you should. The new format is excellent and the articles are now worth reading.
Wikipedia Architecture
MySpace Architecture and .Net
Viraj Mody of the Microsoft Live Mesh team sent this my way: Dan Farino About MySpace Architecture. MySpace, like Facebook, uses relational DBs extensively, front-ended by a layer of Memcached servers. Less open source at MySpace but otherwise unsurprising: a nice scalable design with 3000 front-end servers and well over 100 database servers (1m users per DB server).
Bill Gates
High Efficiency SATA Storage
Related to The Cost of Bulk Storage posting, Mike Neil dropped me a note. He's built an array based upon this Western Digital part: http://www.wdc.com/en/products/Products.asp?DriveID=336. It's unusually power efficient. I wrote this blog entry a few weeks ago before my recent job change. It's a look at the cost of high-scale storage and how it has fallen over the last two years based upon the annual fully burdened cost of power in a data center and industry disk cost trends. The observations made in this post are based upon understanding these driving costs and should model any efficient, high-scale bulk storage farm. But, and I need to be clear on this, it was written prior to my joining AWS and there is no information below relating to any discussions I've had with AWS or how the AWS team specifically designs, deploys, or manages their storage farm.
The Cost of Bulk Cold Storage
Resource Consumption Shaping
Resource Consumption Shaping is an idea that Dave Treadwell and I came up with last year. The core observation is that service resource consumption is cyclical. We typically pay for near-peak consumption and yet frequently are consuming far below this peak. For example, network egress is typically charged at the 95th percentile of peak consumption over a month, and yet the real consumption is highly sinusoidal and frequently far below this charged-for rate. Substantial savings can be realized by smoothing the resource consumption. Looking at the network egress traffic report below, we can see this prototypical resource consumption pattern: You can see from the chart above that resource consumption over the course of a day varies by more than a factor of two.
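The billing mechanics above can be sketched numerically. This is an illustrative model, not real traffic data: the sample generator, base rate, and swing are invented to mimic the sinusoidal daily pattern described.

```python
import math

# Hypothetical 5-minute egress samples (Mbps) over a 30-day month,
# following the roughly sinusoidal daily pattern described above.
def daily_pattern(i, base=400.0, swing=250.0, samples_per_day=288):
    phase = 2 * math.pi * (i % samples_per_day) / samples_per_day
    return base + swing * math.sin(phase)

samples = [daily_pattern(i) for i in range(288 * 30)]

def p95(values):
    # 95th-percentile billing: the top 5% of samples are free and you
    # pay for the highest remaining sample.
    ordered = sorted(values)
    return ordered[int(0.95 * len(ordered)) - 1]

billed = p95(samples)
mean = sum(samples) / len(samples)
print(f"billed at P95: {billed:.0f} Mbps, mean usage: {mean:.0f} Mbps")
```

The gap between the billed P95 rate and mean usage is exactly what consumption smoothing recovers: move deferrable work into the troughs and the charged-for rate falls toward the mean.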
Google MapReduce Wins TeraSort
Large sorts need to be done daily, and doing it well actually is economically relevant. Last July, Owen O'Malley of the Yahoo Grid team announced they had achieved a 209-second TeraSort run: Apache Hadoop Wins Terabyte Sort Benchmark. My summary of the Yahoo result with cluster configuration: Hadoop Wins TeraSort. Google just announced a MapReduce sort result on the same benchmark: Sorting 1PB with MapReduce. They improved on the 209-second result that Yahoo produced, achieving 68 seconds. How did they get roughly a 3x speedup? Google used slightly more servers at 1,000 than the 910 used by Hadoop, but that difference is essentially rounding error and doesn't explain the gap. We know that sorting is essentially an I/O problem.
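As a rough sanity check on the "sorting is an I/O problem" claim, here's the minimum per-server I/O bandwidth each result implies. This is a sketch: it ignores intermediate spills, replication, and the network shuffle, so real per-server I/O was higher.

```python
TB = 10**12  # bytes

runs = {
    "Hadoop (910 servers, 209 s)": (910, 209),
    "Google (1,000 servers, 68 s)": (1000, 68),
}

per_server = {}
for name, (servers, seconds) in runs.items():
    # A sort must at minimum read and write every byte once, so
    # aggregate I/O is at least 2x the data size.
    per_server[name] = (2 * TB / seconds) / servers / 1e6
    print(f"{name}: >= {per_server[name]:.0f} MB/s per server")
```

Roughly 10 MB/s versus 29 MB/s per server: the Google run sustained about 3x the per-server I/O rate, which is consistent with the speedup coming from I/O rather than server count.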
Slides from Tony Hoare's Talk
Two weeks ago I posted the notes I took from Tony Hoare's “The Science of Programming” talk at the Computing in the 21st Century Conference in Beijing. Here are the slides from the original talk: Tony Hoare Science of Programming (199 KB). Here are my notes from two weeks back: Tony Hoare on The Science of Programming.
Pat Selinger IBM Ph.D. Fellowship
Last week, IBM honored database giant Pat Selinger by creating a Ph.D. Fellowship in her name. I worked with Pat closely for many years at IBM, and much of what I learned about database management systems I learned from Pat during those years. She was one of the original members of the IBM System R team and is probably best known as the inventor of the cost-based optimizer. Access Path Selection in a Relational Database Management System is a paper from that period that I particularly enjoyed. From the IBM press release: Pat Selinger IBM Ph.D.
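For readers who haven't seen the paper, the optimizer's core idea can be sketched in a few lines. The cost formula below follows the paper's COST = page fetches + W * RSI calls shape, but all cardinalities and the weight W are invented for illustration:

```python
# A toy sketch of Selinger-style cost-based access path selection.
W = 0.05  # relative CPU-to-I/O weight (assumed value)

def table_scan_cost(pages, rows):
    # Read every page, examine every row.
    return pages + W * rows

def index_scan_cost(index_pages, selectivity, rows):
    # Descend the index, then fetch roughly one page per matching row.
    matching = selectivity * rows
    return index_pages + matching + W * matching

paths = {
    "table scan": table_scan_cost(pages=1000, rows=50_000),
    "index scan": index_scan_cost(index_pages=3, selectivity=0.001,
                                  rows=50_000),
}
best = min(paths, key=paths.get)
print(best, paths)  # the optimizer keeps the cheapest path
```

With a highly selective predicate the index scan wins easily; flip the selectivity toward 1.0 and the table scan becomes cheaper, which is exactly the tradeoff the cost model exists to arbitrate.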
Intel's Solid State Drives
Intel Fellow and Director of Storage Architecture Knut Grimsrud presented at WinHEC 2008 last week, and it caught my interest for several reasons: 1) he talked about Intel findings with their new SSD, which looks like an extremely interesting price/performer, 2) they have found interesting power savings in their SSD experiments beyond the easy-to-predict reduction in power consumption of SSDs over HDDs, and 3) Knut presented a list of useful SSD usage do's and don'ts. Starting from the best practices:
· DO queue requests to the SSD as deeply as possible. SSDs have massive internal parallelism and are generally underutilized.
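The "queue deeply" advice can be illustrated with a sketch. The file path, offsets, and queue depth below are placeholders; the point is simply that many outstanding requests keep an SSD's internal parallelism busy:

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096

def read_block(path, offset):
    # Positioned read so concurrent requests don't share a file offset.
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, BLOCK, offset)
    finally:
        os.close(fd)

def read_many(path, offsets, depth=32):
    # depth ~ outstanding requests. A spinning disk gains little past a
    # handful; an SSD keeps scaling much further.
    with ThreadPoolExecutor(max_workers=depth) as pool:
        return list(pool.map(lambda o: read_block(path, o), offsets))
```

A real application would use native async I/O rather than a thread pool, but the effect is the same: more requests in flight, more of the device's internal channels working at once.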
Mesh Services Architecture and Concepts
Abolade Gbadegesin, Windows Live Mesh Architect, gave a great talk at the Microsoft Professional Developers Conference on Windows Live Mesh (talk video, talk slides). Live Mesh is a service that supports p2p file sharing amongst your devices, file storage in the cloud, remote access to all your devices (through firewalls and NATs), and web access to those files you choose to store in the cloud. Live Mesh is a good service and worth investigating in its own right, but what makes this talk particularly interesting is that Abolade gets into the architecture of how the system is written and, in many cases, why it is designed that way. I've been advocating redundant, partitioned, fail-fast service designs based upon Recovery Oriented Computing for years. For example, Designing and Deploying Internet Scale Services (paper, slides).
The Uses of Computers: What's Past is Merely Prologue -- Butler Lampson
Butler Lampson, one of the founding members of Xerox PARC, a Turing Award winner, and one of the most practical engineering thinkers I know, spoke a couple of days ago at the Computing in the 21st Century Conference in Beijing. My rough notes from Butler's talk follow. Overall, Butler argues that “embodiment” is the next big phase of computing after simulation and communications. Butler defines embodiment as computers interacting directly with the physical world, for example, autonomously driven vehicles. Butler argues that this class of applications is only possible now due to the rapidly falling price of computing coupled with systems capabilities driven by Moore's law.
Tony Hoare on The Science of Programming
Tony Hoare spoke yesterday at the Computing in the 21st Century Conference in Beijing. Tony is a Turing Award winner, the Quicksort inventor, author of the influential Communicating Sequential Processes (CSP) formal language, and a long-time advocate of program verification and tools to help produce reliable software systems. In his talk he argues that programming should be and can be a science, and the goal should be correct programs that stay correct through change. Zero-defect software. He explains that engineers will accept that there will be defects, but the scientist should pursue perfection far beyond that for which there is a commercial need.
Monitoring at Scale
Service monitoring at scale is incredibly hard. I've long argued that you should never learn anything about a problem your service is experiencing from a customer. How could they possibly know first when there is a service outage or issue? And yet it happens frequently. The reason it happens is that most sites don't have close to an adequate level of instrumentation. Without this instrumentation, you are flying blind. Systems monitoring data can be used to drive alerts, to compute SLAs, to drive capacity planning, to find latencies, to understand customer access patterns, and some sites use it to drive billing, although the latter is probably a mistake.
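A minimal sketch of what instrumentation-driven alerting might look like at the smallest scale: the service records its own latencies and alerts when the tail crosses an SLA threshold, rather than waiting for customers to complain. Window size, SLA, and quantile here are invented values.

```python
from collections import deque

class LatencyMonitor:
    def __init__(self, window=1000, sla_ms=200.0, quantile=0.99):
        self.samples = deque(maxlen=window)  # sliding window of latencies
        self.sla_ms = sla_ms
        self.quantile = quantile

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def tail_latency(self):
        ordered = sorted(self.samples)
        return ordered[int(self.quantile * (len(ordered) - 1))]

    def should_alert(self):
        # Alert on the service's own data; require a minimum sample
        # count so a few slow requests at startup don't page anyone.
        return len(self.samples) >= 100 and self.tail_latency() > self.sla_ms
```

Tracking a tail quantile rather than the mean matters: a service can have a healthy average while a meaningful fraction of customers see outage-level latencies.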
When SSDs Make Sense in Client Applications
In When SSDs Make Sense in Server Applications, we looked at where Solid State Drives (SSDs) were practical in servers and services. On the client side, there are even more reasons to use SSDs and I expect that within three years, more than half of enterprise laptops will have NAND Flash as at least part of their storage subsystems. This estimate has SSDs in 38% of all laptops by 2011: Flash SSD in 38% of Laptops by 2011. What follows is a quick summary of SSD advantages on the client side, followed by the disadvantages, and then a closer look at the write endurance (wear-out) problem that has been the topic of much discussion recently.
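The wear-out concern yields to simple arithmetic. A hedged back-of-envelope estimate, with all numbers illustrative assumptions rather than vendor specifications:

```python
# Back-of-envelope wear-out estimate for a client SSD.
def years_to_wear_out(capacity_gb, erase_cycles, gb_written_per_day,
                      write_amplification=2.0):
    # Total data the flash can absorb, assuming wear leveling spreads
    # erases evenly across all cells.
    total_writes_gb = capacity_gb * erase_cycles
    # Write amplification: the controller writes more to flash than the
    # host sends (assumed 2x here).
    effective_daily = gb_written_per_day * write_amplification
    return total_writes_gb / effective_daily / 365

# e.g. a 64 GB drive rated for ~10,000 erase cycles in a laptop
# writing 10 GB/day
print(f"{years_to_wear_out(64, 10_000, 10):.0f} years")
```

Even with pessimistic write amplification, the estimated lifetime is measured in decades, far beyond the service life of the laptop around it, which is why the wear-out discussion is mostly overblown for client workloads.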
When SSDs Make Sense in Server Applications
In past posts, I've talked a lot about Solid State Drives. I've mostly discussed why they are going to be relevant on the server side, and the shortest form of the argument is based on extremely hot online transaction processing (OLTP) systems. There are potential applications as reliable boot disks in blade servers and other small-data applications, but I'm focused on high-scale OLTP in this discussion. OLTP applications are random-I/O-bound workloads such as ecommerce systems, airline reservation systems, and any data-intensive application that does lots of small reads and writes, usually on a database where future access patterns are unknown.
Hotnets 2008 Paper
Albert Greenberg and I missed Hotnets 2008 last week due to a conflicting meeting down in California, but Ken Church was there to present our On Delivering Embarrassingly Distributed Cloud Services paper. I summarized the paper in a recent blog entry: Embarrassingly Distributed Cloud Services. The abstract from the paper follows: Very large data centers are very expensive (servers, power/cooling, networking, physical plant). Newer, geo-diverse, distributed or containerized designs offer a more economical alternative. We argue that a significant portion of cloud services are embarrassingly distributed - meaning there are high-performance realizations that do not require massive internal communication among large server pools.
A Small Window into Google's Data Centers
Google has long enjoyed a reputation for running efficient data centers. I suspect this reputation is largely deserved but, since the operation has been completely shrouded in secrecy, that's been a guess built upon respect for the folks working on the infrastructure team rather than anything that's been published. However, some of the shroud of secrecy was lifted last week and a few interesting tidbits were released in Google Commitment to Sustainable Computing. On server design (Efficient Servers), the paper documents the use of high-efficiency power supplies and voltage regulators, and the removal of components not relevant in a service-targeted server design. A key point is the use of efficient, variable-speed fans. I've seen servers that spend as much as 60W driving the fans alone.
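The fan point is worth a number or two: fan power scales roughly with the cube of fan speed (the fan affinity laws), so variable-speed fans that slow down under light load save disproportionately. Using the 60W figure from the post and illustrative speeds:

```python
full_speed_watts = 60.0  # the worst case cited above

def fan_watts(speed_fraction, full=full_speed_watts):
    # Fan affinity laws: power varies roughly with the cube of speed.
    return full * speed_fraction ** 3

for s in (1.0, 0.8, 0.6):
    print(f"{s:.0%} speed -> {fan_watts(s):.0f} W")
```

Running at 80% speed cuts fan power roughly in half, and 60% speed cuts it to about a fifth, which is why fixed-speed fans sized for the worst case waste so much.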
Measurement and Analysis of Large-Scale Network File System Workloads
An interesting file system study is at this year's USENIX Annual Technical Conference. The paper Measurement and Analysis of Large-Scale Network File System Workloads looks at CIFS remote file system access patterns from two populations: first, a large file store of 19TB serving 500 software developers, and second, a medium-sized file store of 3TB used by 1,000 marketing, sales, and finance users. The authors found that file access patterns have changed since previous studies and offer 10 observations:
· Both workloads are more write-heavy than workloads studied previously
· Read-write [rather than pure read or pure write] access patterns are much more frequent compared to past studies
· Bytes are transferred in much longer sequential runs than in previous studies [the lengths of sequential runs are increasing but note that the percentage of random access is ...
Embarrassingly Distributed Cloud Services
Ken Church, Albert Greenberg, and I just finished On Delivering Embarrassingly Distributed Cloud Services, which has been accepted for presentation at ACM Hotnets 2008 in Calgary, Alberta, October 6th and 7th. This paper followed from the discussion and debate around a blog entry that Ken and I did some time back, Diseconomies of Scale, where we argue that the industry trend towards mega-datacenters needs to be questioned and, in many cases, is simply not cost effective. There are times when mega-datacenters do make sense. Very large data analysis jobs and large, multi-server workloads with considerable inter-node communications traffic run best against large central data stores.
Internet-Scale Service Efficiency
Earlier today, I gave a talk at LADIS 2008 (Large Scale Distributed Systems & Middleware) in Yorktown Heights, New York. The program for LADIS is at: http://www.cs.cornell.edu/projects/ladis2008/program.html. The slides presented are posted to: http://mvdirona.com/jrh/TalksAndPapers/JamesRH_Ladis2008.pdf. The quick summary of the talk: hosted services will be a large part of enterprise information processing and consumer services, with economies of scale of 5 to 10x over small-scale deployments. Enterprise deployments are very different from high-scale services. The former are people-dominated from a cost perspective, whereas people costs are not among the top 4 major factors in the services world. The talk looks at limiting factors in the economic application of resources to services, one of which is power. Looking at power in more detail, we go through where power goes in a modern data center, inventorying power dissipation ...
Why Blade Servers Aren't the Answer to All Questions
This note describes a conversation I've had multiple times with data center owners. It concludes that blade servers frequently don't help and sometimes hurt, that easy data center power utilization improvements are available without paying the blade server premium, and that enterprise data center owners have a tendency to buy gadgets from the big suppliers rather than think through overall data center design. We'll dig into each. In talking to data center owners, I've learned a lot, but every once in a while I come across a point that just doesn't make sense. My favorite example is server density. I've talked to many DC owners (and I'll bet I'll hear from many after this note) who have just purchased blade servers. The direction of the conversation is always the same.
1,000,000 IOPS
IBM just announced achieving over one million input/output operations per second: IBM Breaks Performance Records Through Systems Innovation. That's an impressive number. To put the achievement in context, a very good (and way too expensive) enterprise disk will deliver somewhere between 180 and just over 200 IOPS. A respectable, but commodity, SATA disk will usually drive somewhere in the 70 to 100 IOPS range. To achieve this goal, IBM actually used a Fusion-io NAND-flash-based storage component. It's unfortunate that the original press release from IBM didn't mention Fusion-io. However, an excellent blog write-up on the performance run by Barry Whyte of IBM offers the details behind the effort: 1M IOPs from Flash - actions speak louder than words. The Fusion-io press release is at: Fusion-io and IBM Team to Improve Enterprise Storage Performance.
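The disk-equivalence arithmetic makes the number concrete, using the per-disk figures cited above:

```python
target_iops = 1_000_000
enterprise_disk_iops = 200  # top of the enterprise range cited above
sata_disk_iops = 100        # top of the commodity range cited above

enterprise_disks = target_iops // enterprise_disk_iops
sata_disks = target_iops // sata_disk_iops
print(enterprise_disks, "enterprise disks or", sata_disks, "SATA disks")
```

Matching 1M IOPS with spindles would take thousands of disks before accounting for controllers, racks, and power, which is the whole case for NAND flash on random-I/O workloads.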
Degraded Operations Mode
In Designing and Deploying Internet Scale Services, I've argued that all services should expect to be overloaded and all services should expect mass failures. Very few do, and I see related downtime in the news every month or so. The Windows Genuine Advantage failure (WGA Meltdown...) from a year ago is a good example in that the degraded operations modes possible for that service are unusually simple and the problem and causes were well documented. The obvious degraded operations mode for WGA is to allow users to continue as “WGA Authorized” when the service isn't healthy enough to fully check their O/S authenticity. In the case of WGA, this actually is the intended operation and it is designed to do this. This should have worked, but services rarely have the good sense to fail. They normally just run very, very ...
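The fail-open pattern described above can be sketched in a few lines. Names and structure here are illustrative, not the actual WGA design:

```python
# A sketch of a fail-open degraded operations mode: when the full
# check can't run, answer "authorized" rather than failing users.
class AuthenticityService:
    def __init__(self, checker, health_probe):
        self.checker = checker            # full validation path
        self.health_probe = health_probe  # returns True when healthy

    def is_authorized(self, machine_id):
        if not self.health_probe():
            # Degraded mode: fail open and (in a real system) log the
            # skipped check for later audit.
            return True
        try:
            return self.checker(machine_id)
        except Exception:
            return True  # any checker failure also fails open
```

The hard part in practice isn't this logic; it's detecting unhealth reliably. A service that is slow rather than down never trips the probe and, as the post says, just runs very, very slowly.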
Facebook F8 Conference Notes
Facebook's F8 conference was held last month in San Francisco. During his mid-day keynote, Mark Zuckerberg reported that the Facebook platform now has 400,000 developers and 90 million users, of which 32% are from the United States. The platform's US user population grew 2.4x last year while the international population grew at an astounding 5.1x. Vladimir Fedorov (Windows Live Mesh) attended F8 and brought together this excellent set of notes on the conference. --jrh Summary: I spent the day on Wednesday at the Facebook (F8) conference and talked to some of the companies building Facebook applications today.
Scaling at Lucasfilm
Kevin Clark, Director of IT Operations at Lucasfilm, was interviewed by On-Demand Enterprise in We've Come a Long Way Since Star Wars. His organization owns IT for LucasArts, Lucasfilm, and Industrial Light and Magic. Lucasfilm runs a 4,500-server dedicated rendering farm, and they expand this farm with workstations when those are not in use, to 5,500 servers in total. The servers are dual-socket, dual-core Opterons with 32GB of memory. Nothing unusual, except the memory configuration is a bit larger than the current average. They have 400TB of storage and produce 10 to 20TB of new and changed data each day.
Geo-Replication at Facebook
Last Friday I arrived back from vacation (Back from the Outside Passage in BC) to 3,600 email messages. I've been slogging through them over the weekend and I'm actually starting to catch up. Yesterday Tom Kleinpeter pointed me to this excellent posting from Jason Sobel of Facebook: Scaling Out. The post describes the geo-replication support recently added to Facebook. Rather than having a single west coast data center, they added an east coast center, both to be near East Coast users and to provide redundancy for the west coast site.
Blogging Hiatus on Perspectives Until Mid-August
Going boating: http://mvdirona.com/ so I'll be taking a break from blogging until mid-August, when I'm back and caught up. Enjoy, --jrh James Hamilton, Data Center Futures Bldg 99/2428, One Microsoft Way, Redmond, Washington, 98052 W:+1(425)703-9972 | C:+1(206)910-4692 | H:+1(206)201-1859 | [email protected] H:mvdirona.com | W:research.microsoft.com/~jamesrh | blog:http://perspectives.mvdirona.com
Flickr DB Architecture
I've been collecting scaling stories for some time now and last week I came across the following rundown on Flickr scaling: Federation at Flickr: Doing Billions of Queries Per Day by Dathan Vance Pattishall, the Flickr database guy. The Flickr DB architecture is sharded, with a PHP access layer to maintain consistency. Flickr users are randomly assigned to a shard. Each shard is duplicated in another database that is also serving active shards. Each DB needs to be less than 50% loaded to be able to handle failover. Shards are found via a lookup ring that maps userID or groupID to shardID and photoID to userID. The DBs are protected by a memcached layer with a 30-minute caching lifetime.
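The lookup-ring description above can be sketched as follows. The data structures are illustrative; Flickr's actual implementation lives behind its PHP layer:

```python
import random

# A sketch of a Flickr-style lookup ring: userID -> shardID, and photo
# lookups resolve photoID -> userID -> shardID.
class LookupRing:
    def __init__(self, shard_ids):
        self.shard_ids = list(shard_ids)
        self.user_to_shard = {}
        self.photo_to_user = {}

    def assign_user(self, user_id):
        # New users land on a random shard, as described in the post.
        self.user_to_shard[user_id] = random.choice(self.shard_ids)
        return self.user_to_shard[user_id]

    def add_photo(self, photo_id, user_id):
        self.photo_to_user[photo_id] = user_id

    def shard_for_photo(self, photo_id):
        return self.user_to_shard[self.photo_to_user[photo_id]]
```

The indirection is the important design choice: because shard assignment is a table lookup rather than a hash of the key, a hot or overloaded user can be moved to another shard by updating one row.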
Foo Camp 2008
I just got back from O'Reilly's Foo Camp. Foo is an interesting conference format in that there is no set agenda. It's basically self-organized as an open-space-type event, but that's not what makes it special. What makes Foo a very cool conference is the people. Lots of conferences invite good people, but few invite such a diverse set of attendees. It was a lot of fun. Here's a picture from Saturday night of (right to left) Jesse Robbins (Seattle entrepreneur and co-chair of O'Reilly's Velocity conference), Pat Helland (Microsoft Developer Division Architect), and myself.
Facebook Releases Cassandra as Open Source
Last week the Facebook Data team released Cassandra as open source. Cassandra is a structured store with write-ahead logging and indexing. Jeff Hammerbacher, who leads the Facebook Data team, described Cassandra as a BigTable data model running on a Dynamo-like infrastructure. Google Code for Cassandra (Apache 2.0 License): http://code.google.com/p/the-cassandra-project/. Avinash Lakshman, Prashant Malik, and Karthik Ranganathan presented at SIGMOD 2008 this year: Cassandra: Structured Storage System over a P2P Network. From the presentation, the Cassandra design goals:
· High availability
· Eventual consistency
· Incremental scalability
· Optimistic replication
· Knobs to “tune” tradeoffs between consistency, durability, and latency
· Low cost of ownership
· Minimal administration
Write operation: write to an arbitrary node in the Cassandra cluster, request sent to the node owning the data, node ...
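The write path described above - a write arrives at an arbitrary node and is forwarded to the node owning the data - can be sketched with simple consistent hashing. This is an illustration of the idea, not Cassandra's actual partitioning code:

```python
import bisect
import hashlib

# Place nodes on a hash ring; a key is owned by the first node whose
# token is at or after the key's hash (wrapping around).
class Ring:
    def __init__(self, nodes):
        self.tokens = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def owner(self, key):
        h = self._hash(key)
        keys = [t for t, _ in self.tokens]
        i = bisect.bisect_right(keys, h) % len(self.tokens)
        return self.tokens[i][1]

def write(ring, entry_node, key, value, store):
    # Any entry node accepts the request, then forwards it to the
    # owner; entry_node itself doesn't affect placement.
    node = ring.owner(key)
    store.setdefault(node, {})[key] = value
    return node
```

Because ownership is a pure function of the key, every node can route every request without a central directory, which is what makes the "write to any node" interface cheap.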
Google Megastore
What follows is a guest posting from Phil Bernstein on the Google Megastore presentation by Jonas Karlsson, Philip Zeyliger at SIGMOD 2008: Megastore is a transactional indexed record manager built by Google on top of BigTable. It is rumored to be the store behind Google AppEngine but this was not confirmed (or denied) at the talk. [JRH: I certainly recognize many similarities between the Google IO talk on the AppEngine store (see Under the Covers of the App Engine Datastore in Rough notes from Selected Sessions at Google IO Day 1) and Phil's notes below].
Hadoop Wins TeraSort
Jim Gray proposed the original sort benchmark back in his famous Anon et al. paper, A Measure of Transaction Processing Power, originally published in Datamation on April 1, 1985. TeraSort is one of the benchmarks that Jim evolved from this original proposal. TeraSort is essentially a sequential I/O benchmark, and the best way to get lots of I/O capacity is to have many servers. The mainframe engineer-a-bigger-bus technique has produced some nice I/O rates but it doesn't scale. There have been some very good designs but, in the end, commodity parts in volume always win.
Fe-NAND Flash: 10x Durability, 30% Programming Voltage, & 2 Additional Feature Reductions
Recent results from two academic researchers in Japan will be significant to the NAND Flash market: http://www.electronicsweekly.com/Articles/Article.aspx?liArticleID=44028&PrinterFriendly=true. Clearly the trip from laboratory to volume production is often longer than the early estimates, but these results look important. Back in 2006, Jim Gray argued in Tape is Dead, Disk is Tape, Flash is Disk, & Ram Locality is King that we need a new layer in the storage hierarchy between memory and disk, and that NAND Flash was an excellent candidate. Early NAND-flash-based SSDs could sustain read rates well beyond 10x disk random I/O rates, but the write rates were terrible.
EcoRAM: NOR Flash to Reduce Memory Power Consumption
Updated below with additional implementation details. Last week Spansion made an interesting announcement: EcoRAM, a NOR Flash based storage part in a Dual In-line Memory Module (DIMM) package. NOR Flash technology growth has been fueled by the NOR support for Execute in Place (XIP). Unlike the NAND Flash interface, where entire memory pages need to be shifted into memory to be operated upon, NOR flash is directly addressable. And this direct addressability allows instructions to be read and executed directly from the memory. There is no need to shift pages out one at a time.
Facebook: Needle in a Haystack: Efficient Storage of Billions of Photos
Title: Needle in a Haystack: Efficient Storage of Billions of Photos Speaker: Jason Sobel, Manager of the Facebook Infrastructure Group Slides: http://beta.flowgram.com/f/p.html#2qi3k8eicrfgkv An excellent talk that I really enjoyed. I used to lead a much smaller service that also used a lot of NetApp storage, and I recognized many of the problems Jason mentioned. Throughout the introductory part of the talk I found myself thinking that they need to move to a cheap, directly attached blob store. And that's essentially the topic of the remainder of the talk. Jason presented Haystack, the Facebook solution to the problem of a filesystem not working terribly well for their high-volume blob storage needs.