Side Effects

Vol. 15 No. 2 – March-April 2017

Data Sketching:
The approximate approach is often faster and more efficient.

Do you ever feel overwhelmed by an unending stream of information? It can seem like a barrage of new email and text messages demands constant attention, and there are also phone calls to pick up, articles to read, and knocks on the door to answer. Putting these pieces together to keep track of what’s important can be a real challenge. In response to this challenge, the model of streaming data processing has grown in popularity. The aim is no longer to capture, store, and index every minute event, but rather to process each observation quickly in order to create a summary of the current state. Following its processing, an event is dropped and is no longer accessible. The summary that is retained is often referred to as a sketch of the data. This article introduces the ideas behind sketching, with a focus on algorithmic innovations. It describes some algorithmic developments in the abstract, followed by the steps needed to put them into practice, with examples. The article also looks at four novel algorithmic ideas and discusses some emerging areas.

by Graham Cormode
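
To make the streaming model in the abstract above concrete, here is a minimal sketch of one well-known summary structure, the Count-Min sketch. The abstract does not say which algorithms the article covers, so the choice of structure, the class name `CountMinSketch`, and the `width` and `depth` defaults are all assumptions made for illustration. Each event updates a small fixed-size table and is then discarded; only the table is retained.

```python
import hashlib


class CountMinSketch:
    """Fixed-size approximate frequency counter (illustrative, not taken from the article)."""

    def __init__(self, width=256, depth=4):
        # width and depth are arbitrary illustrative defaults.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Salting the hash with the row number gives one hash function per row.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        # Process one observation: bump one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(
            self.rows[row][self._index(row, item)] for row in range(self.depth)
        )


stream = ["email", "text", "email", "call", "email", "text"]
sketch = CountMinSketch()
for event in stream:
    sketch.update(event)  # the event itself is not stored anywhere

print(sketch.estimate("email"))  # never below the true count of 3
```

Memory use depends only on `width` and `depth`, not on the length of the stream. The trade-off is that estimates may overcount when distinct items collide in every row.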

The Calculus of Service Availability:
You’re only as available as the sum of your dependencies.

Most services offered by Google aim to offer 99.99 percent (sometimes referred to as the "four 9s") availability to users. Some services contractually commit to a lower figure externally but set a 99.99 percent target internally. This more stringent target accounts for situations in which users become unhappy with service performance well before a contract violation occurs, as the number one aim of an SRE team is to keep users happy. For many services, a 99.99 percent internal target represents the sweet spot that balances cost, complexity, and availability. For some services, notably global cloud services, the internal target is 99.999 percent.

by Benjamin Treynor Sloss, Mike Dahlin, Vivek Rau, Betsy Beyer
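
As a rough illustration of what the targets in the abstract above mean in practice, the sketch below converts an availability percentage into a yearly downtime budget and combines a service with its dependencies. The function names are hypothetical, and the combination rule assumes that every dependency is critical and that failures are independent. That is a simplification chosen for this example and not necessarily the model the article develops.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent):
    """Minutes per year a service may be down and still meet its target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def serial_availability(availabilities_percent):
    """Combined availability when every component must be up at once.

    Assumes independent failures (an assumption of this example).
    """
    combined = 1.0
    for availability in availabilities_percent:
        combined *= availability / 100
    return combined * 100


four_nines = downtime_budget_minutes(99.99)
five_nines = downtime_budget_minutes(99.999)

# A 99.99 percent service that relies on three 99.99 percent dependencies.
with_dependencies = serial_availability([99.99] * 4)

print(f"99.99%  allows about {four_nines:.2f} minutes of downtime per year")
print(f"99.999% allows about {five_nines:.2f} minutes of downtime per year")
print(f"service plus three dependencies: about {with_dependencies:.4f}%")
```

Under these assumptions, four 9s leaves roughly 52.6 minutes of downtime per year and five 9s roughly 5.3 minutes. A four-9s service with three equally available critical dependencies lands at about 99.96 percent, below its own target.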

The Observer Effect:
Finding the balance between zero and maximum

The problem is a failure to appreciate just what you are asking a system to do when polling it for information. Modern systems contain thousands of values that can be measured and recorded. Blindly retrieving whatever it is that might be exposed by the system is bad enough, but asking for it with a high-frequency poll is much worse.

by George Neville-Neil

Research for Practice: Technology for Underserved Communities; Personal Fabrication:
Expert-curated Guides to the Best of CS Research

This installment of Research for Practice provides curated reading guides to technology for underserved communities and to new developments in personal fabrication. First, Tawanna Dillahunt describes design considerations and technology for underserved and impoverished communities. Designing for the more than 1.6 billion impoverished individuals worldwide requires special consideration of community needs, constraints, and context. Tawanna’s selections span protocols for poor-quality communication networks, community-driven content generation, and resource and public service discovery. Second, Stefanie Mueller and Patrick Baudisch provide an overview of recent advances in personal fabrication (e.g., 3D printers). Their selection covers new techniques for fabricating (and emulating) complex materials (e.g., by manipulating the internal structure of an object), for more easily specifying object shape and behavior, and for human-in-the-loop rapid prototyping. Combined, these two guides provide a fascinating deep dive into some of the latest human-centric computer science research results.

by Tawanna Dillahunt, Stefanie Mueller, Patrick Baudisch

The IDAR Graph:
An improvement over UML

UML is the de facto standard for representing object-oriented designs. It does a fine job of recording designs, but it has a severe problem: its diagrams don’t convey what humans need to know, making them hard to understand. This is why most software developers use UML only when forced to. People understand an organization, such as a corporation, in terms of a control hierarchy. When faced with an organization of people or objects, the first question usually is, "What’s controlling all this?" Surprisingly, UML has no concept of one object controlling another. Consequently, in every type of UML diagram, no object appears to have greater or lesser control than its neighbors. These problems mean designs tend to become messy during both initial implementation and maintenance, resulting in more bugs and delays.

by Mark A. Overton

Side Effects, Front and Center!:
One System’s Side Effect is Another’s Meat and Potatoes.

We think of computation in terms of its consequences. The big MapReduce job returns a large result. Web interactions display information. Enterprise applications update the database and return an answer. These are the reasons we do our work. What we rarely discuss are the side effects of doing the work we intend. Side effects may be unwanted, or they may actually cause desired behavior at different layers of the system. This column points out some fun patterns to keep in mind as we build and use our systems.

by Pat Helland

Conversations with Technology Leaders: Erik Meijer:
Great engineers are able to maximize their mental power.

Whether you are a leader, a programmer, or just someone aspiring to be better, I am sure there are some smart takeaways from our conversation that will help you grow in your role. Oh, and if you read to the end, you can find out what his favorite job interview question is, and see if you would be able to pass his test.

by Kate Matsudaira