January/February 2017

Too Big NOT to Fail

  Pat Helland, Simon Weaver, and Ed Harris

Embrace failure so it doesn't embrace you.

Web-scale infrastructure implies LOTS of servers working together—often tens or hundreds of thousands of servers all working toward the same goal. How can the complexity of these environments be managed? How can commonality and simplicity be introduced?

Failure and Recovery

Research for Practice:
- Tracing and Debugging Distributed Systems;
- Programming by Examples

  Peter Alvaro, Sumit Gulwani

Expert-curated Guides to the Best of CS Research

This installment of Research for Practice covers two exciting topics in distributed systems and programming methodology. First, Peter Alvaro takes us on a tour of recent techniques for debugging some of the largest and most complex systems in the world: modern distributed systems and service-oriented architectures. The techniques Peter surveys can shed light on order amid the chaos of distributed call graphs. Second, Sumit Gulwani illustrates how to program without explicitly writing programs, instead synthesizing programs from examples! The techniques Sumit presents allow systems to "learn" a program representation from illustrative examples, allowing nonprogrammer users to create increasingly nontrivial functions such as spreadsheet macros. Both of these selections are well in line with RfP's goal of accessible, practical research; in fact, both contributors have successfully transferred their own research in each area to production, at Netflix and as part of Microsoft Excel. Readers may also find a use case!

Debugging, Development, Distributed Development, Research for Practice

The Debugging Mindset

  Devon H. O'Dell

Understanding the psychology of learning strategies leads to effective problem-solving skills.

Software developers spend 35-50 percent of their time validating and debugging software. The cost of debugging, testing, and verification is estimated to account for 50-75 percent of the total budget of software development projects, amounting to more than $100 billion annually. While tools, languages, and environments have reduced the time spent on individual debugging tasks, they have not significantly reduced the total time spent debugging, nor the cost of doing so. Therefore, a hyperfocus on elimination of bugs during development is counterproductive; programmers should instead embrace debugging as an exercise in problem solving.


Kode Vicious: Forced Exception-Handling

You can never discount the human element in programming.

Yes, KV also reads "The Morning Paper," although he has to admit that he does not read everything that arrives in his inbox from that list. Of course, the paper you mention piqued my interest, and one of the things you don't point out is that it's actually a study of distributed systems failures. Now, how can we make programming harder? I know! Let's take a problem on a single system and distribute it. Someday I would like to see a paper that tells us if problems in distributed systems increase along with the number of nodes, or the number of interconnections. Being an optimist, I can only imagine that it's N(N + 1) / 2, or worse.

Kode Vicious

MongoDB's JavaScript Fuzzer

  Robert Guo

The fuzzer is for those edge cases that your testing didn't catch.

Fuzzing, or fuzz testing, is a technique for generating randomized, unexpected, and invalid input to a program to trigger untested code paths. Fuzzing was originally developed in the 1980s and has since proven to be effective at ensuring the stability of a wide range of systems, from file systems to distributed clusters to browsers. As people have attempted to make fuzzing more effective, two philosophies have emerged: smart and dumb fuzzing. As the state of the art evolves, the techniques that are used to implement fuzzers are being partitioned into categories, chief among them being generational and mutational. In many popular fuzzing tools, smart fuzzing corresponds to generational techniques, and dumb fuzzing to mutational techniques, but this is not an intrinsic relationship. Indeed, in our case at MongoDB, the situation is precisely reversed.
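
The generational/mutational distinction can be illustrated with a minimal sketch. This is not MongoDB's fuzzer; the function names `mutate` and `generate`, the toy arithmetic grammar, and the sample seed string are all assumptions made for illustration. A mutational fuzzer corrupts an existing valid input, while a generational fuzzer builds a fresh input from a grammar.

```python
import random


def mutate(seed: str, rng: random.Random, n_edits: int = 3) -> str:
    """Mutational approach: take a known-valid input and randomly corrupt it."""
    chars = list(seed)
    for _ in range(n_edits):
        op = rng.choice(("flip", "insert", "delete"))
        pos = rng.randrange(len(chars)) if chars else 0
        junk = chr(rng.randrange(32, 127))  # random printable ASCII character
        if op == "flip" and chars:
            chars[pos] = junk
        elif op == "insert":
            chars.insert(pos, junk)
        elif op == "delete" and chars:
            del chars[pos]
    return "".join(chars)


def generate(rng: random.Random, depth: int = 0) -> str:
    """Generational approach: build an input from a (toy) expression grammar."""
    if depth >= 3 or rng.random() < 0.4:
        # Leaf: pick from a few boundary-style integer literals.
        return str(rng.choice((0, -1, 2**31, 3)))
    op = rng.choice(("+", "-", "*"))
    return f"({generate(rng, depth + 1)} {op} {generate(rng, depth + 1)})"


if __name__ == "__main__":
    rng = random.Random(42)
    print(mutate("db.coll.find({a: 1})", rng))  # hypothetical seed input
    print(generate(rng))
```

Inputs from `generate` are always well formed under the toy grammar, so they tend to reach deeper code paths, whereas `mutate` mostly produces malformed input that exercises parsing and error handling.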

Databases, QA

The Soft Side of Software
Does Anybody Listen to You?

  Kate Matsudaira

How do you step up from mere contributor to real change-maker?

When you are navigating an organization, it pays to know whom to talk to and how to reach them. Here is a simple guide to sending your ideas up the chain and actually making them stick. It takes three elements: the right people, the right time, and the right way.

The Soft Side of Software

Making Money Using Math

  Erik Meijer

Modern applications are increasingly using probabilistic machine-learned models.

Machine learning, or ML, is all the rage today, and there are good reasons for that. Models created by machine-learning algorithms for problems such as spam filtering, speech and image recognition, language translation, and text understanding have many advantages over code written by human developers. Machine learning, however, is not as magical as it sounds at first. In fact, it is rather analogous to how human developers create code using test-driven development.

Artificial Intelligence

November/December 2016

Pervasive, Dynamic Authentication of Physical Items

  Meng-Day (Mandel) Yu, Srinivas Devadas

The use of silicon PUF circuits

Authentication of physical items is an age-old problem. Common approaches include the use of bar codes, QR codes, holograms, and RFID (radio-frequency identification) tags. Traditional RFID tags and bar codes use a public identifier as a means of authenticating. A public identifier, however, is static: it is the same each time it is queried and can be easily copied by an adversary. Holograms can also be viewed as public identifiers: a knowledgeable verifier knows all the attributes to inspect visually. It is difficult to make hologram-based authentication pervasive; a casual verifier does not know all the attributes to look for. Further, to achieve pervasive authentication, it is useful for the authentication modality to be easy to integrate with modern electronic devices (e.g., mobile smartphones) and to be easy for non-experts to use.


Research for Practice:
- Cryptocurrencies, Blockchains, and Smart Contracts;
- Hardware for Deep Learning

  Peter Bailis, Arvind Narayanan, Andrew Miller, and Song Han

Expert-curated Guides to the Best of CS Research

First, Arvind Narayanan and Andrew Miller, co-authors of the increasingly popular open-access Princeton Bitcoin textbook, provide an overview of ongoing research in cryptocurrencies. This is a topic with a long history in the academic literature that has recently come to prominence with the rise of Bitcoin, blockchains, and similar implementations of advanced, decentralized protocols. These developments have captured the public imagination and the eye of the popular press. In the meantime, academics have been busy, delivering new results in maintaining anonymity, ensuring usability, detecting errors, and reasoning about decentralized markets, all through the lens of these modern cryptocurrency systems. It is a pleasure having two academic experts deliver the latest updates from the burgeoning body of academic research on this subject.

Second, Song Han provides an overview of hardware trends related to another long-studied academic problem that has recently seen an explosion in popularity: deep learning. Fueled by large amounts of training data and inexpensive parallel and scale-out compute, deep-learning-model architectures have seen a massive resurgence of interest based on their excellent performance on traditionally difficult tasks such as image recognition. These deep networks are compute-intensive to train and evaluate, and many of the best minds in computer systems (e.g., the team that developed MapReduce) and AI are working to improve them. As a result, Song has provided a fantastic overview of recent advances devoted to using hardware and hardware-aware techniques to compress networks, improve their performance, and reduce their often large amounts of energy consumption.

AI, Networks, Privacy, Research for Practice

Uninitialized Reads

  Robert C. Seacord, NCC Group

Understanding the proposed revisions to the C language

Most developers understand that reading uninitialized variables in C is a defect, but some do it anyway. What happens when you read uninitialized objects is unsettled in the current version of the C standard (C11). Various proposals have been made to resolve these issues in the planned C2X revision of the standard. Consequently, this is a good time to understand existing behaviors as well as proposed revisions to the standard to influence the evolution of the C language. Given that the behavior of uninitialized reads is unsettled in C11, prudence dictates eliminating uninitialized reads from your code.

Programming Languages

Heterogeneous Computing: Here to Stay

  Mohamed Zahran

Hardware and Software Perspectives

Mentions of the buzzword heterogeneous computing have been on the rise in the past few years and will continue to be heard for years to come, because heterogeneous computing is here to stay. What is heterogeneous computing, and why is it becoming the norm? How do we deal with it, from both the software side and the hardware side? This article provides answers to some of these questions and presents different points of view on others.


Time, but Faster

  Theo Schlossnagle

A computing adventure about time through the looking glass

Every once in a while, you find yourself in a rabbit hole, unsure of where you are or what time it might be. This article presents a computing adventure about time through the looking glass.

The first premise was summed up perfectly by the late Douglas Adams in The Hitchhiker's Guide to the Galaxy: "Time is an illusion. Lunchtime doubly so." The concept of time, when colliding with decoupled networks of computers that run at billions of operations per second, is... well, the truth of the matter is that you simply never really know what time it is. That is why Leslie Lamport's seminal paper on Lamport timestamps was so important to the industry, but this article is actually about wall-clock time, or a reasonably useful estimation of it.


Kode Vicious: The Chess Player Who Couldn't Pass the Salt

AI: Soft and hard, weak and strong, narrow and general

The problem inherent in almost all nonspecialist work in AI is that humans actually don't understand intelligence very well in the first place. Now, computer scientists often think they understand intelligence because they have so often been the "smart" kid, but that's got very little to do with understanding what intelligence actually is. In the absence of a clear understanding of how the human brain generates and evaluates ideas, which may or may not be a good basis for the concept of intelligence, we have introduced numerous proxies for intelligence, the first of which is game-playing behavior.

AI, Kode Vicious

Everything Sysadmin:
Are You Load Balancing Wrong?

  Thomas A. Limoncelli

Anyone can use a load balancer. Using them properly is much more difficult.

In today's web-centric, service-centric environments the use of load balancers is widespread. I assert, however, that most of the time they are used incorrectly. To understand the problem, we first need to discuss a little about load balancers in general. Then we can look at the problem and solutions.

Everything Sysadmin, System Administration

September/October 2016

Life Beyond Distributed Transactions

  Pat Helland

An apostate's opinion

Transactions are amazingly powerful mechanisms, and I've spent the majority of my almost 40-year career working on them. In 1982, I first worked to provide transactions on the Tandem NonStop System. This system had a mean time between failures measured in years [4] and included a geographically distributed two-phase commit offering excellent availability for strongly consistent transactions.

New innovations, including Google's Spanner [2], offer strongly consistent transactional environments at extremely large scale with excellent availability. Building distributed transactions to support highly available applications is a great challenge that has inspired excellent innovation and great technology. Unfortunately, this is not broadly available to application developers.

Distributed Computing

Research for Practice:
Distributed Transactions and Networks as Physical Sensors

  Peter Bailis, Irene Zhang, Fadel Adib

Expert-curated Guides to the Best of CS Research

First, Irene Zhang delivers a whirlwind tour of recent developments in distributed concurrency control. If you thought distributed transactions were prohibitively expensive, Irene's selections may prompt you to reconsider: the use of atomic clocks, clever replication protocols, and new means of commit ordering all improve performance at scale.

Second, Fadel Adib provides a fascinating look at using computer networks as physical sensors. It turns out that the radio waves passing through our environment and bodies are subtly modulated as they do so. As Fadel's selection shows, new techniques for sensing and interpreting these modulations allow us to perform tasks previously reserved for science fiction: seeing through walls, performing gesture recognition, and monitoring breathing.

Distributed Computing, Networks, Research for Practice

BBR: Congestion-Based Congestion Control

  Neal Cardwell, Yuchung Cheng, C. Stephen Gunn, Soheil Hassas Yeganeh, Van Jacobson

Measuring bottleneck bandwidth and round-trip propagation time

By all accounts, today's Internet is not moving data as well as it should. Most of the world's cellular users experience delays of seconds to minutes; public wifi in airports and conference venues is often worse. Physics and climate researchers need to exchange petabytes of data with global collaborators but find their carefully engineered multi-Gbps infrastructure often delivers at only a few Mbps over intercontinental distances.

Today TCP's loss-based congestion control is the primary cause of these problems. When bottleneck buffers are large, loss-based congestion control keeps them full, causing bufferbloat. When bottleneck buffers are small, loss-based congestion control misinterprets loss as a signal of congestion, leading to low throughput. Fixing these problems requires an alternative to loss-based congestion control. Finding this alternative requires an understanding of where and how network congestion originates.
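
The bufferbloat claim can be made concrete with a little arithmetic. The sketch below is only an illustration, not the BBR algorithm: the function names and the example link figures (a 10 Mbps bottleneck, a 1 MB buffer, a 40 ms propagation time) are assumptions. It shows how much delay a full bottleneck buffer adds, and how much data actually needs to be in flight to keep the link busy (the product of bottleneck bandwidth and round-trip propagation time).

```python
def queueing_delay_s(buffer_bytes: int, bottleneck_bps: float) -> float:
    """Extra delay added when the bottleneck buffer is kept full."""
    return buffer_bytes * 8 / bottleneck_bps


def bdp_bytes(bottleneck_bps: float, rtprop_s: float) -> float:
    """Bandwidth-delay product: data in flight needed to fill the pipe."""
    return bottleneck_bps * rtprop_s / 8


if __name__ == "__main__":
    rate = 10e6          # assumed bottleneck bandwidth: 10 Mbps
    buffer = 1_000_000   # assumed bottleneck buffer: 1 MB
    rtprop = 0.040       # assumed round-trip propagation time: 40 ms

    # A full 1 MB buffer on a 10 Mbps link adds about 0.8 s of delay,
    # roughly 20 times the assumed propagation time.
    print(f"queueing delay: {queueing_delay_s(buffer, rate):.2f} s")

    # Only about 50 kB in flight is needed to keep this link fully used.
    print(f"bandwidth-delay product: {bdp_bytes(rate, rtprop):.0f} bytes")
```

Under these assumed numbers, a sender that fills the buffer before backing off carries about twenty times more queued data than the path needs, which is the delay penalty the paragraph above attributes to loss-based control.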


The Soft Side of Software
Resolving Conflict

  Kate Matsudaira

Don't "win." Resolve.

In a perfect world, we would all get along with our coworkers and bosses all the time. Unfortunately, we don't live in a perfect world. While most of us make our best efforts to avoid conflict at work, occasionally it is unavoidable. Here are some of my best tips on how to make all of your conflicts in the workplace healthy and (hopefully) productive, so you can move on and get back to what really matters.

The Soft Side of Software

Faucet: Deploying SDN in the Enterprise

  Josh Bailey and Stephen Stuart

Using OpenFlow and DevOps for rapid development

Faucet was built on the OpenFlow 1.3 standard. Without the availability of commercial hardware supporting this standard, it would not have been possible. Multiple vendors now ship hardware that supports OpenFlow 1.3, specifically with support for multiple flow tables and IPv6. To minimize vendor-specific logic in the controller, vendors were encouraged to support key features in the OpenFlow 1.3 standard in a consistent way. This reduced initial development and support cost, and it simplified bug reporting and automated testing.

While SDN as a technology continues to evolve and become even more programmable, Faucet and OpenFlow 1.3 hardware together are sufficient to realize benefits today. This article describes specifically how to take advantage of DevOps practices to develop and deploy features rapidly. It also describes several practical deployment scenarios, including firewalling and network function virtualization.


Kode Vicious: The Unholy Trinity of Software Development

Tests, documentation, and code

Software developers like new toys. Of course they do: they work on computers and computers are toys to us, and everyone likes things that are shiny. If you visit a modern software company, what do you see besides a sea of Aeron chairs? Lots and lots of monitors, and many of those are of the 4K variety, meaning that a text editor, even with a large font, will give you more than 100 lines of code to look at, four times as many as the 80x25 monitors used to write code since the 1970s.

Kode Vicious

Industrial Scale Agile - from Craft to Engineering

  Ivar Jacobson, Ian Spence, and Ed Seidewitz

Essence is instrumental in moving software development toward a true engineering discipline.

There are many, many ways to illustrate how fragile IT investments can be. You just have to look at the way that, even after huge investments in education and coaching, many organizations are struggling to broaden their agile adoption to the whole of their organization—or at the way other organizations are struggling to maintain the momentum of their agile adoptions as their teams change and their systems mature.


July/August 2016

Research for Practice:
Web Security and Mobile Web Computing

  Peter Bailis, Jean Yang, Vijay Janapa Reddi, and Yuhao Zhu

Expert-curated Guides to the Best of CS Research

First, Jean Yang provides an overview of how to use information flow techniques to build programs that are secure by construction. Second, Vijay Janapa Reddi and Yuhao Zhu provide an overview of the challenges for the future of the mobile web.

Mobile Computing, Research for Practice, Web Development, Web Security

Escaping the Singularity
The Power of Babble

  Pat Helland

Expect to be constantly and pleasantly befuddled

Metadata defines the shape, the form, and how to understand our data. It is following the trend taken by natural languages in our increasingly interconnected world. While many concepts can be communicated using shared metadata, no one can keep up with the number of disparate new concepts needed to have a common understanding.

Escaping the Singularity

Functional at Scale

  Marius Eriksen

Applying functional programming principles to distributed computing projects

Modern server software is demanding to develop and operate: it must be available at all times and in all locations; it must reply within milliseconds to user requests; it must respond quickly to capacity demands; it must process a lot of data and even more traffic; it must adapt quickly to changing product needs; and in many cases it must accommodate a large engineering organization, its many engineers the proverbial cooks in a big, messy kitchen.

Distributed Computing, Distributed Development, Programming Languages

The Soft Side of Software
Fresh Starts

  Kate Matsudaira

Just because you have been doing it the same way doesn't mean you are doing it the right way.

Wouldn't it be great if you frequently were in a position where you were pushed to grow outside of your comfort zone? Where you had to start new and fresh?

The Soft Side of Software

Case Study
React: Facebook's Functional Turn on Writing JavaScript

A discussion with Pete Hunt, Paul O'Shannessy, Dave Smith and Terry Coatta

One of the long-standing ironies of user-friendly JavaScript front ends is that building them typically involved trudging through the DOM (Document Object Model), hardly known for its friendliness to developers. But now developers have a way to avoid directly interacting with the DOM, thanks to Facebook's decision to open-source its React library for the construction of user interface components.

Programming Languages, Web Development, Web Services

Scaling Synchronization in Multicore Programs

  Adam Morrison

Advanced synchronization methods can boost the performance of multicore software.

Designing software for modern multicore processors poses a dilemma. Traditional software designs, in which threads manipulate shared data, have limited scalability because synchronization of updates to shared data serializes threads and limits parallelism. Alternative distributed software designs, in which threads do not share mutable data, eliminate synchronization and offer better scalability. But distributed designs make it challenging to implement features that shared data structures naturally provide, such as dynamic load balancing and strong consistency guarantees, and are simply not a good fit for every program.

Often, however, the performance of shared mutable data structures is limited by the synchronization methods in use today, whether lock-based or lock-free. To help readers make informed design decisions, this article describes advanced (and practical) synchronization methods that can push the performance of designs using shared mutable data to levels that are acceptable to many applications.

Concurrency, Performance

Everything Sysadmin:
10 Optimizations on Linear Search

  Thomas A. Limoncelli

The operations side of the story

System administrators (DevOps engineers or SREs or whatever your title) must deal with the operational aspects of computation, not just the theoretical aspects. Operations is where the rubber hits the road. As a result, operations people see things from a different perspective and can realize opportunities outside of the basic O() analysis. Let's look at the operational aspects of the problem of trying to improve something that is theoretically optimal already.

Everything Sysadmin, Search Engines, System Administration

