The November/December 2019 issue of acmqueue is out now

November/December 2019

Special issue on the critical role of human performance in software

Revealing the Critical Role of Human Performance in Software

  David D. Woods, John Allspaw

It's time to revise our appreciation of the human side of Internet-facing software systems.

Understanding, supporting, and sustaining the capabilities above the line of representation require all stakeholders to be able to continuously update and revise their models of how the system is messy and yet usually manages to work. This kind of openness to continually reexamine how the system really works requires expanding the efforts to learn from incidents.


Above the Line, Below the Line

  Richard I. Cook, M.D.

The resilience of Internet-facing systems relies on what is above the line of representation.

Knowledge and understanding of below-the-line structure and function are continuously in flux. Near-constant effort is required to calibrate and refresh the understanding of the workings, dependencies, limitations, and capabilities of what is present there. In this dynamic situation no individual or group can ever know the system state. Instead, individuals and groups must be content with partial, fragmented mental models that require more or less constant updating and adjustment if they are to be useful.

Development, Web Services

Cognitive Work of Hypothesis Exploration During Anomaly Response

  Marisa R. Grayson

A look at how we respond to the unexpected

Four incidents from web-based software companies, two of which are discussed in this article, reveal important aspects of anomaly response processes when incidents arise in web operations. One particular cognitive function examined in detail is hypothesis generation and exploration, given the impact of obscure automation on engineers' development of coherent models of the systems they manage. Each case was analyzed using the techniques and concepts of cognitive systems engineering. The set of cases provides a window into the cognitive work "above the line" in incident management of complex web-operation systems.


Managing the Hidden Costs of Coordination

  Laura M.D. Maguire

Controlling coordination costs when multiple, distributed perspectives are essential

Some initial considerations to control cognitive costs for incident responders include: (1) assessing coordination strategies relative to the cognitive demands of the incident; (2) recognizing when adaptations represent a tension between multiple competing demands (coordination and cognitive work) and seeking to understand them better rather than unilaterally eliminating them; (3) widening the lens to study the joint cognition system (integration of human-machine capabilities) as the unit of analysis; and (4) viewing joint activity as an opportunity for enabling reciprocity across inter- and intra-organizational boundaries.

Debugging, Development

Beyond the "Fix-it" Treadmill

  J. Paul Reed

The Use of Post-Incident Artifacts in High-Performing Organizations

Given that humanity's study of the sociological factors in safety is almost a century old, the technology industry's post-incident analysis practices and how we create and use the artifacts those practices produce are all still in their infancy. So don't be surprised that many of these practices are so similar, that the cognitive and social models used to parse apart and understand incidents and outages are few and cemented in the operational ethos, and that the byproducts sought from post-incident analyses are far-and-away focused on remediation items and prevention.

Development, Quality Assurance

September/October 2019

Kode Vicious
Numbers Are for Computers, Strings Are for Humans

How and where software should translate data into a human-readable form

Unless what you are processing, storing, or transmitting are, quite literally, strings that come from and are meant to be shown to humans, you should avoid processing, storing, or transmitting that data as strings. Remember, numbers are for computers, strings are for humans. Let the computer do the work of presenting your data to the humans in a form they might find palatable. That's where those extra bytes and instructions should be spent, not doing the inverse.
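
As a minimal sketch of this advice (not code from the column), the Python below keeps a timestamp and a size as plain numbers and converts them to text only at the point of display. The record layout and the names `record` and `present` are hypothetical, chosen for illustration.

```python
from datetime import datetime, timezone

# Hypothetical record: values stay numeric while stored, transmitted, and processed.
record = {"created_epoch_s": 1572566400, "size_bytes": 1536}

def present(rec):
    """Translate numeric fields into a human-readable string at the display edge only."""
    when = datetime.fromtimestamp(rec["created_epoch_s"], tz=timezone.utc)
    return f"{when:%Y-%m-%d} {rec['size_bytes'] / 1024:.1f} KiB"

# Arithmetic works directly on the stored values; no parsing is needed.
doubled = record["size_bytes"] * 2

print(present(record))
```

Because the stored form is numeric, comparisons and arithmetic need no parsing, and the display format can change without touching stored data.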

Data and Databases, Kode Vicious

Commit to Memory:
Opening up the Baseboard Management Controller

  Jessie Frazelle

If the CPU is the brain of the board, the BMC is the brain stem.

In 2011 Facebook announced the Open Compute Project to form a community around open-source designs and specifications for data center hardware. Facebook and other hyperscalers provide their solutions to the problems that come with running data centers at scale. Since then, the project has expanded to all aspects of the open data center, including baseboard management controllers (BMCs), among many others. In this column, I focus on the BMC. It's an introduction to a complicated topic; some sections just touch the surface, but the intention is to provide a full picture of the world of the open-source BMC ecosystem, starting with a brief overview of the BMC's role in a system, touching on security concerns around the BMC, and then diving into some of the projects that have developed in the open-source ecosystem.

Open Source

Blockchain Technology: What Is It Good for?

  Scott Ruoti, Ben Kaiser, Arkady Yerukhimovich, Jeremy Clark, and Robert Cunningham

Industry's dreams and fears for this new technology

Business executives, government leaders, investors, and researchers frequently ask the following three questions: (1) What exactly is blockchain technology? (2) What capabilities does it provide? (3) What are good applications?

The goal of this article is to answer these questions thoroughly, provide a holistic overview of blockchain technology that separates hype from reality, and propose a useful lexicon for discussing the specifics of blockchain technology in the future.


Everything Sysadmin:
API Practices If You Hate Your Customers

  Thomas A. Limoncelli

APIs speak louder than words.

Do you have disdain for your customers? Do you wish they would go away? When you interact with customers are you silently fantasizing about them switching to your competitor's product? In short, do you hate your customers? In this article, I document a number of industry best practices designed to show customers how much you hate them. All of them are easy to implement. Heck, your company may be doing many of these already.

Business and Management, Everything Sysadmin

The Reliability of Enterprise Applications

  Sanjay Sha

Understanding enterprise reliability

This article describes a core set of principles and engineering methodologies that enterprises can apply to help them navigate the complex environment of enterprise reliability and deliver highly reliable and cost-efficient applications.

Quality Assurance

Escaping the Singularity
Space Time Discontinuum

  Pat Helland

Combining data from many sources may cause painful delays.

Back when you had only one database for an application to worry about, you didn't have to think about partial results. You also didn't have to think about data arriving after some other data. It was all simply there. Now, you can do so much more with big distributed systems, but you have to be more sophisticated in the tradeoff between timely answers and complete answers.

Data and Databases, Escaping the Singularity

Optimizations in C++ Compilers

  Matt Godbolt

A practical journey

There's a tradeoff to be made in giving the compiler more information: it can make compilation slower. Technologies such as link time optimization can give you the best of both worlds. Optimizations in compilers continue to improve, and upcoming improvements in indirect calls and virtual function dispatch might soon lead to even faster polymorphism.


The Morning Paper:
Back under a SQL Umbrella

  Adrian Colyer

Unifying serving and analytical data; using a database for distributed machine learning

Procella is the latest in a long line of data processing systems at Google. What's unique about it is that it's a single store handling reporting, embedded statistics, time series, and ad-hoc analysis workloads under one roof. It's SQL on top, cloud-native underneath, and it's serving billions of queries per day over tens of petabytes of data. There's one big data use case that Procella isn't handling today though, and that's machine learning. But in 'Declarative recursive computation on an RDBMS... or, why you should use a database for distributed machine learning,' Jankov et al. make the case for the database being the ideal place to handle the most demanding of distributed machine learning workloads.

Data and Databases


July/August 2019

The Morning Paper:
Putting Machine Learning into Production Systems

  Adrian Colyer

Data validation and software engineering for machine learning

In "Data Validation for Machine Learning," Breck et al. share details of the pipelines used at Google to validate petabytes of production data every day. With so many moving parts it's important to be able to detect and investigate changes in data distributions before they can impact model performance.

"Software Engineering for Machine Learning: A Case Study" shares lessons learned at Microsoft as machine learning started to pervade more and more of the company's systems, moving from specialized machine-learning products to simply being an integral part of many products and services.


Hack for Hire

  Ariana Mirian

Investigating the emerging black market of retail email account hacking services

While targeted attacks are often thought of as requiring nation-state capabilities, there is an emerging black market for "hack-for-hire" services, which provide targeted attacks to anyone willing to pay a modest fee. These services purport to be able to break into the accounts of a variety of different email providers. As these services are just emerging, little is known about how they attack their victims and how much of a risk they pose.

Regardless of the behavior of the market, this study sheds light on the importance of security keys for populations who believe they are at risk, as only a security key can protect a user from the attacks viewed in this study. As the market evolves and defenses change, however, attacks might also change and shift from phishing to more persistent threats such as malware.

Privacy and Rights, Security

Escaping the Singularity
Write Amplification Versus Read Perspiration

  Pat Helland

The tradeoffs between write and read

In computing, there's an interesting trend where writing creates a need to do more work. You need to reorganize, merge, reindex, and more to make the stuff you wrote more useful. If you don't, you must search or do other work to support future reads.
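
The tradeoff can be sketched with two toy key-value stores. This is an illustrative assumption, not code from the column: `AppendOnly` does almost no work on write and scans on read, while `SortedIndex` does extra work on every write so that reads can use binary search. Both class names are hypothetical.

```python
import bisect

class AppendOnly:
    """Cheap writes: just append. Reads pay by scanning."""
    def __init__(self):
        self.items = []

    def write(self, key, value):
        self.items.append((key, value))

    def read(self, key):
        # Scan newest-first so the latest write for a key wins.
        for k, v in reversed(self.items):
            if k == key:
                return v
        return None

class SortedIndex:
    """Writes pay to keep keys sorted. Reads use binary search."""
    def __init__(self):
        self.keys = []
        self.values = []

    def write(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value          # overwrite existing key
        else:
            self.keys.insert(i, key)        # shifting elements is the extra write work
            self.values.insert(i, value)

    def read(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

log_store, index_store = AppendOnly(), SortedIndex()
for store in (log_store, index_store):
    store.write("b", 1)
    store.write("a", 2)
    store.write("b", 3)
```

Both stores return the same answers; they differ only in whether the cost is paid at write time or at read time.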

Data and Databases, Escaping the Singularity

The Effects of Mixing Machine Learning and Human Judgment

  Michelle Vaccaro and Jim Waldo

Collaboration between humans and machines does not necessarily lead to better outcomes.

Based on the theoretical findings from the existing literature, some policymakers and software engineers contend that algorithmic risk assessments such as the COMPAS software can alleviate the incarceration epidemic and the occurrence of violent crimes by informing and improving decisions about policing, treatment, and sentencing.

Considered in tandem, these findings indicate that collaboration between humans and machines does not necessarily lead to better outcomes, and human supervision does not sufficiently address problems when algorithms err or demonstrate concerning biases. If machines are to improve outcomes in the criminal justice system and beyond, future research must further investigate their practical role: an input to human decision makers.

Artificial Intelligence

Kode Vicious
Koding Academies

A low-risk path to becoming a front-end plumber

Encourage your friend to pick a course that will introduce concepts that can be used into the future, rather than just a specific set of buzzword technologies that are hot this year. Most courses are based around Python. Encourage your friend to study that as a first computer language, as the concepts learned in Python can be applied in other languages and other fields. And make sure to be very direct in explaining that the certificate effectively makes its holder a front-end plumber, able to unclog the series of pipes that run between businesses and consumers' wallets, and that becoming a software engineer will take quite a bit more study and practice.

Education, Kode Vicious

Persistent Memory Programming on Conventional Hardware

  Terence Kelly

The persistent memory style of programming can dramatically simplify application software.

Driven by the advent of byte-addressable non-volatile memory, the persistent memory style of programming will gain traction among developers, taking its rightful place alongside existing paradigms for managing persistent application state. Until NVM becomes available on all computers, developers can use the techniques presented in this article to enjoy the benefits of persistent memory programming on conventional hardware.



May/June 2019

Case Study
DAML: The Contract Language of Distributed Ledgers

A discussion between Shaul Kfir and Camille Fournier

The how and why of Digital Asset's own distributed-ledger technology, DAML (Digital Asset Modeling Language).

Case Studies, Networks

Kode Vicious
What is a CSO Good For?

Security requires more than an off-the-shelf solution.

The CSO is not a security engineer, so let's contrast the two jobs to create a picture of what we should and should not see.

Business and Management, Kode Vicious, Security

Everything Sysadmin:
Demo Data as Code

  Thomas A. Limoncelli

Automation helps collaboration.

A casual request for a demo dataset may seem like a one-time thing that doesn't need to be automated, but the reality is that this is a collaborative process requiring multiple iterations and experimentation. There will undoubtedly be requests for revisions big and small, the need to match changing software, and to support new and revised demo stories. All of this makes automating the process worthwhile. Modern scripting languages make it easy to create ad hoc functions that act like a little language. A repeatable process helps collaboration, enables delegation, and saves time now and in the future.

Business and Management, Everything Sysadmin

Velocity in Software Engineering

  Tom Killalea

From tectonic plate to F-16

Software engineering occupies an increasingly critical role in companies across all sectors, but too many software initiatives end up both off target and over budget. A surer path is optimized for speed, open to experimentation and learning, agile, and subject to regular course correcting. Good ideas tend to be abundant, though execution at high velocity is elusive. The good news is that velocity is controllable; companies can invest systematically to increase it.


The Soft Side of Software:
The Evolution of Management

  Kate Matsudaira

Transitioning up the ladder

With each step up, the job changes - but not all of the changes are obvious. You have to shift your mindset, and focus on building new skills that are often very different from the skills that made you successful in your previous role.

Business and Management, The Soft Side of Software

Open-source Firmware

  Jessie Frazelle

Step into the world behind the kernel.

Open-source firmware can help bring computing to a more secure place by making the actions of firmware more visible and less likely to do harm. This article's goal is to make readers feel empowered to demand more from vendors who can help drive this change.

Open Source

The Morning Paper:
Time Protection in Operating Systems;
Speaker Legitimacy Detection

  Adrian Colyer

Operating system-based protection from timing-based side-channel attacks;
implications of voice-imitation software

Timing-based side-channel attacks are a particularly tricky class of attacks to deal with because the very thing you're often striving for can give you away. There are always more creative new instances of attacks to be found, so you need a principled way of thinking about defenses that address the class, not just a particular instantiation. That's what Ge et al. give us in "Time Protection, the Missing OS Abstraction." Just as operating systems prevent spatial inference through memory protection, so future operating systems will need to prevent temporal inference through time protection. It's going to be a long road to get there.

The second paper chosen for this edition comes from NDSS'19 (Network and Distributed System Security Symposium) and studies the physiological and social implications of the ever-improving abilities of voice-imitation software. It seems people may be especially vulnerable to being fooled by fake voices. "The crux of voice (in)security: a brain study of speaker legitimacy detection," by Neupane et al., is a fascinating study with implications far beyond just the technology.

Networks, Security


March/April 2019

Surviving Software Dependencies

  Russ Cox

Software reuse is finally here but comes with risks.

Software reuse is finally here, and its benefits should not be understated, but we've accepted this transformation without completely thinking through the potential consequences. The Copay and Equifax attacks are clear warnings of real problems in the way software dependencies are consumed today. There's a lot of good software out there. Let's work together to find out how to reuse it safely.


Kode Vicious

On writing documentation

Pronouncements without background or explanatory material are useless to those who are not also deeply steeped in the art and science of computer security or security in general. It takes a particular bent of mind to think like an attacker and a defender all at once, and most people are incapable of doing this; so, if you want the people reading the document to follow your guidance, then you must take them on a journey from ignorance to knowledge.

Kode Vicious

Escaping the Singularity
Extract, Shoehorn, and Load

  Pat Helland

Data doesn't always fit nicely into a new home.

It turns out that the business value of ill-fitting data is extremely high. The process of taking the input data, discarding what doesn't fit, adding default or null values for missing stuff, and generally shoehorning it to the prescribed shape is important. The prescribed shape is usually one that is amenable to analysis for deeper meaning.
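
The shoehorning step described above might look like the following sketch. The schema, its field names, and the `shoehorn` function are hypothetical; the column itself does not prescribe an implementation.

```python
# Hypothetical prescribed shape: field name -> default used when the input lacks it.
SCHEMA = {"id": None, "name": "", "age": None}

def shoehorn(raw):
    """Keep only the fields in SCHEMA, filling defaults or nulls for anything missing."""
    return {field: raw.get(field, default) for field, default in SCHEMA.items()}

# "nickname" does not fit the prescribed shape and is discarded; "age" is missing and becomes None.
row = shoehorn({"id": 7, "name": "Ada", "nickname": "countess"})
print(row)
```

The output always has exactly the prescribed fields, which is what makes it amenable to later analysis, at the cost of silently dropping whatever did not fit.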

Data and Databases, Escaping the Singularity

Case Study
Access Controls and Health Care Records: Who Owns the Data?

A discussion with David Evans, Richard McDonald, and Terry Coatta

What if health care records were handled in more of a patient-centric manner, using systems and networks that allow data to be readily shared by all the physicians, clinics, hospitals, and pharmacies a person might choose to share them with or have occasion to visit? And, more radically, what if it was the patients who owned the data?

Case Studies, Data and Databases, Privacy and Rights

Research for Practice:
The DevOps Phenomenon

  Anna Wiedemann, Nicole Forsgren, Manuel Wiesche, Heiko Gewald, and Helmut Krcmar

An executive crash course

Stressful emergency releases are a thing of the past for companies that subscribe to the DevOps method of software development and delivery. New releases are frequent. Bugs are fixed rapidly. New business opportunities are sought with gusto and confidence. New features are released, revised, and improved with rapid iterations.

DevOps is about providing guidelines for faster time to market of new software features and achieving a higher level of stability. Implementing cross-functional, product-oriented teams helps bridge the gaps between software development and operations. By ensuring their transformations include all of the principles outlined in CALMS, teams can achieve superior performance and deliver value to their organizations.

DevOps is often challenging, but stories from across the industry show that many organizations have already overcome the early hurdles and plan to continue their progress, citing the value to their organizations and the benefits to their engineers.

Business and Management, Research for Practice

The Soft Side of Software
Overly Attached

  Kate Matsudaira

Know when to let go of emotional attachment to your work.

A smart, senior engineer couldn't make logical decisions if it meant deprecating the system he and his team had worked on for a number of years. Even though the best thing would have been to help another team create the replacement system, they didn't want to entertain the idea because it would mean putting an end to something they had invested so much in. It is good to have strong ownership, but what happens when you get too attached?

Business and Management, The Soft Side of Software

Industry-scale Knowledge Graphs: Lessons and Challenges

  Natasha Noy, Yuqing Gao, Anshu Jain, Anant Narayanan, Alan Patterson, Jamie Taylor

Five diverse technology companies show how it's done

This article looks at the knowledge graphs of five diverse tech companies, comparing the similarities and differences in their respective experiences of building and using the graphs, and discussing the challenges that all knowledge-driven enterprises face today.

The collection of knowledge graphs discussed here covers the breadth of applications, from search, to product descriptions, to social networks.

The goal here is not to describe these knowledge graphs exhaustively, but rather to use the authors' practical experiences in building knowledge graphs in some of the largest technology companies today as a scaffolding to highlight the challenges that any enterprise-scale knowledge graph will face and where some innovative research is needed.

Data and Databases, Development, Networks

The Morning Paper:
GAN Dissection and Datacenter RPCs

  Adrian Colyer

Visualizing and understanding generative adversarial networks;
datacenter RPCs can be general and fast.

Image generation using GANs (generative adversarial networks) has made astonishing progress over the past few years. While staring in wonder at some of the incredible images, it's natural to ask how such feats are possible. "GAN Dissection: Visualizing and Understanding Generative Adversarial Networks" gives us a look under the hood to see what kinds of things are being learned by GAN units, and how manipulating those units can affect the generated images.

February saw the 16th edition of the Usenix Symposium on Networked Systems Design and Implementation. Kalia et al. blew me away with their work on fast RPCs (remote procedure calls) in the datacenter. Through a carefully considered design, they show that RPC performance with commodity CPUs and standard lossy Ethernet can be competitive with specialized systems based on FPGAs (field-programmable gate arrays), programmable switches, and RDMA (remote direct memory access). It's a fabulous reminder to ensure we're making the most of what we already have before leaping to more expensive solutions.

Data and Databases, Development, Networks


January/February 2019

Research for Practice:
Troubling Trends in Machine Learning Scholarship

  Zachary C. Lipton, Jacob Steinhardt

Some ML papers suffer from flaws that could mislead the public and stymie future research.

Flawed scholarship threatens to mislead the public and stymie future research by compromising ML's intellectual foundations. Indeed, many of these problems have recurred cyclically throughout the history of AI and, more broadly, in scientific research. In 1976, Drew McDermott chastised the AI community for abandoning self-discipline, warning prophetically that "if we can't criticize ourselves, someone else will save us the trouble." The current strength of machine learning owes to a large body of rigorous research to date, both theoretical and empirical. By promoting clear scientific thinking and communication, our community can sustain the trust and investment it currently enjoys.

Artificial Intelligence, Research for Practice

Everything Sysadmin
Tom's Top Ten Things Executives Should Know About Software

  Thomas A. Limoncelli

Software acumen is the new norm.

Software is eating the world. To do their jobs well, executives and managers outside of technology will benefit from understanding some fundamentals of software and the software-delivery process.

Business and Management, Everything Sysadmin

Garbage Collection as a Joint Venture

  Ulan Degenbaev, Michael Lippautz, Hannes Payer

A collaborative approach to reclaiming memory in heterogeneous software systems

Cross-component tracing is a way to solve the problem of reference cycles across component boundaries. This problem appears as soon as components can form arbitrary object graphs with nontrivial ownership across API boundaries. An incremental version of CCT is implemented in V8 and Blink, enabling effective and efficient reclamation of memory in a safe manner.


The Soft Side of Software
How to Create a Great Team Culture (and Why It Matters)

  Kate Matsudaira

Build safety, share vulnerability, and establish purpose.

As leader of the team, you have significant influence over your team's culture. You can institute policies and procedures that help make your team happy and productive, monitor team successes, and continually improve the team. Another important part of team culture, however, is helping people feel they are a part of creating it. How can you expand the job of creating a culture to other team members?

Business and Management, The Soft Side of Software

Online Event Processing

  Martin Kleppmann, Alastair R. Beresford, Boerge Svingen

Achieving consistency where distributed transactions have failed

Support for distributed transactions across heterogeneous storage technologies is either nonexistent or suffers from poor operational and performance characteristics. In contrast, OLEP is increasingly used to provide good performance and strong consistency guarantees in such settings. In data systems it is very common for logs to be used as internal implementation details. The OLEP approach is different: it uses event logs, rather than transactions, as the primary application programming model for data management. Traditional databases are still used, but their writes come from a log rather than directly from the application. The use of OLEP is not simply pragmatism on the part of developers, but rather it offers a number of advantages. Consequently, OLEP is expected to be increasingly used to provide strong consistency in large-scale systems that use heterogeneous storage technologies.
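
A toy, single-process sketch of the idea follows. It is an assumption for illustration, not the authors' system: the application appends events to a log, and two different stores (a key-value "database" and a word index) are updated only by consumers that read the log in order. `EventLog`, `Consumer`, and the event layout are hypothetical names.

```python
class EventLog:
    """Append-only, totally ordered list of events."""
    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1   # offset of the new event

    def read_from(self, offset):
        return self._events[offset:]

class Consumer:
    """Applies log events to one store, remembering how far it has read."""
    def __init__(self, log, apply):
        self.log = log
        self.apply = apply
        self.offset = 0

    def catch_up(self):
        for event in self.log.read_from(self.offset):
            self.apply(event)
            self.offset += 1

database = {}   # stands in for a traditional database
index = {}      # stands in for a search index: word -> set of keys

def apply_to_database(event):
    database[event["key"]] = event["text"]

def apply_to_index(event):
    for word in event["text"].split():
        index.setdefault(word, set()).add(event["key"])

log = EventLog()
db_consumer = Consumer(log, apply_to_database)
index_consumer = Consumer(log, apply_to_index)

# The application writes only to the log, never directly to either store.
log.append({"key": "doc1", "text": "hello world"})
log.append({"key": "doc2", "text": "hello queue"})

db_consumer.catch_up()
index_consumer.catch_up()
```

Because both stores consume the same ordered log, they converge on consistent contents without a distributed transaction spanning them; a real deployment would also need a durable log and failure handling, which this sketch omits.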

Distributed Development

Kode Vicious
The Worst Idea of All Time

Revelations at 100!

So, is the author behind Kode Vicious really a big, loud jerk who throws coworkers out windows, flattens the tires of the annoying marketing guy, drinks heavily, and beats and berates his colleagues? The answer is both yes and no.

Kode Vicious

Net Neutrality: Unexpected Solution to Blockchain Scaling

  Aleksandar Kuzmanovic

Cloud-delivery networks could dramatically improve blockchains' scalability, but clouds must be provably neutral first.

There is a growing expectation, or at least a hope, that blockchains possess a disruptive potential in numerous domains because of their decentralized nature (i.e., no single entity controls their operations). Decentralization comes with a price, however: blockchains do not scale. Provably neutral clouds are undoubtedly a viable solution to blockchain scaling. By optimizing the transport layer, not only can the throughput be fundamentally scaled up, but the latency could be dramatically reduced. The key to this vision, however, lies in establishing the blockchain ecosystem's trust in the underlying networking infrastructure. This, in turn, is achieved by decoupling authority from infrastructure via a provably neutral network design.


The Morning Paper:
SageDB and NetAccel

  Adrian Colyer

Learned models within the database system; network-accelerated query processing

The CIDR (Conference on Innovative Data Systems Research) runs once every two years, and luckily for us 2019 is one of those years. I've selected two papers from this year's conference that highlight bold and exciting directions for data systems.



November/December 2018

Identity by Any Other Name

  Pat Helland

The complex cacophony of intertwined systems

As distributed systems scale in size and heterogeneity, increasingly they are connected by identifiers. Frequently, these terms refer to immutable things. At other times, they refer to stuff that changes as time goes on. Identifiers are even used to represent the nature of the computation working across distrusting systems. Identity and identifiers provide the immutable linkage. Both sides of this linkage may change, but they provide a semantic consistency needed by the business operation. No matter what you call it, identity is the glue that makes things stick and lubricates cooperative work.

Data and Databases, Distributed Computing

Research for Practice:
Edge Computing

  Nitesh Mor

Scaling resources within multiple administrative domains

Cloud computing taught practitioners how to scale resources within a single administrative domain. Edge computing requires learning how to scale across many administrative domains.

Creating edge computing infrastructures and applications encompasses quite a breadth of systems research. Let's take a look at the academic view of edge computing and a sample of existing research that will be relevant in the coming years.

Data and Databases, Distributed Computing, Research for Practice

Achieving Digital Permanence

  Raymond Blum, Betsy Beyer

The many challenges to maintaining stored information and ways to overcome them

Today's Information Age is creating new uses for and new ways to steward the data that the world depends on. The world is moving away from familiar, physical artifacts to new means of representation that are closer to information in its essence. We need processes to ensure both the integrity and accessibility of knowledge in order to guarantee that history will be known and true.

Data and Databases, Web Services

Kode Vicious
Know Your Algorithms

Stop using hardware to solve software problems.

Knowing that your CPU is in use 100 percent of the time doesn't tell you much about the overall system other than it's busy, but busy with what? Maybe it's sitting in a tight loop, or some clown added a bunch of delay loops during testing that are no longer necessary. Until you profile your system, you have no idea why the CPU is busy. All systems provide some form of profiling so that you can track down where the bottlenecks are, and it's your responsibility to apply these tools before you spend money on brand new hardware.
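
As one example of applying such a tool, the sketch below uses Python's standard `cProfile` and `pstats` modules to see where time goes. The functions `busy_wait` and `work` are hypothetical stand-ins for a leftover delay loop and the real workload.

```python
import cProfile
import io
import pstats

def busy_wait(n):
    """Stand-in for a leftover delay loop that burns CPU."""
    total = 0
    for i in range(n):
        total += i
    return total

def work():
    busy_wait(200_000)          # the hidden hot spot
    return sum(range(1000))     # the "real" work

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Collect the report, sorted by cumulative time, into a string.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)
```

The report lists each function with its call count and time, so a loop like `busy_wait` shows up by name, which a bare CPU-utilization figure cannot tell you.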

Development, Kode Vicious

Metrics That Matter

  Benjamin Treynor Sloss, Shylaja Nukala, and Vivek Rau

Critical but oft-neglected service metrics that every SRE and product owner should care about

Measure your site reliability metrics, set the right targets, and go through the work to measure the metrics accurately. Then, you'll find that your service runs better, with fewer outages, and much more user adoption.

Web Services

The Soft Side of Software
Design Patterns for Managing Up

  Kate Matsudaira

Four challenging work situations and how to handle them

Have you ever been in a situation where you are presenting to your manager or your manager's manager and you completely flub the opportunity by saying all the wrong things? Look for patterns and be the version of yourself that you want to be. When you have a plan in place, you are much more likely to succeed.

Business and Management, The Soft Side of Software

A Hitchhiker's Guide to the Blockchain Universe

  Jim Waldo

Blockchain remains a mystery, despite its growing acceptance.

It is difficult these days to avoid hearing about blockchain. Despite the significant potential of blockchain, it is also difficult to find a consistent description of what it really is. This article looks at the basics of blockchain: the individual components, how those components fit together, and what changes might be made to solve some of the problems with blockchain technology.

Networks, Security


September/October 2018

Tear Down the Method Prisons! Set Free the Practices!

  Ivar Jacobson, Roly Stimson

Essence: a new way of thinking that promises to liberate the practices and enable true learning organizations

This article explains why we need to break out of this repetitive dysfunctional behavior, and it introduces Essence, a new way of thinking that promises to free the practices from their method prisons and thus enable true learning organizations.


Research for Practice:
Security for the Modern Age

  Jessie Frazelle

Securely running processes that require the entire syscall interface

While evidence has shown that "a container with a well-crafted seccomp profile provides roughly equivalent security to a hypervisor", methods are still needed for securely running those processes that require the entire syscall interface. Solving this problem has led to some interesting research.

The container ecosystem is very fast paced. Numerous companies are building products on top of existing technologies, while enterprises are using these technologies and products to run their infrastructures. The focus of the three papers described here is on advancements to the underlying technologies themselves and strategic ways to secure software in the modern age.

Giving operators a usable means of securing the methods they use to deploy and run applications is a win for everyone. Keeping the usability-focused abstractions provided by containers, while finding new ways to automate security and defend against attacks, is a great path forward.

Development, Performance, Research for Practice, Security

Everything Sysadmin
SQL is No Excuse to Avoid DevOps

  Thomas A. Limoncelli

Automation and a little discipline allow better testing, shorter release cycles, and reduced business risk.

Using SQL databases is not an impediment to doing DevOps. Automating schema management and a little developer discipline enable more vigorous and repeatable testing, shorter release cycles, and reduced business risk.

Automating releases liberates us. It turns a worrisome, stressful, manual upgrade process into a regular event that happens without incident. It reduces business risk but, more importantly, creates a more sustainable workplace.

When you can confidently deploy new releases, you do it more frequently. New features that previously sat unreleased for weeks or months now reach users sooner. Bugs are fixed faster. Security holes are closed sooner. It enables the company to provide better value to customers.

Data and Databases, Development, Everything Sysadmin, Systems Administration

Understanding Database Reconstruction Attacks on Public Data

  Simson Garfinkel, John M. Abowd, and Christian Martindale, U.S. Census Bureau

These attacks on statistical databases are no longer a theoretical danger.

With the dramatic improvement in both computer speeds and the efficiency of solvers for SAT and other NP-hard problems in the last decade, database reconstruction attacks (DRAs) on statistical databases are no longer just a theoretical danger. The vast quantity of data products published by statistical agencies each year may give a determined attacker more than enough information to reconstruct some or all of a target database and breach the privacy of millions of people. Traditional disclosure-avoidance techniques are not designed to protect against this kind of attack.

Faced with the threat of database reconstruction, statistical agencies have two choices: they can either publish dramatically less information or use some kind of noise injection. Agencies can use differential privacy to determine the minimum amount of noise necessary to add, and the most efficient way to add that noise, in order to achieve their privacy protection goals.
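To make noise injection concrete, here is a minimal sketch of the Laplace mechanism, one standard way to release a count under epsilon-differential privacy. The abstract does not say which mechanism the authors use, so this is an illustrative choice. The function names and the default sensitivity of 1 (adding or removing one person changes a count by at most 1) are assumptions, and naive floating-point sampling like this is not suitable for production releases.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity / epsilon.

    A smaller epsilon means stronger privacy and therefore more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed only so the illustration is repeatable
print(noisy_count(120, epsilon=0.5, rng=rng))
```

Any single released value is perturbed, but the noise has zero mean, so aggregate statistics stay useful while exact reconstruction of individual records becomes harder.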

Data and Databases, Security

Kode Vicious
Writing a Test Plan

Establish your hypotheses, methodologies, and expected results.

If you can think of each of your tests as an experiment with a hypothesis, a test methodology, and a test result, it should all fall into place rather than falling through the cracks.

Development, Kode Vicious

The Soft Side of Software
The Importance of a Great Finish

  Kate Matsudaira

You have to finish strong, every time.

How can you make sure that you are recognized as a valuable member of your team, whose work is seen as critical to the team's success? Here is how to keep your momentum up and make the right moves to be a visible contributor to the final success of every project.

Business and Management, The Soft Side of Software

Case Study
CodeFlow: Improving the Code Review Process at Microsoft

A discussion with Jacek Czerwonka, Michaela Greiler, Christian Bird, Lucas Panjer, and Terry Coatta

People may associate code reviews with debugging, but that's not as central to the code-review process as you might think. The real win comes in the form of improved long-term code maintainability.

Case Studies, Workflow

Benchmarking "Hello, World!"

  Richard L. Sites

Six different views of the execution of "Hello, World!" show what is often missing in today's tools

Too often a service provider has a performance promise to keep but few tools for measuring the existence of laggard transactions, and none at all for understanding their root causes. As more and more software moves off the desktop and into data centers, and more and more cell phones use server requests as the other half of apps, observation tools for large-scale distributed transaction systems are not keeping up. Know what each tool you use is blind to, know what information you need to understand a performance problem, and then look for tools that can actually observe that information directly.

Development, Performance


July/August 2018

Using Remote Cache Service for Bazel

  Alpha Lam

Save time by sharing and reusing build and test output

Bazel is an actively developed open-source build and test system that aims to increase productivity in software development. It has a growing number of optimizations to improve the performance of daily development tasks. Remote cache service is a new development that significantly saves time in running builds and tests. It is particularly useful for a large code base and any size of development team.


Kode Vicious
A Chance Gardener

Harvesting open-source products and planting the next crop

It is a very natural progression for a company to go from being a pure consumer of open source, to interacting with the project via patch submission, and then becoming a direct contributor. No one would expect a company to be a direct contributor to all the open-source projects it consumes, as most companies consume far more software than they would ever produce, which is the bounty of the open-source garden. It ought to be the goal of every company consuming open source to contribute something back, however, so that its garden continues to bear fruit, instead of rotting vegetables.

Kode Vicious, Open Source

Why SRE Documents Matter

  Shylaja Nukala, Vivek Rau

How documentation enables SRE teams to manage new and existing services

SRE (site reliability engineering) is a job function, a mindset, and a set of engineering approaches for making web products and services run reliably. SREs operate at the intersection of software development and systems engineering to solve operational problems and engineer solutions to design, build, and run large-scale distributed systems scalably, reliably, and efficiently. A mature SRE team likely has well-defined bodies of documentation associated with many SRE functions. If you manage an SRE team or intend to start one, this article will help you understand the types of documents your team needs to write and why each type is needed, allowing you to plan for and prioritize documentation work along with other team projects.

Web Development

How to Live in a Post-Meltdown and -Spectre World

  Rich Bennett, Craig Callahan, Stacy Jones, Matt Levine, Merrill Miller, and Andy Ozment

Learn from the past to prepare for the next battle.

The scope of vulnerabilities such as Meltdown and Spectre is so vast that it can be difficult to address. At best, this is an incredibly complex situation for an organization like Goldman Sachs with dedicated threat, vulnerability management, and infrastructure teams. Navigating it is likely harder for a small or medium-sized business without dedicated triage teams. We rely heavily on vendor coordination for clarity on patch dependency and still have to move forward with less-than-perfect answers at times.

Good cyber-hygiene practices remain foundational—the nature of the vulnerability is different, but the framework and approach to managing it are not. In a world of zero days and multidimensional vulnerabilities such as Spectre and Meltdown, the speed and effectiveness of the response to triage and prioritizing risk-reduction efforts are vital to all organizations. More high-profile and complex vulnerabilities are sure to follow, so now is a good time to take lessons learned from Spectre and Meltdown and use them to help prepare for the next battle.


The Soft Side of Software
How to Get Things Done When You Don't Feel Like It

  Kate Matsudaira

Five strategies for pushing through

If you want to be successful, then it serves you better to rise to the occasion no matter what. That means learning how to push through challenges and deliver valuable results.

Business and Management, The Soft Side of Software

Tracking and Controlling Microservice Dependencies

  Silvia Esparrachiari, Tanya Reilly, and Ashleigh Rentz

Dependency management is a crucial part of system and software design.

Dependency cycles will be familiar to you if you have ever locked your keys inside your house or car. You can't open the lock without the key, but you can't get the key without opening the lock. Some cycles are obvious, but more complex dependency cycles can be challenging to find before they lead to outages. Strategies for tracking and controlling dependencies are necessary for maintaining reliable systems.

Dependencies can be tracked by observing the behavior of a system, but preventing dependency problems before they reach production requires a more active strategy. Implementing dependency control ensures that each new dependency can be added to a DAG (directed acyclic graph) before it enters use. This gives system designers the freedom to add new dependencies where they are valuable, while eliminating much of the risk that comes from the uncontrolled growth of dependencies.
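One way to picture dependency control is a registry that refuses any new edge that would close a cycle, so the graph remains a DAG. The sketch below is an illustration under that assumption, not the authors' implementation; the `DependencyGraph` class and the service names are hypothetical.

```python
class DependencyGraph:
    """Registry of service dependencies that must remain acyclic."""

    def __init__(self):
        self._edges = {}  # service -> set of services it depends on

    def _reaches(self, start, target):
        """Return True if target is reachable from start along existing edges."""
        stack = [start]
        seen = set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self._edges.get(node, ()))
        return False

    def add_dependency(self, service, depends_on):
        """Record service -> depends_on, rejecting edges that would form a cycle."""
        # If depends_on already (transitively) depends on service,
        # the new edge would close a loop.
        if self._reaches(depends_on, service):
            raise ValueError(
                f"{service} -> {depends_on} would create a dependency cycle"
            )
        self._edges.setdefault(service, set()).add(depends_on)
        self._edges.setdefault(depends_on, set())

    def dependencies(self, service):
        """Return the direct dependencies recorded for a service."""
        return set(self._edges.get(service, ()))


graph = DependencyGraph()
graph.add_dependency("frontend", "auth")
graph.add_dependency("auth", "storage")
```

With these two edges registered, a later call such as `graph.add_dependency("storage", "frontend")` raises `ValueError`, so the cycle is caught at registration time instead of surfacing as an outage.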

Development, Web Services


Older Issues