Volume 19, Issue 6

Drill Bits
Steampunk Machine Learning

Victorian contrivances for modern data science

  Terence Kelly

Fitting models to data is all the rage nowadays but has long been an essential skill of engineers. Veterans know that real-world systems foil textbook techniques by interleaving routine operating conditions with bouts of overload and failure; to be practical, a method must model the former without distortion by the latter. Surprisingly effective aid comes from an unlikely quarter: a simple and intuitive model-fitting approach that predates the Babbage Engine. The foundation of industrial-strength decision support and anomaly detection for production datacenters, this approach yields accurate yet intelligible models without hand-holding or fuss. It is easy to practice with modern analytics software and is widely applicable to computing systems and beyond.

Drill Bits, Code, Data, Development, Visualization, AI and Machine Learning, Performance

Interpretable Machine Learning

  Valerie Chen, Jeffrey Li, Joon Sik Kim, Gregory Plumb, Ameet Talwalkar

Moving from mythos to diagnostics

The emergence of machine learning as a society-changing technology in the past decade has triggered concerns about people's inability to understand the reasoning of increasingly complex models. The field of IML (interpretable machine learning) grew out of these concerns, with the goal of empowering various stakeholders to tackle use cases, such as building trust in models, performing model debugging, and generally informing real human decision-making.



Volume 19, Issue 5

Kode Vicious:
I Unplugged What?

The lessons here are broader than just a simple "Don't do that."

Dear KV, I'm sure by now you've read about the latest large systems failure, and I wondered if you'd share your thoughts on how such a large company can fail so miserably at infrastructure. I'm probably lobbing a softball, but how is it possible that these large and pervasive failures happen?

Business, Failure and recovery, Kode Vicious, Sysadmin

A Conversation with Margo Seltzer and Mike Olson

The history of Berkeley DB

Kirk McKusick sat down with Margo Seltzer and Mike Olson to discuss the history of Berkeley DB, for which they won the ACM Software System Award in 2021. Kirk McKusick has spent his career as a BSD and FreeBSD developer. Margo Seltzer has spent her career as a professor of computer science and as an entrepreneur of database software companies. Mike Olson started his career as a software developer and later started and managed several open-source software companies. Berkeley DB is a production-quality, scalable, NoSQL, Open Source platform for embedded transactional data management.

Databases, Interviews

Case Study
It Takes a Community:
The Open-source Challenge

A discussion with Reynold Xin, Wes McKinney, Alan Gates, and Chris McCubbin

Of the many challenges faced by open-source developers, among the most daunting are some that other programmers scarcely ever think about. Building a successful open-source community depends on many different elements, some of which are familiar to any developer. Just as important are the skills to recruit, to inspire, to mentor, to manage, and to mediate disputes. But what exactly does it take to pull all that off?

Case studies, Open Source

Federated Learning and Privacy

  Kallista Bonawitz, Peter Kairouz, Brendan McMahan, and Daniel Ramage

Building privacy-preserving systems for machine learning and data science on decentralized data

Centralized data collection can expose individuals to privacy risks and organizations to legal risks if data is not properly managed. Federated learning is a machine learning setting where multiple entities collaborate in solving a machine learning problem, under the coordination of a central server or service provider. Each client's raw data is stored locally and not exchanged or transferred; instead, focused updates intended for immediate aggregation are used to achieve the learning objective. This article provides a brief introduction to key concepts in federated learning and analytics with an emphasis on how privacy technologies may be combined in real-world systems and how their use charts a path toward societal benefit from aggregate statistics in new domains and with minimized risk to individuals and to the organizations who are custodians of the data.

AI, Privacy and Rights

Commit to Memory:
Chip Measuring Contest

  Jessie Frazelle

The benefits of purpose-built chips

Alan Kay once said, "People who are really serious about software should make their own hardware." We are now seeing product companies genuinely live up to this value. It is exciting when the incumbents known as the chip vendors are being outdone, in the very technology that is their bread and butter, by their previous customers. Let's dive into some of the interesting bits of these purpose-built chips: the benefits of economics, user experience, and performance for the companies building them.

Commit to Memory, Hardware,

Meaning and Context in Computer Programs

  Alvaro Videla

Sharing domain knowledge among programmers using the source code as the medium

When you look at a function program's source code, how do you know what it means? Is the meaning found in the return values of the function, or is it located inside the function body? What about the function name? Answering these questions is important to understanding how to share domain knowledge among programmers using the source code as the medium. The program is the medium of communication among programmers to share their solutions.

Code, Development

Lamboozling Attackers: A New Generation of Deception

  Kelly Shortridge and Ryan Petrich

Software engineering teams can exploit attackers' human nature by building deception environments.

The goal of this article is to educate software leaders, engineers, and architects on the potential of deception for systems resilience and the practical considerations for building deception environments. By examining the inadequacy and stagnancy of historical deception efforts by the information security community, the article also demonstrates why engineering teams are now poised to become significantly more successful owners of deception systems.

Privacy and Rights, Security


Volume 19, Issue 4

Kode Vicious:
Patent Absurdity

A case when ignorance is the best policy

The main reason a lawyer will give for not reading a software patent is that, if you run afoul of the patent and it can be shown that you had knowledge of it, your company will incur triple the damages that they would have, had you not had knowledge of the patent. That seems like reason enough to avoid reading them, but there is an even better reason, and that is, as design or technical documents, software patents suck.

Business, Kode Vicious

The Bikeshed:
The Software Industry IS STILL the Problem

  Poul-Henning Kamp

The time is (also) way overdue for IT professional liability

The time is way overdue for IT engineers to be subject to professional liability, like almost every other engineering profession. Before you tell me that is impossible, please study how the very same thing happened with electricity, planes, cranes, trains, ships, automobiles, lifts, food processing, buildings, and, for that matter, driving a car.

Business, Compliance, The Bikeshed

Drill Bits
Crashproofing the Original NoSQL Key-Value Store

  Terence Kelly

Fortifying software to protect persistent data from crashes can be remarkably easy if a modern file system handles the heavy lifting. This episode of Drill Bits unveils a new crash-tolerance mechanism that vaults the venerable gdbm database into the league of transactional NoSQL data stores. We'll motivate this upgrade by tracing gdbm's history. We'll survey the subtle science of crashproofing, navigating a minefield of traps for the unwary. We'll arrive at a compact and rugged design that leverages modern file-system features, and we'll tour the production-ready implementation of this design and its ergonomic interface. This new approach is quite generic: It can enable a wide range of software to tolerate crashes.

Code, Databases, Development, Drill Bits, Open Source, Software Design

Special issue on Static Analysis

Static Analysis: An Introduction

  Patrick Thomson

The fundamental challenge of software engineering is one of complexity.

Modern static-analysis tools provide powerful and specific insights into codebases. The Linux kernel team, for example, developed Coccinelle, a powerful tool for searching, analyzing, and rewriting C source code; because the Linux kernel contains more than 27 million lines of code, a static-analysis tool is essential both for finding bugs and for making automated changes across its many libraries and modules. Another tool targeted at the C family of languages is Clang scan-build, which comes with many useful analyses and provides an API for programmers to write their own analyses. Like so many things in computer science, the utility of static analysis is self-referential: To write reliable programs, we must also write programs for our programs. But this is no paradox. Static-analysis tools, complex though their theory and practice may be, are what will enable us, and engineers of the future, to overcome this challenge and yield the knowledge and insights that we practitioners deserve.

Code, Development, Tools

Static Analysis at GitHub

  Timothy Clem and Patrick Thomson

An experience report

The Semantic Code team at GitHub builds and operates a suite of technologies that power symbolic code navigation on github.com. We learned that scale is about adoption, user behavior, incremental improvement, and utility. Static analysis in particular is difficult to scale with respect to human behavior; we often think of complex analysis tools working to find potentially problematic patterns in code and then trying to convince the humans to fix them. Our approach took a different tack: use basic analysis techniques to quickly put information that augments our ability to understand programs in front of everyone reading code on GitHub with zero configuration required and almost immediate availability after code changes.

Code, Development, Tools

Human-Centered Approach to Static-Analysis-Driven Developer Tools

  Ayman Nadeem

The future depends on good HCI.

Complex and opaque systems do not scale easily. A human-centered approach for evolving tools and practices is essential to ensuring that software is scaled safely and securely. Static analysis can unveil information about program behavior, but the goal of deriving this information should not be to accumulate hairsplitting detail. HCI can help direct static-analysis techniques into developer-facing systems that structure information and embody relationships in representations that closely mirror a programmer's thought. The survival of great software depends on programming languages that support, rather than inhibit, communicating, reasoning, and abstract thinking.

Code, Development, Tools

Designing UIs for Static Analysis Tools

  Daniil Tiganov, Lisa Nguyen Quang Do, Karim Ali

Evaluating tool design guidelines with SWAN

Static-analysis tools suffer from usability issues such as a high rate of false positives, lack of responsiveness, and unclear warning descriptions and classifications. Here, we explore the effect of applying user-centered approach and design guidelines to SWAN, a security-focused static-analysis tool for the Swift programming language. SWAN is an interesting case study for exploring static-analysis tool usability because of its large target audience, its potential to integrate easily into developers' workflows, and its independence from existing analysis platforms.

Code, Development, Tools


Volume 19, Issue 3

Escaping the Singularity:
Don't Get Stuck in the "Con" Game

  Pat Helland

Consistency, convergence, and confluence are not the same! Eventual consistency and eventual convergence aren't the same as confluence, either.

"Eventual consistency" is a popular phrase with a fuzzy definition. People are even inconsistent in their use of consistency. But two other terms, "convergence" and "confluence", that have crisper definitions and are more easily understood.

Data, Databases, Escaping the Singularity,

Declarative Machine Learning Systems

  Piero Molino, Christopher Ré

The future of machine learning will depend on it being in the hands of the rest of us.

The people training and using ML models now are typically experienced developers with years of study working within large organizations, but the next wave of ML systems should allow a substantially larger number of people, potentially without any coding skills, to perform the same tasks. These new ML systems will not require users to fully understand all the details of how models are trained and used for obtaining predictions, but will provide them a more abstract interface that is less demanding and more familiar. Declarative interfaces are well-suited for this goal, by hiding complexity and favoring separation of interest, and ultimately leading to increased productivity.


Real-world String Comparison

  Torsten Ullrich

How to handle Unicode sequences correctly

In many languages a string comparison is a pitfall for beginners. With any Unicode string as input, a comparison often causes problems even for advanced users. The semantic equivalence of different characters in Unicode requires a normalization of the strings before comparing them. This article shows how to handle Unicode sequences correctly. The comparison of two strings for equality often raises questions concerning the difference between comparison by value, comparison of object references, strict equality, and loose equality. The most important aspect is semantic equivalence.

Code, Data

Kode Vicious:
Divide and Conquer

The use and limits of bisection

Bisection is of no use if you have a heisenbug that fails only from time to time. These subtle bugs are the hardest to fix and the ones that cause us to think critically about what we are doing. Timing bugs, bugs in distributed systems, and all the difficult problems we face in building increasingly complex software systems can't yet be addressed by simple bisection. It's often the case that it would take longer to write a usable bisection test for a complex problem than it would to analyze the problem whilst at the tip of the tree.

Debugging, Kode Vicious

When Curation Becomes Creation

  Liu Leqi, Dylan Hadfield-Menell, and Zachary C. Lipton

Algorithms, microcontent, and the vanishing distinction between platforms and creators

Media platforms today benefit from: (1) discretion to organize content, (2) algorithms for curating user-posted content, and (3) absolution from liability. This favorable regulatory environment results from the current legal framework, which distinguishes between intermediaries and content providers. This distinction is ill-adapted to the modern social media landscape, where platforms deploy powerful data-driven algorithms to play an increasingly active role in shaping what people see, and where users supply disconnected bits of raw content as fodder. Today's platforms have license to monetize whatever content they like, moderate if and when it aligns with their corporate objectives, and curate their content however they wish.

HCI, Opinion, Privacy and Rights

Digging into Big Provenance
(with SPADE)

  Ashish Gehani, Raza Ahmad, Hassaan Irshad, Jianqiao Zhu, and Jignesh Patel

A user interface for querying provenance

Several interfaces exist for querying provenance. Many are not flexible in allowing users to select a database type of their choice. Some provide query functionality in a data model that is different from the graph-oriented one that is natural for provenance. Others have intuitive constructs for finding results but have limited support for efficiently chaining responses, as needed for faceted search. This article presents a user interface for querying provenance that addresses these concerns and is agnostic to the underlying database being used.


The Bikeshed:
What Went Wrong?

  Poul-Henning Kamp

Why we need an IT accident investigation board

Governments should create IT accident investigation boards for the exact same reasons they have done so for ships, railroads, planes, and in many cases, automobiles. Denmark got its Railroad Accident Investigation Board because too many people were maimed and killed by steam trains. The UK's Air Accidents Investigation Branch was created for pretty much the same reasons, but, specifically, because when the airlines investigated themselves, nobody was any the wiser. Does that sound slightly familiar in any way?

Compliance, The Bikeshed


Volume 19, Issue 2

Commit to Memory:
A New Era for Mechanical CAD

  Jessie Frazelle

Time to move forward from decades-old design

The hardware industry is desperate for a modern way to do mechanical design. A new CAD program created for the modern world would lower the barrier to building hardware, decrease the time of development, and usher in a new era of building. The tools used to build with today are supported on the shoulders of giants, but a lot could be done to make them even better. At some point, mechanical CAD lost some of its roots of innovation. Let's dive into a few of the problems with the CAD programs that exist today and see how to make them better.

Commit to Memory, Hardware,

Escaping the Singularity:
ACID: My Personal "C" Change

  Pat Helland

How could I miss such a simple thing?

I had a chance recently to chat with my old friend, Andreas Reuter, the inventor of ACID. He and his Ph.D. advisor, Theo Härder, coined the term in their famous 1983 paper, Principles of Transaction-Oriented Database Recovery. I had blinders on after almost four decades of seeing C based on my assumptions. One big lesson for me is to work hard to ALWAYS question your assumptions. Try hard to surround yourself with curious and passionate people, both young and old, who will challenge you and try to dislodge your blinders. Foster a culture that makes them safe as they do so.

Databases, Escaping the Singularity,

Kode Vicious:
In Praise of the Disassembler

There's much to be learned from the lower-level details of hardware.

When you're starting out you want to be able to hold the entire program in your head if at all possible. Once you're conversant with your first, simple assembly language and the machine architecture you're working with, it will be completely possible to look at a page or two of your assembly and know not only what it is supposed to do but also what the machine will do for you step by step. When you look at a high-level language, you should be able to understand what you mean it to do, but often you have no idea just how your intent will be translated into action. Assembly and machine code is where the action is.

Development, Kode Vicious

Drill Bits
Schrödinger's Code: Undefined Behavior in Theory and Practice

  Terence Kelly with special guest borers Weiwei Gu and Vladimir Maksimovski

Undefined behavior ranks among the most baffling and perilous aspects of popular programming languages. This installment of Drill Bits clears up widespread misconceptions and presents practical techniques to banish undefined behavior from your own code and pinpoint meaningless operations in any software—techniques that reveal alarming faults in software supporting business-critical applications at Fortune 500 companies.

Code, Databases, Development, Drill Bits, Open Source, Software Design

Case Study: Quantum-safe Trust for Vehicles:
The Race is Already On

A discussion with Michael Gardiner, Alexander Truskovsky, George Neville-Neil, and Atefeh Mashatan

In the automotive industry, cars now coming off assembly lines are sometimes referred to as "rolling data centers" in acknowledgment of all the entertainment and communications capabilities they contain. The fact that autonomous driving systems are also well along in development does nothing to allay concerns about security. Indeed, it would seem the stakes of automobile cybersecurity are about to become immeasurably higher just as some of the underpinnings of contemporary cybersecurity are rendered moot.

Case studies, Privacy and Rights, Security

The Complex Path to Quantum Resistance

  Dr. Atefeh Mashatan and Douglas Heintzman

Is your organization prepared?

Competing quantum-resistant proposals are currently going through academic due diligence and scrutiny by industry leaders. Until the newly minted quantum-resistant standards are finalized, ICT leaders should do their best to plan for a smooth transition. This article provides a series of recommendations for these decision-makers, including what they need to know and do today. It will help them in devising an effective quantum transition plan with a holistic lens that considers the affected assets in people, process, and technology. To do so, the decision-makers first need to comprehend the nature of quantum computing in order to grasp the impact of the impending quantum threat and appreciate its magnitude.

Privacy and Rights, Security

Biases in AI Systems

  Ramya Srinivasan and Ajay Chander

A survey for practitioners

This article provides an organization of various kinds of biases that can occur in the AI pipeline starting from dataset creation and problem formulation to data analysis and evaluation. It highlights the challenges associated with the design of bias-mitigation strategies, and it outlines some best practices suggested by researchers. Finally, a set of guidelines is presented that could aid ML developers in identifying potential sources of bias, as well as avoiding the introduction of unwanted biases. The work is meant to serve as an educational resource for ML developers in handling and addressing issues related to bias in AI systems.

AI, Privacy and Rights



Older Issues