

September/October 2018


Case Study
CodeFlow: Improving the Code Review Process at Microsoft


A discussion with Jacek Czerwonka, Michaela Greiler, Christian Bird, Lucas Panjer, and Terry Coatta

People may associate code reviews with debugging, but that's not as central to the code-review process as you might think. The real win comes in the form of improved long-term code maintainability.

Case Studies, Workflow


Benchmarking "Hello, World!"

  Richard L. Sites

Six different views of the execution of "Hello, World!" show what is often missing in today's tools

Too often a service provider has a performance promise to keep but few tools for measuring the existence of laggard transactions, and none at all for understanding their root causes. As more and more software moves off the desktop and into data centers, and more and more cell phones use server requests as the other half of apps, observation tools for large-scale distributed transaction systems are not keeping up. Know what each tool you use is blind to, know what information you need to understand a performance problem, and then look for tools that can actually observe that information directly.

Development, Performance


 


July/August 2018


Using Remote Cache Service for Bazel

  Alpha Lam

Save time by sharing and reusing build and test output

Bazel is an actively developed open-source build and test system that aims to increase productivity in software development. It has a growing number of optimizations to improve the performance of daily development tasks. The remote cache service is a new development that significantly reduces the time spent running builds and tests. It is particularly useful for large code bases and for development teams of any size.
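As a minimal illustration of what opting in to a shared cache looks like in practice (the cache endpoint below is hypothetical; the flags are Bazel's standard remote-caching options):

    # .bazelrc: point builds and tests at a shared remote cache
    build --remote_cache=https://build-cache.example.com
    # read-only clients (e.g., developer workstations) can skip uploading
    # their local results and let CI populate the cache
    build --remote_upload_local_results=false

With a cache configured, any build or test action whose inputs have not changed is served from the cache instead of being re-executed locally.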

Development


Kode Vicious
A Chance Gardener


Harvesting open-source products and planting the next crop

It is a very natural progression for a company to go from being a pure consumer of open source, to interacting with projects via patch submissions, to becoming a direct contributor. No one would expect a company to contribute directly to all the open-source projects it consumes, as most companies consume far more software than they could ever produce; that is the bounty of the open-source garden. It ought to be the goal of every company consuming open source to contribute something back, however, so that its garden continues to bear fruit instead of rotting vegetables.

Kode Vicious, Open Source


Why SRE Documents Matter

  Shylaja Nukala, Vivek Rau

How documentation enables SRE teams to manage new and existing services

SRE (site reliability engineering) is a job function, a mindset, and a set of engineering approaches for making web products and services run reliably. SREs operate at the intersection of software development and systems engineering to solve operational problems and engineer solutions to design, build, and run large-scale distributed systems scalably, reliably, and efficiently. A mature SRE team likely has well-defined bodies of documentation associated with many SRE functions. If you manage an SRE team or intend to start one, this article will help you understand the types of documents your team needs to write and why each type is needed, allowing you to plan for and prioritize documentation work along with other team projects.

Web Development


How to Live in a Post-Meltdown and -Spectre World

  Rich Bennett, Craig Callahan, Stacy Jones, Matt Levine, Merrill Miller, and Andy Ozment

Learn from the past to prepare for the next battle.

The scope of vulnerabilities such as Meltdown and Spectre is so vast that it can be difficult to address. Even for an organization like Goldman Sachs, with dedicated threat, vulnerability-management, and infrastructure teams, this is an incredibly complex situation; navigating it is likely even harder for a small or medium-sized business without dedicated triage teams. We rely heavily on vendor coordination for clarity on patch dependencies and still have to move forward with less-than-perfect answers at times.

Good cyber-hygiene practices remain foundational—the nature of the vulnerability is different, but the framework and approach to managing it are not. In a world of zero days and multidimensional vulnerabilities such as Spectre and Meltdown, the speed and effectiveness of the response to triage and prioritizing risk-reduction efforts are vital to all organizations. More high-profile and complex vulnerabilities are sure to follow, so now is a good time to take lessons learned from Spectre and Meltdown and use them to help prepare for the next battle.

Security


The Soft Side of Software
How to Get Things Done When You Don't Feel Like It


  Kate Matsudaira

Five strategies for pushing through

If you want to be successful, then it serves you better to rise to the occasion no matter what. That means learning how to push through challenges and deliver valuable results.

Business and Management, The Soft Side of Software


Tracking and Controlling Microservice Dependencies

  Silvia Esparrachiari, Tanya Reilly, and Ashleigh Rentz

Dependency management is a crucial part of system and software design.

Dependency cycles will be familiar to you if you have ever locked your keys inside your house or car. You can't open the lock without the key, but you can't get the key without opening the lock. Some cycles are obvious, but more complex dependency cycles can be challenging to find before they lead to outages. Strategies for tracking and controlling dependencies are necessary for maintaining reliable systems.

Dependencies can be tracked by observing the behavior of a system, but preventing dependency problems before they reach production requires a more active strategy. Implementing dependency control ensures that each new dependency can be added to a DAG (directed acyclic graph) before it enters use. This gives system designers the freedom to add new dependencies where they are valuable, while eliminating much of the risk that comes from the uncontrolled growth of dependencies.
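To make the idea concrete, here is a minimal sketch (in Python, with hypothetical service names; the article does not prescribe any particular implementation) of a dependency-control check that rejects a new edge if it would close a cycle in the dependency graph:

    # Minimal sketch: keep the service dependency graph a DAG by refusing
    # any new edge that would close a cycle. Names are illustrative only.
    from collections import defaultdict

    graph = defaultdict(set)  # service -> services it depends on

    def would_create_cycle(src, dst):
        """True if adding src -> dst would make src reachable from dst."""
        stack, seen = [dst], set()
        while stack:
            node = stack.pop()
            if node == src:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(graph[node])
        return False

    def add_dependency(src, dst):
        if would_create_cycle(src, dst):
            raise ValueError(f"{src} -> {dst} would create a dependency cycle")
        graph[src].add(dst)

    add_dependency("frontend", "auth")
    add_dependency("auth", "quota")
    # add_dependency("quota", "frontend")  # rejected: frontend -> auth -> quota -> frontend

A real dependency-control system would apply the same kind of check when a dependency is declared, before the new edge ever reaches production.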

Development, Web Services



 


May/June 2018


Kode Vicious
The Obscene Coupling Known as Spaghetti Code


Teach your junior programmers how to read code

Communication is just a fancy word for storytelling, something that humans have probably been doing since before we acquired language. Unless you are an accomplished surrealist, you tell a story by starting at the beginning, then over the course of time exposing the reader to more of the details, finally arriving at the end where, hopefully, the reader experiences a satisfying bit of closure. The goal of the writer (or coder) is to form in the mind of the reader the same image the writer had. That is the process of communication, and it doesn't matter whether it's prose, program, or poetry—at the end of the day, if the recipient of our message has no clue what we meant, then all was for naught.

Development, Kode Vicious


Corp to Cloud: Google's Virtual Desktops

  Matt Fata, Philippe-Joseph Arida, Patrick Hahn, and Betsy Beyer

How Google moved its virtual desktops to the cloud

Over one-fourth of Googlers use internal, data-center-hosted virtual desktops. This on-premises offering sits in the corporate network and allows users to develop code, access internal resources, and use GUI tools remotely from anywhere in the world. Among its most notable features, a virtual desktop instance can be sized according to the task at hand, has persistent user storage, and can be moved between corporate data centers to follow traveling Googlers.

Until recently, our virtual desktops were hosted on commercially available hardware on Google's corporate network using a homegrown open-source virtual cluster-management system called Ganeti. Today, this substantial and Google-critical workload runs on GCP (Google Cloud Platform). This article discusses the reasons for the move to GCP and how the migration was accomplished.

Distributed Computing


Mind Your State for Your State of Mind

  Pat Helland

The interactions between storage and applications can be complex and subtle.

Applications have had an interesting evolution as they have moved into the distributed and scalable world. Similarly, storage and its cousin databases have changed side by side with applications. Many times, the semantics, performance, and failure models of storage and applications do a subtle dance as they change in support of changing business requirements and environmental challenges. Adding scale to the mix has really stirred things up. This article looks at some of these issues and their impact on systems.

Storage


Research for Practice
Knowledge Base Construction in the Machine-learning Era


  Alex Ratner and Chris Ré

Three critical design points: joint learning, weak supervision, and new representations

This installment of Research for Practice features a curated selection from Alex Ratner and Chris Ré, who provide an overview of recent developments in Knowledge Base Construction (KBC). While knowledge bases have a long history dating to the expert systems of the 1970s, recent advances in machine learning have led to a knowledge base renaissance, with knowledge bases now powering major product functionality including Google Assistant, Amazon Alexa, Apple Siri, and Wolfram Alpha. Ratner and Ré's selections highlight key considerations in the modern KBC process, from interfaces that extract knowledge from domain experts to algorithms and representations that transfer knowledge across tasks.

AI, Research for Practice


The Soft Side of Software
The Secret Formula for Choosing the Right Next Role


  Kate Matsudaira

The best careers are not defined by titles or resume bullet points.

When you are searching for the next step in your career, don't just think about the surface-level benefits. Drill down on your biggest goals and do a little thinking about whether or not each job will help you get closer to those goals. The smarter you are about what you choose next, the closer you will get to the things you truly want from your life and your work.

Business and Management, The Soft Side of Software


The Mythos of Model Interpretability

  Zachary C. Lipton

In machine learning, the concept of interpretability is both important and slippery.

Supervised machine-learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? Models should be not only good, but also interpretable, yet the task of interpretation appears underspecified. The academic literature has provided diverse and sometimes non-overlapping motivations for interpretability and has offered myriad techniques for rendering models interpretable. Despite this ambiguity, many authors proclaim their models to be interpretable axiomatically, absent further argument. Problematically, it is not clear what common properties unite these techniques.

This article seeks to refine the discourse on interpretability. First it examines the objectives of previous papers addressing interpretability, finding them to be diverse and occasionally discordant. Then, it explores model properties and techniques thought to confer interpretability, identifying transparency to humans and post hoc explanations as competing concepts. Throughout, the feasibility and desirability of different notions of interpretability are discussed. The article questions the oft-made assertions that linear models are interpretable and that deep neural networks are not.

AI


Everything Sysadmin
GitOps: A Path to More Self-service IT


  Thomas A. Limoncelli

IaC + PR = GitOps

GitOps lowers the cost of creating self-service IT systems, enabling self-service operations where previously they could not be justified. It improves the ability to operate the system safely, permitting regular users to make big changes. Safety improves as more tests are added. Security audits become easier as every change is tracked.

Everything Sysadmin, Systems Administration


 


March/April 2018


Research for Practice
FPGAs in Data Centers


  Gustavo Alonso

Expert-curated Guides to the Best of CS Research

As Moore's Law has slowed and the computational overheads of datacenter workloads have continued to rise, FPGAs offer an increasingly attractive point in the trade-off between power and performance. Gustavo's selections highlight early successes and practical deployment considerations that inform the ongoing, high-stakes debate about the future of datacenter- and cloud-based computation substrates.

Performance, Research for Practice


Workload Frequency Scaling Law — Derivation and Verification

  Noor Mubeen

Workload scalability has a cascade relation via the scale factor.

Many processors expose performance-monitoring counters that help measure the "productive performance" associated with workloads. Productive performance is typically represented by the scale factor, a term that refers to the extent of stalls compared with stall-free cycles within a time window. The scale factor of a workload is also influenced by the clock frequency selected by frequency-selection governors. Hence, in a DVFS (dynamic voltage/frequency scaling) system, the utilization, power, and performance outputs are also functions of the scale factor and its variations. Some governance algorithms treat the scale factor in ways that are native to their governance philosophy.

This article presents equations that relate to workload utilization scaling at the level of a single DVFS subsystem. A relation between frequency, utilization, and scale factor is established. Verifying these equations turns out to be tricky, since the utilization inherent to a workload also varies, seemingly in an unspecified manner, at the granularity of governance samples. Thus, a novel approach called histogram ridge trace is applied. Quantifying the scaling impact is critical when treating DVFS as a building block. Typical applications include DVFS governors and other layers that influence the utilization, power, and performance of the system. The scope here, though, is limited to demonstrating well-quantified and verified scaling equations.
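To make the relation concrete, here is a simple first-order model offered purely as an illustration (an assumption for exposition, not the equation derived in the article): treat only the stall-free fraction of busy cycles as frequency-sensitive. If s is the stalled fraction of a workload's busy time at frequency f1 and utilization u1, then running the same work at frequency f2 gives roughly

    u_2 \approx u_1 \left[ (1 - s)\,\frac{f_1}{f_2} + s \right]

A fully stall-free workload (s = 0) sees its utilization scale inversely with frequency, while a fully stalled one (s = 1) does not scale at all; the article's derivation and its histogram-ridge-trace verification address the realistic cases in between.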

Performance


Escaping the Singularity
Consistently Eventual


  Pat Helland

For many data items, the work never settles on a value.

Applications are no longer islands. Not only do they frequently run distributed and replicated over many cloud-based computers, but they also run over many hand-held computers. This makes it challenging to talk about a single truth at a single place or time. In addition, most modern applications interact with other applications. These interactions settle out to impact understanding. Over time, a shared opinion emerges just as new interactions add increasing uncertainty. Many business, personal, and computational "facts" are, in fact, uncertain. As some changes settle, others meander from place to place.

Data and Databases, Escaping the Singularity


Algorithms Behind Modern Storage Systems

  Alex Petrov

Different uses for read-optimized B-trees and write-optimized LSM-trees

The amount of data processed by applications is constantly growing, and with this growth, scaling storage becomes more challenging. Every database system has its own tradeoffs, and understanding them is crucial to choosing the right system from the many available options.

Every application is different in terms of read/write workload balance, consistency requirements, latencies, and access patterns. Familiarizing yourself with database and storage internals facilitates architectural decisions, helps explain why a system behaves a certain way, helps troubleshoot problems when they arise, and fine-tunes the database for your workload.

It's impossible to optimize a system in all directions. In an ideal world there would be data structures guaranteeing the best read and write performance with no storage overhead but, of course, in practice that's not possible.

This article takes a closer look at two storage system design approaches used in a majority of modern databases and describes their use cases and tradeoffs.

Storage


Kode Vicious
Every Silver Lining Has a Cloud


Cache is king. And if your cache is cut, you're going to feel it.

Clearly, your management has never heard the phrase, "You get what you pay for." Or perhaps they heard it and didn't realize it applied to them. The savings in cloud computing comes at the expense of a loss of control over your systems, which is summed up best in the popular nerd sticker that says, "The Cloud is Just Other People's Computers."

Some providers now have something called Metal-as-a-Service, which I really think ought to mean that an '80s metal band shows up at your office, plays a gig, smashes the furniture, and urinates on the carpet, but alas, it's just the cloud providers' way of finally admitting that cloud computing isn't really the right answer for all applications. For systems that require deterministic performance guarantees to work well, you really have to think very hard about whether or not a cloud-based system is the right answer, because providing deterministic guarantees requires quite a bit of control over the variables in the environment. Cloud systems are not about giving you control; they're about the owner of the systems having the control.

Distributed Computing, Kode Vicious


C Is Not a Low-level Language

  David Chisnall

Your computer is not a fast PDP-11.

In the wake of the recent Meltdown and Spectre vulnerabilities, it's worth spending some time looking at root causes. Both of these vulnerabilities involved processors speculatively executing instructions past some kind of access check and allowing the attacker to observe the results via a side channel. The features that led to these vulnerabilities, along with several others, were added to let C programmers continue to believe they were programming in a low-level language, when this hasn't been the case for decades.

There is a common myth in software development that parallel programming is hard. This would come as a surprise to Alan Kay, who was able to teach an actor-model language to young children, with which they wrote working programs with more than 200 threads. It comes as a surprise to Erlang programmers, who commonly write programs with thousands of parallel components. It's more accurate to say that parallel programming in a language with a C-like abstract machine is difficult, and given the prevalence of parallel hardware, from multicore CPUs to many-core GPUs, that's just another way of saying that C doesn't map to modern hardware very well.

Languages


 


January/February 2018

Research for Practice
Prediction-Serving Systems


  Dan Crankshaw and Joseph Gonzalez

Expert-curated Guides to the Best of CS Research

This installment of Research for Practice features a curated selection from Dan Crankshaw and Joey Gonzalez, who provide an overview of machine-learning serving systems. What happens when we wish to actually deploy a machine-learning model to production, and how do we serve predictions with high accuracy and high computational efficiency? Dan and Joey's picks provide a thoughtful selection of cutting-edge techniques spanning database-level integration, video processing, and prediction middleware. Given the explosion of interest in machine learning and its increasing impact on seemingly every application vertical, it's possible that systems such as these will become as commonplace as relational databases are today.

AI, Research for Practice


Kode Vicious
Watchdogs vs. Snowflakes


Taking wild-ass guesses with your distributed job-control system

That a system can randomly jam doesn't just indicate a serious bug in the system; it is also a major source of risk. You don't say what your distributed job-control system controls, but let's just say I hope it's not something with significant, real-world side effects, like a power station, jet aircraft, or financial trading system. The risk, of course, is that the system will jam, not when it's convenient for someone to add a dummy job to clear the jam, but during some operation that could cause data loss or return incorrect results. I rather suspect that having a system like this jam while coordinating, for example, the balancing of electrical power across a power grid would have spectacular and perhaps fatal results.

Distributed Computing, Kode Vicious


Thou Shalt Not Depend on Me

Tobias Lauinger, Abdelberi Chaabane, and Christo B. Wilson

A look at JavaScript libraries in the wild

Many websites use third-party components such as JavaScript libraries, which bundle useful functionality so that developers can avoid reinventing the wheel. But what happens when libraries have security issues? Chances are that websites using such libraries inherit these issues and become vulnerable to attacks.

Given the risk of using a library with known vulnerabilities, it is important to know how often this happens in practice and, more importantly, who is to blame for the inclusion of vulnerable libraries.

We set out to answer these questions and found that with 37 percent of websites using at least one known vulnerable library, and libraries often being included in quite unexpected ways, there clearly is room for improvement in library handling on the web. To that end, this article makes a few recommendations about what can be done to improve the situation.

Programming Languages


The Soft Side of Software
How to Come Up with Great Ideas


  Kate Matsudaira

Think like an entrepreneur.

I started my career working in big companies but always dreamed of starting my own. I would read online forums and articles about successful entrepreneurs. I was enamored with the idea of doing a startup. The problem was I didn't have any ideas. Fast forward 10 years and I have so many ideas that choosing the right one is the challenge. I am constantly coming up with ideas and opportunities that could turn into a product, or a whole company. There is no shortage of things that I could do. The key is you have to learn to think like an entrepreneur.

Business and Management, The Soft Side of Software


Designing Cluster Schedulers for Internet-Scale Services

Diptanu Gon Choudhury and Timothy Perrett

Embracing failures for improving availability

Despite the apparent ubiquity of cluster schedulers, operating and implementing scheduling software is an exceedingly tricky task with many nuanced edge cases. This article highlights some of these cases based on the authors' real-world experience designing, building, and operating a variety of schedulers for large Internet companies.

Engineers looking to build scheduling systems should consider all the failure modes of the underlying infrastructure they use, how operators can configure remediation strategies, and how to keep tenant systems as stable as possible while their owners troubleshoot.

Web Services


Everything Sysadmin
Manual Work is a Bug


  Thomas A. Limoncelli

A.B.A: Always be automating

As you work, you have a choice. Will each manual task create artifacts that allow you to accelerate future work, or will you squander these opportunities and accept the status quo? By constantly documenting and creating code-snippet artifacts, you accelerate future work. That one-shot task that could never happen again does happen again, and next time it moves faster. Even tasks that aren't worth automating can be improved by documenting them, as documentation is automation.

Everything Sysadmin


Canary Analysis Service

  Štěpán Davidovič with Betsy Beyer

Automated canarying quickens development, improves production safety, and helps prevent outages.

Google has deployed a shared centralized service called CAS (Canary Analysis Service) that offers automatic (and often autoconfigured) analysis of key metrics during a production change. CAS is used to analyze new versions of binaries, configuration changes, data-set changes, and other production changes. CAS evaluates hundreds of thousands of production changes every day at Google.

Web Services


 



