Volume 21, Issue 2
You Don't know Jack about Application Performance
David Collier-Brown
Knowing whether you're doomed to fail is important when starting a project.
You don't need to do a full-scale benchmark any time you have a performance or capacity planning problem. A simple measurement will provide the bottleneck point of your system: The example program will get significantly slower after eight requests per second per CPU. That's often enough to tell you the most important thing: if you're going to fail.
Performance
Testing
Cargo Cult AI
Edlyn V. Levine
Is the ability to think scientifically the defining essence of intelligence?
Evidence abounds that the human brain does not innately think scientifically; however, it can be taught to do so. The same species that forms cargo cults around widespread and unfounded beliefs in UFOs, ESP, and anything read on social media also produces scientific luminaries such as Sagan and Feynman. Today's cutting-edge LLMs are also not innately scientific. But unlike the human brain, there is good reason to believe they never will be unless new algorithmic paradigms are developed.
AI
Kode Vicious:
The Human Touch
There is no substitute for good, direct, honest training.
The challenge of providing a safe communications environment in the face of such social engineering attacks isn't just the technology; it's also people. As anyone who has done serious work in computer security knows, the biggest problems are between the keyboard and the chair. Most people by default trust other people and are willing to give them the benefit of the doubt.
Business and Management,
Kode Vicious,
Security
Research for Practice:
OS Scheduling
Kostis Kaffes with Introduction by Peter Alvaro
Better scheduling policies for modern computing systems
In any system that multiplexes resources, the problem of scheduling what computations run where and when is perhaps the most fundamental. Yet, like many other essential problems in computing (e.g., query optimization in databases), academic research in scheduling moves like a pendulum, with periods of intense activity followed by periods of dormancy when it is considered a "solved" problem. These three papers make significant contributions to an ongoing effort to develop better scheduling policies for modern computing systems. The papers highlight the need for better, more efficient, and more flexible OS schedulers; open up new areas of research; and demonstrate the importance of continued development and innovation in OS scheduling policies.
Research for Practice
Cargo Cult AI
Edlyn V. Levine
Is the ability to think scientifically the defining essence of intelligence?
Evidence abounds that the human brain does not innately think scientifically; however, it can be taught to do so. The same species that forms cargo cults around widespread and unfounded beliefs in UFOs, ESP, and anything read on social media also produces scientific luminaries such as Sagan and Feynman. Today's cutting-edge LLMs are also not innately scientific. But unlike the human brain, there is good reason to believe they never will be unless new algorithmic paradigms are developed.
AI
Beyond the Repository
Amanda Casari, Julia Ferraioli, and Juniper Lovato
Best practices for open source ecosystems researchers
Much of the existing research about open source elects to study software repositories instead of ecosystems. An open source repository most often refers to the artifacts recorded in a version control system and occasionally includes interactions around the repository itself. An open source ecosystem refers to a collection of repositories, the community, their interactions, incentives, behavioral norms, and culture. The decentralized nature of open source makes holistic analysis of the ecosystem an arduous task, with communities and identities intersecting in organic and evolving ways. Despite these complexities, the increased scrutiny on software security and supply chains makes it of the utmost importance to take an ecosystem-based approach when performing research about open source. This article provides guidelines and best practices for research using data collected from open source ecosystems, encouraging research teams to work with communities in respectful ways.
Open Source
DevEx: What Actually Drives Productivity
Abi Noda, DX
Margaret-Anne Storey, University of Victoria
Nicole Forsgren, Microsoft Research
Michaela Greiler, DX
The developer-centric approach to measuring and improving productivity
Developer experience focuses on the lived experience of developers and the points of friction they encounter in their everyday work. In addition to improving productivity, DevEx drives business performance through increased efficiency, product quality, and employee retention. This paper provides a practical framework for understanding DevEx, and presents a measurement framework that combines feedback from developers with data about the engineering systems they interact with. These two frameworks provide leaders with clear, actionable insights into what to measure and where to focus in order to improve developer productivity.
Business and Management
Operations and Life:
Improvement on End-to-End Encryption
May Lead to Silent Revolution
Thomas A. Limoncelli
Researchers are on the brink of what could be
the next big improvement in communication privacy.
Privacy is an increasing concern, whether you are texting with a business associate or transmitting volumes of data over the Internet. Over the past few decades, cryptographic techniques have enabled privacy improvements in chat apps and other electronic forms of communication. Now researchers are on the brink of what could be the next big improvement in communication privacy: E2EEEE (End-to-End Encryption with Endpoint Elimination). This article is based on interviews with researchers who plan on presenting at a symposium on the topic scheduled for April 1, 2023.
Networks,
Operations and Life,
Privacy,
Security
Volume 21, Issue 1
Designing a Framework for Conversational Interfaces
Zachary Tellman
Combining the latest advances in machine learning with earlier approaches
Wherever possible, business logic should be described by code rather than training data. This keeps our system's behavior principled, predictable, and easy to change. Our approach to conversational interfaces allows them to be built much like any other application, using familiar tools, conventions, and processes, while still taking advantage of cutting-edge machine-learning techniques.
AI,
Data,
HCI
Kode Vicious:
The Parchment Path?
Is there ever a time when learning is not of value—for its own sake?
The greater the risk, the greater the reward, and if you do succeed, it will be an achievement that you can look back on and smile wryly about. Postdocs never laugh because postdocs are post-laughter. However, there are some things to consider before plunking down your application fee and writing all those essays.
Education,
Kode Vicious
Opportunity Cost and Missed Chances in Optimizing Cybersecurity
Kelly Shortridge, Josiah Dykstra
The loss of potential gain from other alternatives when one alternative is chosen
Opportunity cost should not be an afterthought when making security decisions. One way to ease into considering complex alternatives is to consider the null baseline of doing nothing instead of the choice at hand. Opportunity cost can feel abstract, elusive, and imprecise, but it can be understood by everyone, given the right introduction and framing. Using the approach presented here will make it natural and accessible.
Networks,
Security
Drill Bits
Catch-23: The New C Standard
Sets the World on Fire
Terence Kelly with Special Guest Borer Yekai Pan
A new major revision of the C programming language standard is nearly upon us. C23 introduces pleasant conveniences, retains venerable traps for the unwary, and innovates a gratuitous catastrophe. A few steps forward, much sideways shuffling, and a drunken backward stumble into the fireplace come together in the official dance of C standardization, the Whiskey Tango Foxtrot.
Drill Bits,
Code,
Development
Sharpening Your Tools
Simson Garfinkel, Jon Stewart
Updating bulk_extractor
for the 2020s
This article presents our experience updating the high-performance Digital forensics tool BE (bulk_extractor) a decade after its initial release. Between 2018 and 2022, we updated the program from C++98 to C++17. We also performed a complete code refactoring and adopted a unit test framework. DF tools must be frequently updated to keep up with changes in the ways they are used. A description of updates to the bulk_extractor tool serves as an example of what can and should be done.
Code
Testing,
Tools,
Case Study
More Than Just Algorithms
A discussion with Alfred Spector, Peter Norvig, Chris Wiggins, Jeannette Wing, Ben Fried, and Michael Tingley
Dramatic advances in the ability to gather, store, and process data have led to the rapid growth of data science and its mushrooming impact on nearly all aspects of the economy and society. Data science has also had a huge effect on academic disciplines with new research agendas, new degrees, and organizational entities. The authors of a new textbook, Data Science in Context: Foundations, Challenges, Opportunities, share their ideas about the impact of the field on nearly all aspects of the economy and society.
Case studies,
Data
Volume 20, Issue 6
Kode Vicious:
All Sliders to the Right
Hardware overkill
There are many reasons why this year's model isn't any better than last year's, and many reasons why performance fails to scale, some of which KV has covered in these pages. It is true that the days of upgrading every year and getting a free performance boost are long gone, as we're not really getting single cores that are faster than about 4GHz. One thing that many software developers fail to understand is the hardware on which their software runs at a sufficiently deep level.
Hardware,
Kode Vicious,
Performance
Three-part Harmony for Program Managers
Who Just Don't Get It, Yet
Guenever Aldrich, Danny Tsang, Jason Mckenney
Open-source software, open standards, and agile software development
This article examines three tools in the system acquisitions toolbox that can work to expedite development and procurement while mitigating programmatic risk: OSS, open standards, and the Agile/Scrum software development processes are all powerful additions to the DoD acquisition program management toolbox.
Development,
Systems Administration
Research for Practice:
The Fun in Fuzzing
Stefan Nagy, with Introduction By Peter Alvaro
The debugging technique comes into its own.
Stefan Nagy, an assistant professor in the Kahlert School of Computing at the University of Utah, takes us on a tour of recent research in software fuzzing, or the systematic testing of programs via the generation of novel or unexpected inputs. The first paper he discusses extends the state of the art in coverage-guided fuzzing with the semantic notion of "likely invariants," inferred via techniques from property-based testing. The second explores encoding domain-specific knowledge about certain bug classes into test-case generation. His last selection takes us through the looking glass, randomly generating entire C programs and using differential analysis to compare traces of optimized and unoptimized executions, in order to find bugs in the compilers themselves.
Research for Practice,
Testing
To PiM or Not to PiM
Gabriel Falcao And João Dinis Ferreira
The case for in-memory inferencing of quantized CNNs at the edge
As artificial intelligence becomes a pervasive tool for the billions of IoT (Internet of things) devices at the edge, the data movement bottleneck imposes severe limitations on the performance and autonomy of these systems.
PiM (processing-in-memory) is emerging as a way of mitigating the data movement bottleneck while satisfying the stringent performance, energy efficiency, and accuracy requirements of edge imaging applications that rely on CNNs (convolutional neural networks).
AI,
Data,
Networks,
Performance
Taking Flight with Copilot
Christian Bird, Denae Ford, Thomas Zimmermann, Nicole Forsgren, Eirini Kalliamvakou, Travis Lowdermilk, Idan Gazit
Early insights and opportunities of AI-powered pair-programming tools
Over the next five years, AI-powered tools likely will be helping developers in many diverse tasks. For example, such models may be used to improve code review, directing reviewers to parts of a change where review is most needed or even directly providing feedback on changes. Models such as Codex may suggest fixes for defects in code, build failures, or failing tests. These models are able to write tests automatically, helping to improve code quality and downstream reliability of distributed systems. This study of Copilot shows that developers spend more time reviewing code than actually writing code. As AI-powered tools are integrated into more software development tasks, developer roles will shift so that more time is spent assessing suggestions related to the task than doing the task itself.
AI,
Development
Volume 20, Issue 5
Reinventing Backend Subsetting at Google
Peter Ward and Paul Wankadia with Kavita Guliani
Designing an algorithm with reduced connection churn that could replace deterministic subsetting
Backend subsetting is useful for reducing costs and may even be necessary for operating within the system limits. For more than a decade, Google used deterministic subsetting as its default backend subsetting algorithm, but although this algorithm balances the number of connections per backend task, deterministic subsetting has a high level of connection churn. Our goal at Google was to design an algorithm with reduced connection churn that could replace deterministic subsetting as the default backend subsetting algorithm.
Performance,
Testing
Kode Vicious:
The Elephant in the Room
It's time to get the POSIX elephant off our necks.
By writing code for the elephant that is Posix, we lose the chance to take advantage of modern hardware.
Development,
Kode Vicious
OCCAM-v2: Combining Static and Dynamic Analysis for Effective and Efficient Whole-program Specialization
Jorge A. Navas and Ashish Gehani
Leveraging scalable pointer analysis, value analysis, and dynamic analysis
OCCAM-v2 leverages scalable pointer analysis, value analysis, and dynamic analysis to create an effective and efficient tool for specializing LLVM bitcode. The extent of the code-size reduction achieved depends on the specific deployment configuration. Each application that is to be specialized is accompanied by a manifest that specifies concrete arguments that are known a priori, as well as a count of residual arguments that will be provided at runtime. The best case for partial evaluation occurs when the arguments are completely concretely specified. OCCAM-v2 uses a pointer analysis to devirtualize calls, allowing it to eliminate the entire body of functions that are not reachable by any direct calls. The hybrid analysis feature can handle cases that are challenging for static analysis, such as input loops, string processing, and external data (in files, for example). On the suite of evaluated programs, OCCAM-v2 was able to reduce the instruction count by 40.6 percent on average, taking a median of 2.4 seconds.
Development,
Quality Assurance,
Testing,
Tools
Case Study
OSS Supply-chain Security:
What Will It Take?
A discussion with Maya Kaczorowski, Falcon Momot, George Neville-Neil, and Chris McCubbin
While enterprise security teams naturally tend to turn their focus primarily to direct attacks on their own infrastructure, cybercrime exploits now are increasingly aimed at easier targets upstream. This has led to a perfect storm, since virtually all significant codebase repositories at this point include at least some amount of open-source software. But opportunities also abound there for the authors of malware. The broader cybercrime world, meanwhile, has noted that open-source supply chains are generally easy to penetrate. What's being done at this point to address the apparent risks?
Case studies,
Open Source,
Security
Drill Bits
Literate Executables
Terence Kelly
Literate executables redefine the relationship between compiled binaries and source code to be that of chicken and egg, so it's easy to derive either from the other. This episode of Drill Bits provides a general-purpose literacy tool and showcases the advantages of literacy by retrofitting it onto everyone's favorite command-line utility.
Drill Bits,
Code,
Data,
Development
Operations and Life:
Split Your Overwhelmed Teams
Thomas A. Limoncelli
Two teams of five is not the same as one team of ten.
This team's low morale and high stress were a result of the members feeling overwhelmed by too many responsibilities. The 10-by-10 communication structure made it difficult to achieve consensus, there were too many meetings, and everyone was suffering from the high cognitive load. By splitting into two teams, each can be more nimble, which the manager likes, and have a lower cognitive load, which the team likes. There is more opportunity for repetition, which lets people develop skills and demonstrate them. Altogether, this helps reduce stress and improve morale.
Business and Management,
Operations and Life