Thanumalayan Sankaranarayana Pillai,
Andrea C. Arpaci-Dusseau,
Remzi H. Arpaci-Dusseau
Rethinking the Fundamental Abstractions of the File System
The reading and writing of data, one of the most fundamental aspects of any Von Neumann computer, is surprisingly subtle and full of nuance. For example, consider access to a shared memory in a system with multiple processors. While a simple and intuitive approach known as strong consistency is easiest for programmers to understand, many weaker models are in widespread use (e.g., x86 total store ordering); such approaches improve system performance, but at the cost of making reasoning about system behavior more complex and error-prone. Fortunately, a great deal of time and effort has gone into thinking about such memory models, and, as a result, most multiprocessor applications are not caught unaware.
File Systems and Storage
Testing a Distributed System
Testing a distributed system can be trying even under the best of circumstances.
Distributed systems can be especially difficult to program, for a variety of reasons. They can be difficult to design, difficult to manage, and, above all, difficult to test. Testing a normal system can be trying even under the best of circumstances, and no matter how diligent the tester is, bugs can still get through. Now take all of the standard issues and multiply them by multiple processes written in multiple languages running on multiple boxes that could potentially all be on different operating systems, and there is potential for a real disaster.
Natural Language Translation at the Intersection of AI and HCI
Spence Green, Jeffrey Heer, and Christopher D. Manning
Old questions being answered with both AI and HCI
The fields of artificial intelligence (AI) and human-computer interaction (HCI) are influencing each other like never before. Widely used systems such as Google Translate, Facebook Graph Search, and RelateIQ hide the complexity of large-scale AI systems behind intuitive interfaces. But relations were not always so auspicious. The two fields emerged at different points in the history of computer science, with different influences, ambitions, and attendant biases. AI aimed to construct a rival, and perhaps a successor, to the human intellect. Early AI researchers such as McCarthy, Minsky, and Shannon were mathematicians by training, so theorem-proving and formal models were attractive research directions. In contrast, HCI focused more on empirical approaches to usability and human factors, both of which generally aim to make machines more useful to humans. Many of the attendees at the first CHI conference in 1983 were psychologists and engineers. Papers were presented with titles such as "Design principles for human-computer interfaces" and "Psychological issues in the use of icons in command menus," hardly appealing fare for most mainstream AI researchers.
Beyond Page Objects: Testing Web Applications with State Objects
Arie van Deursen
Use states to drive your tests
End-to-end testing of Web applications typically involves tricky interactions with Web pages by means of a framework such as Selenium WebDriver. The recommended method for hiding such Web-page intricacies is to use page objects, but there are questions to answer first: Which page objects should you create when testing Web applications? What actions should you include in a page object? Which test scenarios should you specify, given your page objects?
While working with page objects during the past few months to test an AngularJS Web application, I answered these questions by moving page objects to the state level. Viewing the Web application as a state chart made it much easier to design test scenarios and corresponding page objects. This article describes the approach that gradually emerged: essentially a state-based generalization of page objects, referred to here as state objects.
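A minimal sketch of the state-object idea, assuming each application state gets its own object that checks it is really in that state and whose actions return the object for the next state. Everything here is hypothetical and for illustration only: `FakeApp` stands in for a real browser session (no Selenium WebDriver calls are made), and the class names, method names, and hard-coded password are not taken from the article.

```python
class FakeApp:
    """Hypothetical stand-in for a browser session; tracks the visible view."""

    def __init__(self):
        self.view = "login"


class State:
    """Base state object: verifies on creation that the app is in this state."""

    view = None

    def __init__(self, app):
        self.app = app
        self.self_check()

    def self_check(self):
        if self.app.view != self.view:
            raise AssertionError(f"expected {self.view!r}, got {self.app.view!r}")


class LoginState(State):
    view = "login"

    def submit(self, user, password):
        # Transition: a correct password leads to the dashboard state,
        # anything else leaves the application in the login state.
        if password == "secret":  # assumed credential, illustration only
            self.app.view = "dashboard"
            return DashboardState(self.app)
        return LoginState(self.app)


class DashboardState(State):
    view = "dashboard"

    def log_out(self):
        # Transition back to the login state.
        self.app.view = "login"
        return LoginState(self.app)


if __name__ == "__main__":
    # A test scenario is then a path through the state chart.
    dashboard = LoginState(FakeApp()).submit("kv", "secret")
    print(type(dashboard.log_out()).__name__)
```

Because every transition returns a freshly checked state object, a scenario that ends up in the wrong state fails at the transition rather than several steps later.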
Kode Vicious: Hickory Dickory Doc
On null encryption and automated documentation
While reviewing some encryption code in our product, I came across an option that allowed for null encryption. This means the encryption could be turned on, but the data would never be encrypted or decrypted. It would always be stored "in the clear." I removed the option from our latest source tree because I figured we didn't want an unsuspecting user to turn on encryption but still have data stored in the clear. One of the other programmers on my team reviewed the potential change and blocked me from committing it, saying that the null code could be used for testing. I disagreed with her, since I think that the risk of accidentally using the code is more important than a simple test. Which of us is right?
Dismantling the Barriers to Entry
We have to choose to build a web that is accessible to everyone.
A war is being waged in the world of web development. On one side is a vanguard of toolmakers and tool users, who thrive on the destruction of bad old ideas ("old," in this milieu, meaning anything that debuted on Hacker News more than a month ago) and raucous debates about transpilers and suchlike.
On the other side is an increasingly vocal contingent of developers who claim that the head-spinning rate of innovation makes it impossible to stay up to date, and that the web is disintegrating into a jumble of hacks upon opinions, most of which are wrong, and all of which will have changed by the time hot-new-thing.js reaches version 1.0.0.
Kode Vicious: Lazarus Code
No one expects the Spanish Acquisition.
Dear KV, I've been asked to look into the possibility of taking a 15-year-old piece of open-source software and updating it to work on a current system used by my company. The code itself doesn't seem to be too bad, at least no worse than the code I'm used to reading, but I suspect it might be easier to write a new version from scratch than to try to understand code that I didn't write and which no one has actively maintained for several years. What is the point at which I should decide to ignore this old code and write something new?
Hadoop Superlinear Scalability
Neil Gunther, Paul Puglia, Kristofer Tomasette
The perpetual motion of parallel performance
"We often see more than 100 percent speedup efficiency!" came the rejoinder to the innocent reminder that you can't have more than 100 percent of anything. But this was just the first volley from software engineers during a presentation on how to quantify computer system scalability in terms of the speedup metric. In different venues, on subsequent occasions, that retort seemed to grow into a veritable chorus that not only was superlinear speedup commonly observed, but also the model used to quantify scalability for the past 20 years—the USL (Universal Scalability Law)—failed when applied to superlinear speedup data.
Indeed, superlinear speedup is a bona fide, measurable phenomenon that can be expected to appear more frequently in practice as new applications are deployed onto distributed architectures. As demonstrated here using Hadoop MapReduce, however, USL is not only capable of accommodating superlinear speedup in a surprisingly simple way, it reveals that superlinearity, although alluring, is as illusory as perpetual motion.
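As a rough illustration, the USL is commonly written as C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α models contention and β models coherency delay. The sketch below assumes that form and assumes that a negative α is what produces a superlinear region. The coefficient values are invented for illustration and are not measurements from the Hadoop experiments.

```python
def usl_speedup(n, alpha, beta):
    """Relative capacity C(N) under the assumed form of the USL."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def efficiency(n, alpha, beta):
    """Speedup per node; values above 1.0 indicate superlinear speedup."""
    return usl_speedup(n, alpha, beta) / n


# Illustrative coefficients only (assumptions, not fitted values).
ALPHA = -0.05  # negative contention term: opens a superlinear region
BETA = 0.002   # coherency term: eventually pulls efficiency back down

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32, 64):
        print(f"N={n:3d}  speedup={usl_speedup(n, ALPHA, BETA):7.2f}  "
              f"efficiency={efficiency(n, ALPHA, BETA):5.2f}")
```

With these made-up coefficients, efficiency rises above 1.0 at small node counts and falls below it at larger ones, which is one way to picture the early gain being paid back at scale.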
Evolution and Practice: Low-latency Distributed Applications in Finance
The finance industry has unique demands for low-latency distributed systems.
Virtually all systems have some requirements for latency. Latency requirements appear in problem domains as diverse as aircraft flight controls, voice communications, multiplayer gaming, online advertising, and scientific experiments. Distributed systems present special latency considerations. In recent years the automation of financial trading has driven requirements for distributed systems with challenging latency requirements and global geographic distribution. Automated trading provides a window into the engineering challenges of ever-shrinking latency requirements, which may be useful to software engineers in other fields.
The Science of Managing Data Science
Lessons learned managing a data science research team
"What are they doing all day?" When I first took over as VP of Engineering at a startup doing data mining and machine learning research, this was what the other executives wanted to know. They knew the team was super smart, and they seemed like they were working really hard, but the executives had lots of questions about the work itself. How did they know that the work they were doing was the "right" work? Were there other projects they could be doing instead? And how could we get this research into the hands of our customers faster?
Using Free and Open Source Tools to Manage Software Quality
Phelim Dowling and Kevin McGrath
An agile process implementation
The principles of agile software development place more emphasis on individuals and interactions than on processes and tools. They steer us away from heavy documentation requirements and guide us along a path of reacting efficiently to change rather than sticking rigidly to a pre-defined plan. To support this flexible method of operation, it is important to have suitable applications to manage the team's activities. It is also essential to implement effective frameworks to ensure quality is being built into the product early and at all levels. With these concerns in mind and coming from a budget-conscious perspective, this article will explore the free and open source applications and tools used by one organization in its quest to build process and quality around its projects and products.
From the EDVAC to WEBVACs
Daniel C. Wang
Cloud computing for computer scientists
By now everyone has heard of cloud computing and realized that it is changing how both traditional enterprise IT and emerging startups are building solutions for the future. Is this trend toward the cloud just a shift in the complicated economics of the hardware and software industry, or is it a fundamentally different way of thinking about computing? Having worked in the industry, I can confidently say it is both.
Spicing Up Dart with Side Effects
Erik Meijer, Applied Duality; Kevin Millikin, Google; Gilad Bracha, Google
A set of extensions to the Dart programming language, designed to support asynchrony and generator functions
The Dart programming language has recently incorporated a set of extensions designed to support asynchrony and generator functions. Because Dart is a language for Web programming, latency is an important concern. To avoid blocking, developers must make methods asynchronous when computing their results requires nontrivial time. Generator functions ease the task of computing iterable sequences.
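The two ideas can be sketched in Python, which has analogous features. This is not Dart syntax and not the Dart extensions themselves, and the function names are hypothetical. An asynchronous function hands control back while it waits, and a generator function computes an iterable sequence lazily.

```python
import asyncio


def countdown(n):
    """Generator function: lazily yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


async def fetch_length(text):
    """Asynchronous function: the sleep stands in for a slow operation."""
    await asyncio.sleep(0.01)  # yields control instead of blocking
    return len(text)


async def main():
    # Both calls wait concurrently; results come back in argument order.
    return await asyncio.gather(fetch_length("dart"), fetch_length("async"))


if __name__ == "__main__":
    print(list(countdown(3)))
    print(asyncio.run(main()))
```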
Reliable Cron across the Planet
Štěpán Davidovič, Kavita Guliani, Google
...or How I stopped worrying and learned to love time
This article describes Google's implementation of a distributed Cron service, serving the vast majority of internal teams that need periodic scheduling of compute jobs. Over the service's lifetime, we have learned many lessons on how to design and implement what might seem like a basic service. Here, we discuss the problems that distributed Crons face and outline some potential solutions.
There is No Now
Problems with simultaneity in distributed systems
"Now." The time elapsed between when I wrote that word and when you read it was at least a couple of weeks. That kind of delay is one that we take for granted and don't even think about in written media. "Now." If we were in the same room and instead I spoke aloud, you might have a greater sense of immediacy. You might intuitively feel as if you were hearing the word at exactly the same time that I spoke it. That intuition would be wrong. If, instead of trusting your intuition, you thought about the physics of sound, you would know that time must have elapsed between my speaking and your hearing. The motion of the air, carrying my word, would take time to get from my mouth to your ear.
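To put a number on that delay, here is a back-of-the-envelope sketch. It assumes sound travels at roughly 343 m/s (dry air at about 20 °C) and assumes a distance of 3 meters between speaker and listener. Both figures are illustrative assumptions, not values from the article.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate, dry air at about 20 °C


def sound_delay_ms(distance_m):
    """Milliseconds for sound to travel distance_m meters through air."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


if __name__ == "__main__":
    # Assumed distance across a room: 3 meters.
    print(f"{sound_delay_ms(3.0):.1f} ms")
```

Under these assumptions the spoken word arrives several milliseconds after it leaves the speaker: short enough to go unnoticed, but not zero.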
Parallel Processing with Promises
A simple method of writing a collaborative system
In today's world, there are many reasons to write concurrent software. The desire to improve performance and increase throughput has led to many different asynchronous techniques. The techniques involved, however, are generally complex and the source of many subtle bugs, especially if they require shared mutable state. If shared state is not required, then these problems can be solved with a better abstraction called promises. These allow programmers to hook asynchronous function calls together, waiting for each to return success or failure before running the next appropriate function in the chain.
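The chaining described above can be sketched as follows. Python has no built-in promise type, so this minimal version uses `asyncio` coroutines as a stand-in: each step is awaited before the next one runs, and the first failure is routed to an error handler instead. The `chain` helper and the step functions are hypothetical names invented for this example.

```python
import asyncio


async def chain(value, steps, on_error):
    """Await each step in order; on the first failure, run on_error and stop."""
    try:
        for step in steps:
            value = await step(value)
    except Exception as exc:
        return await on_error(exc)
    return value


async def parse(text):
    return int(text)  # raises ValueError on malformed input


async def double(n):
    await asyncio.sleep(0)  # stands in for real asynchronous work
    return n * 2


async def describe(n):
    return f"result={n}"


async def report(exc):
    return f"failed: {type(exc).__name__}"


def run(text):
    """Drive the whole chain to completion from synchronous code."""
    return asyncio.run(chain(text, [parse, double, describe], report))


if __name__ == "__main__":
    print(run("21"))
    print(run("not a number"))
```

No state is shared between the steps; each one receives only the value produced by the step before it.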
Relevance and repeatability
Dear KV, The company I work for has decided to use a wireless network link to reduce latency, at least when the weather between the stations is good. It seems to me that for transmission over lossy wireless links we'll want our own transport protocol that sits directly on top of whatever the radio provides, instead of wasting bits on IP and TCP or UDP headers, which, for a point-to-point network, aren't really useful.
Go Static or Go Home
In the end, dynamic systems are simply less secure.
Most current and historic problems in computer and network security boil down to a single observation: letting other people control our devices is bad for us. At another time, I'll explain what I mean by "other people" and "bad." For the purpose of this article, I'll focus entirely on what I mean by control. One way we lose control of our devices is to external distributed denial of service (DDoS) attacks, which fill a network with unwanted traffic, leaving no room for real ("wanted") traffic. Other forms of DDoS are similar—an attack by the Low Orbit Ion Cannon (LOIC), for example, might not totally fill up a network, but it can keep a web server so busy answering useless attack requests that the server can't answer any useful customer requests. Either way, DDoS means outsiders are controlling our devices, and that's bad for us.
HTTP/2.0 - The IETF is Phoning It In
Bad protocol, bad politics
In the long run, the most memorable event of 1989 will probably be that Tim Berners-Lee hacked up the HTTP protocol and named the result the "World Wide Web." Tim's HTTP protocol ran on 10Mbit/s Ethernet over coax cables, and his computer was a NeXT Cube with a 25-MHz clock frequency. Twenty-six years later, my laptop CPU is a hundred times faster and has a thousand times as much RAM as Tim's machine had, but the HTTP protocol is still the same. A few days ago the IESG (the Internet Engineering Steering Group) asked for "Last Call" comments on the new "HTTP/2.0" protocol before blessing it as a "Proposed Standard."
META II: Digital Vellum in the Digital Scriptorium
Revisiting Schorre's 1962 compiler-compiler
Some people do living history—reviving older skills and material culture by reenacting Waterloo or knapping flint knives. One pleasant rainy weekend in 2012, I set my sights a little more recently and settled in for a little meditative retro-computing, ca. 1962, following the ancient mode of transmission of knowledge: lecture and recitation—or rather, grace of living in historical times, lecture (here, in the French sense, reading) and transcription (or even more specifically, grace of living post-Post, lecture and reimplementation). Fortunately, for my purposes, Dewey Val Schorre's paper on META II was, unlike many more recent digital artifacts, readily available as a digital scan.
Model-based Testing: Where Does It Stand?
Robert V. Binder, Bruno Legeard, and Anne Kramer
MBT has positive effects on efficiency and effectiveness, even if it only partially fulfills high expectations.
From mid-June 2014 to early August 2014, we conducted a survey to learn how MBT users view its efficiency and effectiveness. The 2014 MBT User Survey, a follow-up to a similar 2012 survey (http://robertvbinder.com/real-users-of-model-based-testing/), was open to all those who have evaluated or used any MBT approach. Its 32 questions included some from a survey distributed at the 2013 User Conference on Advanced Automated Testing. Some questions focused on the efficiency and effectiveness of MBT, providing the figures that managers are most interested in. Other questions were more technical and sought to validate a common MBT classification scheme. A common classification scheme could help users understand both the general diversity and specific approaches.
Securing the Network Time Protocol
Crackers discover how to use NTP as a weapon for abuse.
In the late 1970s David L. Mills began working on the problem of synchronizing time on networked computers, and NTP (Network Time Protocol) version 1 made its debut in 1980. This was at a time when the net was a much friendlier place—the ARPANET days. NTP version 2 appeared approximately a year later, about the same time as CSNET (Computer Science Network). NSFNET (National Science Foundation Network) launched in 1986. NTP version 3 showed up in 1993.
Scalability Techniques for Practical Synchronization Primitives
Designing locking primitives with performance in mind
In an ideal world, applications are expected to scale automatically when executed on increasingly larger systems. In practice, however, not only does this scaling not occur, but it is common to see performance actually worsen on those larger systems.
Internal Access Controls
Trust But Verify.
Every day seems to bring news of another dramatic and high-profile security incident, whether it is the discovery of longstanding vulnerabilities in widely used software such as OpenSSL or Bash, or celebrity photographs stolen and publicized. There seems to be an infinite supply of zero-day vulnerabilities and powerful state-sponsored attackers. In the face of such threats, is it even worth trying to protect your systems and data? What can systems security designers and administrators do?
Use the database built for your access model.
The topic of data storage is one that doesn't need to be well understood until something goes wrong (data disappears) or something goes really right (too many customers). Because databases can be treated as black boxes with an API, their inner workings are often overlooked. They're often treated as magic things that just take data when offered and supply it when asked. Since these two operations are the only understood activities of the technology, they are often the only features presented when comparing different technologies.
Kode Vicious: Too Big to Fail
Visibility leads to debuggability.
Our project has been rolling out a well-known, distributed key/value store onto our infrastructure, and we've been surprised—more than once—when a simple increase in the number of clients has not only slowed things, but brought them to a complete halt. This then results in rollback while several of us scour the online forums to figure out if anyone else has seen the same problem. The entire reason for using this project's software is to increase the scale of a large system, so I have been surprised at how many times a small increase in load has led to a complete failure. Is there something about scaling systems that's so difficult that these systems become fragile, even at a modest scale?