Sort By:

Scaling Synchronization in Multicore Programs:
Advanced synchronization methods can boost the performance of multicore software.

Designing software for modern multicore processors poses a dilemma. Traditional software designs, in which threads manipulate shared data, have limited scalability because synchronization of updates to shared data serializes threads and limits parallelism. Alternative distributed software designs, in which threads do not share mutable data, eliminate synchronization and offer better scalability. But distributed designs make it challenging to implement features that shared data structures naturally provide, such as dynamic load balancing and strong consistency guarantees, and are simply not a good fit for every program. Often, however, the performance of shared mutable data structures is limited by the synchronization methods in use today, whether lock-based or lock-free.

by Adam Morrison | August 23, 2016


Challenges of Memory Management on Modern NUMA System:
Optimizing NUMA systems applications with Carrefour

Modern server-class systems are typically built as several multicore chips put together in a single system. Each chip has a local DRAM (dynamic random-access memory) module; together they are referred to as a node. Nodes are connected via a high-speed interconnect, and the system is fully coherent. This means that, transparently to the programmer, a core can issue requests to its node’s local memory as well as to the memories of other nodes. The key distinction is that remote requests will take longer, because they are subject to longer wire delays and may have to jump several hops as they traverse the interconnect.

by Fabien Gaud, Baptiste Lepers, Justin Funston, Mohammad Dashti, Alexandra Fedorova, Vivien Quéma, Renaud Lachaize, Mark Roth | December 1, 2015


Parallel Processing with Promises:
A simple method of writing a collaborative system

In today’s world, there are many reasons to write concurrent software. The desire to improve performance and increase throughput has led to many different asynchronous techniques. The techniques involved, however, are generally complex and the source of many subtle bugs, especially if they require shared mutable state. If shared state is not required, then these problems can be solved with a better abstraction called promises. These allow programmers to hook asynchronous function calls together, waiting for each to return success or failure before running the next appropriate function in the chain.

by Spencer Rathbun | March 3, 2015


Scalability Techniques for Practical Synchronization Primitives:
Designing locking primitives with performance in mind

In an ideal world, applications are expected to scale automatically when executed on increasingly larger systems. In practice, however, not only does this scaling not occur, but it is common to see performance actually worsen on those larger systems.

by Davidlohr Bueso | December 14, 2014


Productivity in Parallel Programming: A Decade of Progress:
Looking at the design and benefits of X10

In 2002 DARPA (Defense Advanced Research Projects Agency) launched a major initiative in HPCS (high-productivity computing systems). The program was motivated by the belief that the utilization of the coming generation of parallel machines was gated by the difficulty of writing, debugging, tuning, and maintaining software at peta scale.

by John T. Richards, Jonathan Brezin, Calvin B. Swart, Christine A. Halverson | October 20, 2014


Scaling Existing Lock-based Applications with Lock Elision:
Lock elision enables existing lock-based programs to achieve the performance benefits of nonblocking synchronization and fine-grain locking with minor software engineering effort.

Multithreaded applications take advantage of increasing core counts to achieve high performance. Such programs, however, typically require programmers to reason about data shared among multiple threads. Programmers use synchronization mechanisms such as mutual-exclusion locks to ensure correct updates to shared data in the presence of accesses from multiple threads. Unfortunately, these mechanisms serialize thread accesses to the data and limit scalability.

by Andi Kleen | February 8, 2014


The Balancing Act of Choosing Nonblocking Features:
Design requirements of nonblocking systems

What is nonblocking progress? Consider the simple example of incrementing a counter C shared among multiple threads. One way to do so is by protecting the steps of incrementing C by a mutual exclusion lock L (i.e., acquire(L); old := C ; C := old+1; release(L);). If a thread P is holding L, then a different thread Q must wait for P to release L before Q can proceed to operate on C. That is, Q is blocked by P.

by Maged M. Michael | August 12, 2013


Nonblocking Algorithms and Scalable Multicore Programming:
Exploring some alternatives to lock-based synchronization

Real-world systems with complicated quality-of-service guarantees may require a delicate balance between throughput and latency to meet operating requirements in a cost-efficient manner. The increasing availability and decreasing cost of commodity multicore and many-core systems make concurrency and parallelism increasingly necessary for meeting demanding performance requirements. Unfortunately, the design and implementation of correct, efficient, and scalable concurrent software is often a daunting task.

by Samy Al Bahra | June 11, 2013


Proving the Correctness of Nonblocking Data Structures:
So you’ve decided to use a nonblocking data structure, and now you need to be certain of its correctness. How can this be achieved?

Nonblocking synchronization can yield astonishing results in terms of scalability and realtime response, but at the expense of verification state space.

by Mathieu Desnoyers | June 2, 2013


Structured Deferral: Synchronization via Procrastination:
We simply do not have a synchronization mechanism that can enforce mutual exclusion.

Developers often take a proactive approach to software design, especially those from cultures valuing industriousness over procrastination. Lazy approaches, however, have proven their value, with examples including reference counting, garbage collection, and lazy evaluation. This structured deferral takes the form of synchronization via procrastination, specifically reference counting, hazard pointers, and RCU (read-copy-update).

by Paul E. McKenney | May 23, 2013


Software Transactional Memory: Why Is It Only a Research Toy?:
The promise of STM may likely be undermined by its overheads and workload applicabilities.

TM (transactional memory) is a concurrency control paradigm that provides atomic and isolated execution for regions of code. TM is considered by many researchers to be one of the most promising solutions to address the problem of programming multicore processors. Its most appealing feature is that most programmers only need to reason locally about shared data accesses, mark the code region to be executed transactionally, and let the underlying system ensure the correct concurrent execution. This model promises to provide the scalability of fine-grain locking, while avoiding common pitfalls of lock composition such as deadlock.

by Calin Cascaval, Colin Blundell, Maged Michael, Harold W. Cain, Peng Wu, Stefanie Chiras, Siddhartha Chatterjee | October 24, 2008


Parallel Programming with Transactional Memory:
While sometimes even writing regular, single-threaded programs can be quite challenging, trying to split a program into multiple pieces that can be executed in parallel adds a whole dimension of additional problems. Drawing upon the transaction concept familiar to most programmers, transactional memory was designed to solve some of these problems and make parallel programming easier. Ulrich Drepper from Red Hat shows us how it’s done.

With the speed of individual cores no longer increasing at the rate we came to love over the past decades, programmers have to look for other ways to increase the speed of our ever-more-complicated applications. The functionality provided by the CPU manufacturers is an increased number of execution units, or CPU cores.

by Ulrich Drepper | October 24, 2008


Erlang for Concurrent Programming:
What role can programming languages play in dealing with concurrency? One answer can be found in Erlang, a language designed for concurrency from the ground up.

Erlang is a language developed to let mere mortals write, test, deploy, and debug fault-tolerant concurrent software. Developed at the Swedish telecom company Ericsson in the late 1980s, it started as a platform for developing soft realtime software for managing phone switches. It has since been open-sourced and ported to several common platforms, finding a natural fit not only in distributed Internet server applications, but also in graphical user interfaces and ordinary batch applications.

by Jim Larson | October 24, 2008


Real-World Concurrency:
In this look at how concurrency affects practitioners in the real world, Cantrill and Bonwick argue that much of the anxiety over concurrency is unwarranted.

Software practitioners today could be forgiven if recent microprocessor developments have given them some trepidation about the future of software. While Moore’s law continues to hold (that is, transistor density continues to double roughly every 18 months), as a result of both intractable physical limitations and practical engineering considerations, that increasing density is no longer being spent on boosting clock rate. Instead, it is being used to put multiple CPU cores on a single CPU die.

by Bryan Cantrill, Jeff Bonwick | October 24, 2008


Unlocking Concurrency:
Multicore programming with transactional memory

Multicore architectures are an inflection point in mainstream software development because they force developers to write parallel programs. In a previous article in Queue, Herb Sutter and James Larus pointed out, “The concurrency revolution is primarily a software revolution. The difficult problem is not building multicore hardware, but programming it in a way that lets mainstream applications benefit from the continued exponential growth in CPU performance.” In this new multicore world, developers must write explicitly parallel applications that can take advantage of the increasing number of cores that each successive multicore generation will provide.

by Ali-Reza Adl-Tabatabai, Christos Kozyrakis, Bratin Saha | December 28, 2006


Threads without the Pain:
Multithreaded programming need not be so angst-ridden.

Much of today’s software deals with multiple concurrent tasks. Web browsers support multiple concurrent HTTP connections, graphical user interfaces deal with multiple windows and input devices, and Web and DNS servers handle concurrent connections or transactions from large numbers of clients. The number of concurrent tasks that needs to be handled increases while software grows more complex. Structuring concurrent software in a way that meets the increasing scalability requirements while remaining simple, structured, and safe enough to allow mortal programmers to construct ever-more complex systems is a major engineering challenge.

by Andreas Gustafsson | December 16, 2005


Software and the Concurrency Revolution:
Leveraging the full power of multicore processors demands new tools and new thinking from the software industry.

Concurrency has long been touted as the "next big thing" and "the way of the future," but for the past 30 years, mainstream software development has been able to ignore it. Our parallel future has finally arrived: new machines will be parallel machines, and this will require major changes in the way we develop software. The introductory article in this issue describes the hardware imperatives behind this shift in computer architecture from uniprocessors to multicore processors, also known as CMPs.

by Herb Sutter, James Larus | October 18, 2005


Trials and Tribulations of Debugging Concurrency:
You can run, but you can’t hide.

We now sit firmly in the 21st century where the grand challenge to the modern-day programmer is neither memory leaks nor type issues (both of those problems are now effectively solved), but rather issues of concurrency. How does one write increasingly complex programs where concurrency is a first-class concern. Or even more treacherous, how does one debug such a beast? These questions bring fear into the hearts of even the best programmers.

by Kang Su Gatlin | November 30, 2004