Sort By:

FPGAs in Client Compute Hardware:
Despite certain challenges, FPGAs provide security and performance benefits over ASICs.

FPGAs (field-programmable gate arrays) are remarkably versatile. They are used in a wide variety of applications and industries where use of ASICs (application-specific integrated circuits) is less economically feasible. Despite the area, cost, and power challenges designers face when integrating FPGAs into devices, they provide significant security and performance benefits. Many of these benefits can be realized in client compute hardware such as laptops, tablets, and smartphones.

by Michael Mattioli | February 7, 2022


Chipping Away at Moore’s Law:
Modern CPUs are just chiplets connected together.

Smaller transistors can do more calculations without overheating, which makes them more power efficient. It also allows for smaller die sizes, which reduce costs and can increase density, allowing more cores per chip. The silicon wafers that chips are made of vary in purity, and none are perfect, which means every chip has a chance of having imperfections that differ in effect. Manufacturers can limit the effect of imperfections by using chiplets.

by Jessie Frazelle | March 13, 2020


NUMA (Non-Uniform Memory Access): An Overview:
NUMA becomes more common because memory controllers get close to execution units on microprocessors.

NUMA (non-uniform memory access) is the phenomenon that memory at various points in the address space of a processor have different performance characteristics. At current processor speeds, the signal path length from the processor to memory plays a significant role. Increased signal path length not only increases latency to memory but also quickly becomes a throughput bottleneck if the signal path is shared by multiple processors. The performance differences to memory were noticeable first on large-scale systems where data paths were spanning motherboards or chassis. These systems required modified operating-system kernels with NUMA support that explicitly understood the topological properties of the system’s memory (such as the chassis in which a region of memory was located) in order to avoid excessively long signal path lengths.

by Christoph Lameter | August 9, 2013


Realtime GPU Audio:
Finite difference-based sound synthesis using graphics processors

Today’s CPUs are capable of supporting realtime audio for many popular applications, but some compute-intensive audio applications require hardware acceleration. This article looks at some realtime sound-synthesis applications and shares the authors’ experiences implementing them on GPUs (graphics processing units).

by Bill Hsu, Marc Sosnick-Pérez | May 8, 2013


FPGA Programming for the Masses:
The programmability of FPGAs must improve if they are to be part of mainstream computing.

When looking at how hardware influences computing performance, we have GPPs (general-purpose processors) on one end of the spectrum and ASICs (application-specific integrated circuits) on the other. Processors are highly programmable but often inefficient in terms of power and performance. ASICs implement a dedicated and fixed function and provide the best power and performance characteristics, but any functional change requires a complete (and extremely expensive) re-spinning of the circuits.

by David Bacon, Rodric Rabbah, Sunil Shukla | February 23, 2013


CPU DB: Recording Microprocessor History:
With this open database, you can mine microprocessor trends over the past 40 years.

In November 1971, Intel introduced the world’s first single-chip microprocessor, the Intel 4004. It had 2,300 transistors, ran at a clock speed of up to 740 KHz, and delivered 60,000 instructions per second while dissipating 0.5 watts. The following four decades witnessed exponential growth in compute power, a trend that has enabled applications as diverse as climate modeling, protein folding, and computing real-time ballistic trajectories of angry birds.

by Andrew Danowitz, Kyle Kelley, James Mao, John P. Stevenson, Mark Horowitz | April 6, 2012


Managing Contention for Shared Resources on Multicore Processors:
Contention for caches, memory controllers, and interconnects can be alleviated by contention-aware scheduling algorithms.

Modern multicore systems are designed to allow clusters of cores to share various hardware structures, such as LLCs (last-level caches; for example, L2 or L3), memory controllers, and interconnects, as well as prefetching hardware. We refer to these resource-sharing clusters as memory domains, because the shared resources mostly have to do with the memory hierarchy.

by Alexandra Fedorova, Sergey Blagodurov, Sergey Zhuravlev | January 20, 2010


Reconfigurable Future:
The ability to produce cheaper, more compact chips is a double-edged sword.

Predicting the future is notoriously hard. Sometimes I feel that the only real guarantee is that the future will happen, and that someone will point out how it’s not like what was predicted. Nevertheless, we seem intent on trying to figure out what will happen, and worse yet, recording these views so they can be later used against us. So here I go... Scaling has been driving the whole electronics industry, allowing it to produce chips with more transistors at a lower cost.

by Mark Horowitz | July 14, 2008


Repurposing Consumer Hardware:
New uses for small form-factor, low-power machines

These days you have to be more and more creative when tackling home technology projects because the inventory of raw material is changing so rapidly. Market and product cycles continue to shrink, standard form factors are being discarded to drive down costs, and pricing is becoming more dependent on market value and less on direct manufacturing cost. As a result, standard modular building blocks are disappearing. New alternative uses for obsolete or low-price products are emerging, however.

by Mache Creeger | March 9, 2007


The Price of Performance:
An Economic Case for Chip Multiprocessing

In the late 1990s, our research group at DEC was one of a growing number of teams advocating the CMP (chip multiprocessor) as an alternative to highly complex single-threaded CPUs. We were designing the Piranha system,1 which was a radical point in the CMP design space in that we used very simple cores (similar to the early RISC designs of the late ’80s) to provide a higher level of thread-level parallelism. Our main goal was to achieve the best commercial workload performance for a given silicon budget. Today, in developing Google’s computing infrastructure, our focus is broader than performance alone. The merits of a particular architecture are measured by answering the following question: Are you able to afford the computational capacity you need?

by Luiz André Barroso | October 18, 2005


Extreme Software Scaling:
Chip multiprocessors have introduced a new dimension in scaling for application developers, operating system designers, and deployment specialists.

The advent of SMP (symmetric multiprocessing) added a new degree of scalability to computer systems. Rather than deriving additional performance from an incrementally faster microprocessor, an SMP system leverages multiple processors to obtain large gains in total system performance. Parallelism in software allows multiple jobs to execute concurrently on the system, increasing system throughput accordingly. Given sufficient software parallelism, these systems have proved to scale to several hundred processors.

by Richard McDougall | October 18, 2005


The Future of Microprocessors:
Chip multiprocessors’ promise of huge performance gains is now a reality.

The performance of microprocessors that power modern computers has continued to increase exponentially over the years for two main reasons. First, the transistors that are the heart of the circuits in all processors and memory chips have simply become faster over time on a course described by Moore’s law, and this directly affects the performance of processors built with those transistors. Moreover, actual processor performance has increased faster than Moore’s law would predict, because processor designers have been able to harness the increasing numbers of transistors available on modern chips to extract more parallelism from software.

by Kunle Olukotun, Lance Hammond | October 18, 2005


Digitally Assisted Analog Integrated Circuits:
Closing the gap between analog and digital

In past decades, “Moore’s law”1 has governed the revolution in microelectronics. Through continuous advancements in device and fabrication technology, the industry has maintained exponential progress rates in transistor miniaturization and integration density. As a result, microchips have become cheaper, faster, more complex, and more power efficient.

by Boris Murmann, Bernhard Boser | April 16, 2004


Getting Gigascale Chips:
Challenges and Opportunities in Continuing Moore’s Law

Processor performance has increased by five orders of magnitude in the last three decades, made possible by following Moore’s law - that is, continued technology scaling, improved transistor performance to increase frequency, additional (to avoid repetition) integration capacity to realize complex architectures, and reduced energy consumed per logic operation to keep power dissipation within limits. Advances in software technology, such as rich multimedia applications and runtime systems, exploited this performance explosion, delivering to end users higher productivity, seamless Internet connectivity, and even multimedia and entertainment.

by Shekhar Borkar | December 5, 2003