GPU Computing

Vol. 6 No. 2 – March/April 2008

Kode Vicious

Latency and Livelocks

"Dear KV: My company has a very large database with all of our customer information. The database is replicated to several locations around the world to improve performance locally, so that when customers in Asia want to look at their data, they don't have to wait for it to come from the United States, where my company is based..."

A koder with attitude, KV answers your questions. Miss Manners he ain't.

Sometimes data just doesn't travel as fast as it should. Sometimes a program appears to be running fine, but is quietly failing behind the scenes. If you've experienced these problems, you may have struggled for a while and then become baffled and/or tired. Kode Vicious knows your frustration, and this month serves up some instructive words on how to deal with both of these annoying problems. Fatigued or mystified by other quandaries? E-mail your problem to KV@acmqueue.com.

by George Neville-Neil

Interviews

A Conversation with Kurt Akeley and Pat Hanrahan

Interviewing either Kurt Akeley or Pat Hanrahan for this month's special report on GPUs would have been a great opportunity, so needless to say we were delighted when both of these graphics-programming veterans agreed to participate.

Graphics veterans debate the evolution of the GPU

Akeley was part of the founding Silicon Graphics team in 1982 and worked there for almost 20 years, during which he led the development of several high-end graphics systems, including GTX, VGX, and RealityEngine. He's also known for his pioneering work on OpenGL, the industry-standard programming interface for high-performance graphics hardware. Akeley is now a principal researcher at Microsoft Research Silicon Valley, where he works on cutting-edge projects in graphics system architecture, high-performance computing, and display design.

Articles

Data-Parallel Computing

Users always care about performance. Although often it's just a matter of making sure the software is doing only what it should, there are many cases where it is vital to get down to the metal and leverage the fundamental characteristics of the processor.

Data parallelism is a key concept in leveraging the power of today's manycore GPUs.

CHAS. BOYD, MICROSOFT

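Data parallelism, as described above, means applying the same independent operation to every element of a data set, so the elements can be processed on different cores at once. A minimal sketch in plain Python, using the standard-library multiprocessing module (a hypothetical illustration, not code from the article; the function names are invented):

```python
# Hypothetical sketch of data parallelism (illustrative, not from the
# article): the same independent operation is applied to every element
# of a data set, so the elements can run on different cores at once.
from multiprocessing import Pool

def scale(x):
    # Per-element operation; each call is independent of the others.
    return x * 2.0

def parallel_scale(data, workers=4):
    # Split the data across a pool of worker processes and apply
    # scale() to each element in parallel.
    with Pool(workers) as pool:
        return pool.map(scale, data)

if __name__ == "__main__":
    print(parallel_scale([1.0, 2.0, 3.0, 4.0]))  # [2.0, 4.0, 6.0, 8.0]
```

Because no element's result depends on any other's, the work divides cleanly across however many cores are available, which is exactly the property GPUs exploit at much larger scale.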
Future Graphics Architectures

Graphics architectures are in the midst of a major transition. In the past, these were specialized architectures designed to support a single rendering algorithm: the standard Z buffer. Realtime 3D graphics has now advanced to the point where the Z-buffer algorithm has serious shortcomings for generating the next generation of higher-quality visual effects demanded by games and other interactive 3D applications. There is also a desire to use the high computational capability of graphics architectures to support collision detection, approximate physics simulations, scene management, and simple artificial intelligence. In response to these forces, graphics architectures are evolving toward a general-purpose parallel-programming model that will support a variety of image-synthesis algorithms, as well as nongraphics tasks.

GPUs continue to evolve rapidly, but toward what?

WILLIAM MARK, INTEL AND UNIVERSITY OF TEXAS, AUSTIN

This architectural transformation presents both opportunities and challenges. For hardware designers, the primary challenge is to balance the demand for greater programmability with the need to continue delivering high performance on traditional image-synthesis algorithms. Software developers have an opportunity to escape from the constraints of hardware-dictated image-synthesis algorithms so that almost any desired algorithm can be implemented, even those that have nothing to do with graphics. With this opportunity, however, comes the challenge of writing efficient, high-performance parallel software to run on the new graphics architectures. Writing such software is substantially more difficult than writing the single-threaded software that most developers are accustomed to, and it requires that programmers address challenges such as algorithm parallelization, load balancing, synchronization, and management of data locality.

GPUs: A Closer Look

A gamer wanders through a virtual world rendered in near-cinematic detail. Seconds later, the screen fills with a 3D explosion, the result of unseen enemies hiding in physically accurate shadows. Disappointed, the user exits the game and returns to a computer desktop that exhibits the stylish 3D look-and-feel of a modern window manager. Both of these visual experiences require hundreds of gigaflops of computing performance, a demand met by the GPU (graphics processing unit) present in every consumer PC.

As the line between GPUs and CPUs begins to blur, it's important to understand what makes GPUs tick.

KAYVON FATAHALIAN and MIKE HOUSTON, STANFORD UNIVERSITY

The modern GPU is a versatile processor that constitutes an extreme but compelling point in the growing space of multicore parallel computing architectures. These platforms, which include GPUs, the STI Cell Broadband Engine, the Sun UltraSPARC T2, and, increasingly, multicore x86 systems from Intel and AMD, differentiate themselves from traditional CPU designs by prioritizing high-throughput processing of many parallel operations over the low-latency execution of a single task.

Scalable Parallel Programming with CUDA

The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now parallel systems. Furthermore, their parallelism continues to scale with Moore's law. The challenge is to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with widely varying numbers of cores.

Is CUDA the parallel programming model that application developers have been waiting for?

JOHN NICKOLLS, IAN BUCK, AND MICHAEL GARLAND, NVIDIA, KEVIN SKADRON, UNIVERSITY OF VIRGINIA

According to conventional wisdom, parallel programming is difficult. Early experience with the CUDA [1, 2] scalable parallel programming model and C language, however, shows that many sophisticated programs can be readily expressed with a few easily understood abstractions. Since NVIDIA released CUDA in 2007, developers have rapidly developed scalable parallel programs for a wide range of applications, including computational chemistry, sparse matrix solvers, sorting, searching, and physics models. These applications scale transparently to hundreds of processor cores and thousands of concurrent threads. NVIDIA GPUs with the new Tesla unified graphics and computing architecture (described in the GPU sidebar) run CUDA C programs and are widely available in laptops, PCs, workstations, and servers. The CUDA model is also applicable to other shared-memory parallel processing architectures, including multicore CPUs [3].
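The core abstraction behind this transparent scaling is CUDA's thread hierarchy: a kernel is launched as a grid of thread blocks, and each thread computes a global index from its block and thread coordinates. As a rough illustration (a hypothetical Python simulation with invented names, not CUDA code from the article), here is a SAXPY-style kernel with the launch serialized in plain Python just to show the indexing and the bounds guard:

```python
# Hypothetical simulation of the CUDA execution model (illustrative,
# not from the article). A kernel runs as a grid of thread blocks;
# each thread derives its global index from block/thread coordinates.
def saxpy_kernel(block_idx, thread_idx, block_dim, n, a, x, y, out):
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:  # guard: the grid may be larger than the data
        out[i] = a * x[i] + y[i]

def launch(kernel, grid_dim, block_dim, *args):
    # Sequential stand-in for a kernel launch: on a GPU all of these
    # invocations would execute concurrently across many cores.
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(b, t, block_dim, *args)

n = 5
x = [1.0] * n
y = [2.0] * n
out = [0.0] * n
grid_dim = (n + 3) // 4  # enough 4-thread blocks to cover n elements
launch(saxpy_kernel, grid_dim, 4, n, 3.0, x, y, out)
# out is now [5.0, 5.0, 5.0, 5.0, 5.0]
```

Because each thread touches only its own element, the same kernel runs unchanged whether the hardware supplies eight cores or hundreds; only the number of blocks executing concurrently varies, which is how CUDA programs scale across GPUs of widely varying sizes.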

Curmudgeon

Solomon's Sword Beats Occam's Razor

I've told you a googol times or more: Don't exaggerate! And, less often, I've ever-so-gently urged you not to understate. Why is my advice ignored? Why can't you get IT... just right, balanced beyond dispute? Lez Joosts Mildews, as my mam was fond of sayin, boxing both my ears with equal devotion. Follow the Middle Way as Tao did in his Middle Kingdom. Or "straight down the middle," as golfer Bing Crosby used to croon. His other golf song was "The Wearing of the Green," but such digressions run counter to my straight, plow-on-ahead advice. I've just smoked a cigarette branded Cleopatra, but that's none of your beeswax neither, and strictly between me and my Egyptian placements sponsor.

Choosing your best hypothesis.

Stan Kelly-Bootle, Author

So, shun deviations and avoid life's bunkers lurking left and right. Our current presidential candidates excel in this craftiness, being both pro-Nafta and anti-Nafta as the local polls dictate. Yet, by one of those many quirks of natural language, politicians seeking compromises often find their reputations compromised.
