
Originally published in Queue vol. 9, no. 6



Mohamed Zahran - Heterogeneous Computing: Here to Stay
Hardware and Software Perspectives

David Chisnall - There's No Such Thing as a General-purpose Processor
And the belief in such a device is harmful

Hans-J Boehm, Sarita V. Adve - You Don't Know Jack about Shared Variables or Memory Models
Data races are evil.

Dorian Birsan - On Plug-ins and Extensible Architectures
Extensible application architectures such as Eclipse offer many advantages, but one must be careful to avoid "plug-in hell."


(newest first)

Satnam Singh | Mon, 31 Oct 2011 01:39:53 UTC

Hello Maksim. First, there is an Accelerator support email alias you can use: it is usually quite responsive. Second, there is a new release of Accelerator out now -- perhaps you could try that? Note that not all types are supported on all targets. Finally, if things still don't work out please email me directly at [email protected] (I no longer work at Microsoft).

Maksim Gumerov | Sun, 11 Sep 2011 18:14:46 UTC

Hello! Thank you for the article. Could someone help me with some strange problems running a simple test application? The code follows. Problems: 1) If I change the data types to DPA, DoubleArrayParam and double, respectively, execution fails for any ttt value. 2) During each iteration, D3DCompiler.dll is loaded and then unloaded. How can I expect any speedup when such things happen? 3) Even if I keep D3DCompiler.dll in memory, the performance is still quite terrible. 4) Consequently, accelerator.dll performs about a thousand times slower than straightforward CPU computation :( Maybe my use case is wrong? Maybe I should only use Accelerator on matrices like 8000x8000 to see any speedup at all? Or maybe I should use another target, not DX9, as the PCI transfer costs too much to see a speedup? But anyway, why load D3DCompiler at each ToArray???

int main(int argc, char* argv[])
{
    MicrosoftTargets::DX9Target * tgt = MicrosoftTargets::CreateDX9Target();

    const int ttt = 500;

    float sarr1[ttt], sarr2[ttt];
    for (int k = 0; kToArray(pa4, &scalarmult, 1); //just 1st item
    }

    tgt->Delete();
    return 0;
}

Satnam Singh | Wed, 03 Aug 2011 17:26:29 UTC

Hello 0xc000005. You could start with Edward Lee's excellent article "The Problem With Threads": http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

I've also been unfortunate enough to write lots of lock-based multi-threaded code in my life, and it always feels like shaving with a chainsaw. I even have a current project called Kiwi, on hardware design in C# using locks, so even I have not managed to reach escape velocity from the world of lock landmines. However, there are many promising developments, including transactional memory, parallel functional programming, join patterns, and nested data parallelism.

0xc000005 | Tue, 02 Aug 2011 08:22:56 UTC

Hi Satnam, nice article! Your statement that "locks and monitors are not the right abstractions to use for writing parallel applications" is interesting (and possibly a bit controversial) - can you point me to any articles or links that discuss this point in depth?

Satnam Singh | Wed, 06 Jul 2011 13:56:39 UTC

The Pervasive Parallelism Laboratory is also doing excellent work on using eDSLs written in Scala for heterogeneous computing. See http://ppl.stanford.edu/wiki/index.php/Projects


© 2018 ACM, Inc. All Rights Reserved.