This article seems uninformed about how GPUs work. I've been out of the graphics business for a decade, but GPUs definitely had register renaming, and have had it since at least Shader Model 2 (2003). Typically, branches run both sets of code and discard the wrong result. GPUs also tend to do work in blocks of pixels. ATI chose 2x2 blocks, while nVidia chose a larger size like 16x16, which is why ATI cards were so much better at branching and at handling detailed stencil shapes during that era. Again, you might run 4 (or 256) threads when you only needed 1.
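To make "run both sets of code and discard the wrong one" concrete, here is a minimal CPU-side sketch in C that emulates a four-lane group (a hypothetical 2x2 pixel quad) executing an if/else whose condition differs between lanes. The function name, the 0.5 threshold, and the arithmetic in each branch are illustrative assumptions, not any vendor's actual hardware behavior.

```c
#include <stddef.h>

#define LANES 4 /* hypothetical 2x2 pixel quad, as in the ATI example */

/* Emulates a lane group executing an if/else when lanes disagree:
   every lane evaluates BOTH branches, and a per-lane mask selects the
   result that is kept. The work done for the other branch is wasted. */
void shade_quad(const float in[LANES], float out[LANES]) {
    float then_val[LANES], else_val[LANES];
    int mask[LANES];

    for (size_t i = 0; i < LANES; i++) {
        mask[i] = in[i] > 0.5f;     /* per-lane branch condition */
        then_val[i] = in[i] * 2.0f; /* "then" branch, computed for all lanes */
        else_val[i] = in[i] * 0.5f; /* "else" branch, computed for all lanes */
    }
    for (size_t i = 0; i < LANES; i++) {
        out[i] = mask[i] ? then_val[i] : else_val[i]; /* discard the wrong one */
    }
}
```

The cost model is the point: a quad whose lanes all agree could in principle skip one branch, but a divergent quad pays for both.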
Also, the number of threads a GPU could run in parallel would depend on the number of registers a shader used, as you would run out of virtual registers at some point.
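A rough sketch of the register arithmetic behind that point, assuming registers are the only limiting resource. The register-file size and thread cap passed in are hypothetical parameters, not figures for a particular GPU; real hardware also rounds register allocations up and has other limits (shared memory, maximum warps or wavefronts).

```c
/* Upper bound on resident threads when registers are the only limit.
   regfile_size:    hypothetical number of 32-bit registers per core
   regs_per_thread: registers the compiled shader uses per thread
   hw_thread_cap:   hypothetical hardware maximum of resident threads */
unsigned max_resident_threads(unsigned regfile_size, unsigned regs_per_thread,
                              unsigned hw_thread_cap) {
    if (regs_per_thread == 0) {
        return hw_thread_cap; /* no register pressure at all */
    }
    unsigned by_registers = regfile_size / regs_per_thread;
    return by_registers < hw_thread_cap ? by_registers : hw_thread_cap;
}
```

With a hypothetical 65,536-register file and a cap of 2,048 threads, a shader using 32 registers per thread could keep 2,048 threads resident, while one using 64 registers could keep only 1,024.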
During that era, ATI hid the fact that they didn't use full-sized floats to gain speed, while nVidia did use full floats and paid a performance penalty.
There are "shenanigans" in every performance oriented processor.
CodeLurker
|
Sat, 22 Dec 2018 17:30:15 UTC
OK, "C" is not a low-level language. Then why was it that, on the now-defunct Computer Language Benchmarks Game site, C++ programs were usually the fastest, with FORTRAN beating C++ only in occasional number-crunching tasks? I'll admit that a language with cheaper parallelism, cache-management primitives, and parallelism cues would in principle be faster than C++. But I'd bet you real money that, in the real world, such a language would share two things with C++: 1) it would not use garbage collection in most benchmarks (unless they were run in a way that avoids its performance costs), and 2) it would not be interpreted. Take that, you academician know-it-alls!
Juneyoung Lee
|
Sat, 27 Oct 2018 17:01:43 UTC
For the unspecified-value issue, you may want to add PLDI'17 to your references :) (https://dl.acm.org/citation.cfm?id=3062343 )
For the provenance issue, this OOPSLA'18 paper is exactly about it! (https://sf.snu.ac.kr/llvmtwin/ )
Roberto Maurizzi
|
Tue, 16 Oct 2018 02:33:17 UTC
C doesn't guarantee anything about memory protection: the CPU (or rather its MMU) does, as anyone who wrote C programs on MMU-less processors can tell you. On those systems (typically single-tasking, but not always; see the Commodore Amiga), a program had full read and write access to the processor's entire address space, and a 'lost pointer' could easily fill all of memory with garbage and crash the OS.
What Intel and friends did is actually even worse: for the sake of compatibility, they hid the real internal structure of the processor behind the 'external' assembly language and architecture, and then, probably to speed things up, they didn't faithfully emulate the memory-protection architecture.
They allow things that would be illegal for the processor they're emulating because they wanted speed AND backward compatibility, because Microsoft back in the day was refusing to even think about porting Windows to different architectures.
Tom Sobota
|
Thu, 12 Jul 2018 13:51:16 UTC
Years ago I programmed a lot on the PDP-11, whether in assembler, Fortran, or C. So I find the author right when he says that C is a low-level language on those machines. It is, or rather, it was. When I started programming for the x86 architecture back in the eighties, I couldn't help but notice that the code generated by C compilers was no longer so low-level, since PDP-11 features like pre- and post-increment addressing weren't there, and the addressing modes were different.
I have nothing against C/C++; I still use them frequently. But I wonder whether a new language that could give us that sense of control over program execution wouldn't be welcome. Ditto for a processor architecture with execution parallelism controllable from the language. RISC-V or something?
Blue
|
Fri, 06 Jul 2018 11:37:57 UTC
So the author simply assumes the x86 platform? A good part of the C code in existence isn't written for desktop and server applications anyway, but for sequentially executing MCUs, which are a lot closer to the original 8086 chip. Also:
"For example, in C, processing a large amount of data means writing a loop that processes each element sequentially." isn't always true, depending on the use case (for example, binary search and jump search don't need to iterate through every element of ordered data).
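The binary-search point can be illustrated with a minimal sketch (the function name and signature are my own): on sorted data it inspects O(log n) elements rather than every one. It is still a serial algorithm; it simply visits far fewer elements.

```c
#include <stddef.h>

/* Binary search over a sorted int array.
   Returns the index of key, or -1 if key is not present. */
ptrdiff_t binary_search(const int *a, size_t n, int key) {
    size_t lo = 0, hi = n; /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2; /* avoids overflow of lo + hi */
        if (a[mid] < key) {
            lo = mid + 1;
        } else if (a[mid] > key) {
            hi = mid;
        } else {
            return (ptrdiff_t)mid;
        }
    }
    return -1;
}
```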
Anon
|
Thu, 24 May 2018 03:22:29 UTC
I can only assume the author is a big fan of Intel's EPIC efforts?
mlwmohawk
|
Mon, 14 May 2018 21:36:07 UTC
This is an excellent troll and straw-man argument. The C programming language enforces nothing in the way of memory model, threads, or processor layout. The language itself can be applied to almost any hypothetical processor system; all that is required is some sane operational characteristic that can be programmed. I will grant that most developers and generic libraries assume things like malloc, stacks, caches, and serial execution, but C does not. C takes an expression and translates it into a set of instructions, nothing less and nothing more.
What would be really interesting is if you could describe a programming model that would be better than C *and* not be implemented in C.
Eric S. Raymond
|
Thu, 10 May 2018 14:22:21 UTC
I have blogged a detailed response at
http://esr.ibiblio.org/?p=7979
In brief, I think Chisnall's critique is thought-provoking but his prescription mistaken; there are simply too many key algorithms that are what I call SICK ("Serial, Intrinsically; Cope, Kiddo") for his ideas about processor design to play well with real workloads.
John Payson
|
Wed, 09 May 2018 20:58:19 UTC
Perhaps the simplest rebuttal to the author's primary point is to quote from the introduction to the published rationale for C99:
"C code can be non-portable. Although it strove to give programmers the opportunity to write truly portable programs, the C89 Committee did not want to force programmers into writing portably, to preclude the use of C as a high-level assembler: the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program."
To be sure, the C Standard does not require that implementations be suitable for low-level programming. On the other hand, it does not require that they be suitable for *any* particular purpose. The C89 Rationale notes, in 2.4.4.1, "While a deficient implementation could probably contrive a program that meets this requirement, yet still succeed in being useless, the Committee felt that such ingenuity would probably require more work than making something useful."
While the C Standard itself makes no reference to "quality", except with regard to the "randomness" produced by rand(), the rationale uses the phrase "quality of implementation" a fair number of times. From Section 3 of the C99 Rationale: "The goal of adopting this categorization is to allow a certain variety among implementations which permits quality of implementation to be an active force in the marketplace as well as to allow certain popular extensions, without removing the cachet of conformance to the Standard."
For some reason, it has become fashionable to view the "ingenuity" alluded to in 2.4.4.1 of the C89 Rationale as a good thing, but the text makes clear that it isn't. The only reason the authors didn't explicitly discourage it is that they thought the effort required would be an adequate deterrent. Alas, they were mistaken.
Anon
|
Sun, 06 May 2018 10:33:54 UTC
Holy mother of god.
Let's put all the blame on C, of all things, and just pardon incompetence on all sides.
Hans Jorgensen
|
Sat, 05 May 2018 17:46:38 UTC
I think that most of these problems actually have to do with x86 and Unix/Windows, not C. The problems you mention - instruction-level parallelism, invisible caches, paging behavior - stem from the process model, which puts programs into monolithic units that think they have the flat memory space to themselves. GPU code, even with much better parallelism and memory usage, is still remarkably C-like (or even explicitly C in the case of Nvidia's CUDA), so I imagine that C (which is still a high-level language) could be adapted to use these alternate paradigms.
I imagine that an architecture giving programs better control over such a speed-oriented processor would do the following:
- Explicit caching. Basically, instead of using the SRAM memory banks as caches that are invisible to the architecture, expose them as actual memory banks with separate pointer spaces and allocation endpoints (e.g. sbrk_L1(), sbrk_L2(), sbrk_L3(), sbrk_main(), or slightly more abstract and portable names) and let programs use them as they please.
- Explicit access to the individual execution units, including cheap threads.
- No preemptive multitasking - since we have so many parallel execution units, we can have a few kernel threads always running and watchdogging the other threads, and kill them if they're being onerous. Preemptive multitasking was needed when there was only one processor in the system and a single bad program could bring the whole thing down.
- Instead, if a program needs an execution unit, the architecture can just give it one. It can run as long as it wants and yield to the scheduler either to be considerate or to wait for user input or for a lock or condition variable. Most execution units, however, will just terminate their programs quickly, meaning that the expense of running the scheduler is not often incurred (and if it is, it doesn't need to save as much since it can expect the program to save state).
- To avoid triggering too many OS syscalls, the OS could arrange for the "new execution unit" instruction not to fault unless the execution unit is not allowed to spawn or no more units are available.
- Bonus thought: You could even ask for an execution unit on every function call - there is no stack space! The program stack idea is instead implemented as a wait/spawn chain, and all functions run asynchronously with a set contract for returning values (such as writing data to a malloc_L1() pointer).
- No attempt on the processor's part at ILP: the architecture will just suck it up and run each instruction in order on the same unit. A compiler for a high-level language might do execution-unit scheduling on its own, but the assembly will tell you exactly how each execution unit is used.
- An ability to bypass the paging system, if we don't completely throw it out.
- If the architecture had the ability to actually read its own instruction pointer, paging would be much less useful because we could use relative addressing for everything - just transfer the instruction pointer into the accumulator and calculate any necessary jump points and memory lookups.
- Paging is still useful for memory protection, though, so we could still use it even if we expressed it in terms of physical pages without doing any address translation.
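A rough software model of the "explicit caching" idea from the list above: a bump allocator over a fixed memory bank that the program manages itself. `sbrk_scratch`, `scratch_reset`, and the 4 KiB bank size are hypothetical, modeled loosely on the comment's sbrk_L1(). On current hardware the static array is just ordinary cached memory, so this shows only the programming interface, not any performance benefit.

```c
#include <stddef.h>

/* Hypothetical stand-in for an explicitly managed on-chip memory bank.
   On the imagined architecture this would be real SRAM with its own
   address space; here it is only a static array. */
#define SCRATCH_BYTES 4096
static _Alignas(max_align_t) unsigned char scratch_bank[SCRATCH_BYTES];
static size_t scratch_top = 0;

/* sbrk-style bump allocator over the bank (hypothetical name).
   Returns NULL when the bank cannot satisfy the request. */
void *sbrk_scratch(size_t nbytes) {
    const size_t align = _Alignof(max_align_t); /* a power of two */
    size_t start = (scratch_top + align - 1) & ~(align - 1); /* round up */
    if (start > SCRATCH_BYTES || nbytes > SCRATCH_BYTES - start) {
        return NULL;
    }
    scratch_top = start + nbytes;
    return &scratch_bank[start];
}

/* Releases the whole bank at once, e.g. between program phases. */
void scratch_reset(void) {
    scratch_top = 0;
}
```

A bump allocator is chosen here because it makes placement fully explicit and predictable, which is the property the comment asks for; it cannot free individual blocks.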
As you said, old code would not run well on such an architecture, but that would be because it's old x86 and Unix/Windows code and not strictly because it's old C code. C has been adapted to lots of programming models before, and it could potentially be adapted to one like this, too.
Peter Fors
|
Fri, 04 May 2018 12:04:19 UTC
"In C, a read from an uninitialized variable is an unspecified value and is allowed to be any value each time it is read. This is important, because it allows behavior such as lazy recycling of pages: for example, on FreeBSD the malloc implementation informs the operating system that pages are currently unused, and the operating system uses the first write to a page as the hint that this is no longer true. A read to newly malloced memory may initially read the old value; then the operating system may reuse the underlying physical page; and then on the next write to a different location in the page replace it with a newly zeroed page. The second read from the same location will then give a zero value."
This can never be true: an OS cannot change a page after it has been accessed; that would be completely broken. Lazy recycling in an OS means that your memory allocation is validated immediately, but the page isn't actually mapped into your address space until your first access, whether that is a read or a write. The OS then takes an invalid-page exception, checks whether the address you read or wrote is memory you actually own, and if so, maps a zeroed page at that offset (or, if the page had been swapped out, reads it back in and maps it), then returns control to the program. In no OS that actually works will the data in memory change from under your feet, unless you have hardware errors, the OS writer got something badly wrong, or you are reading a hardware register; but this was about memory returned from malloc, so that does not apply.
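Whichever account of page recycling is correct, portable C code should not depend on the contents of freshly malloc'd memory. A minimal sketch of the two usual ways to obtain a buffer whose contents are defined before first use; the helper names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* calloc returns zero-filled storage (or NULL on failure). */
int *alloc_ints_calloc(size_t n) {
    return calloc(n, sizeof(int));
}

/* malloc followed by explicit initialization; the size check guards
   against n * sizeof(int) overflowing size_t. */
int *alloc_ints_memset(size_t n) {
    if (n > SIZE_MAX / sizeof(int)) {
        return NULL;
    }
    int *p = malloc(n * sizeof *p);
    if (p != NULL) {
        memset(p, 0, n * sizeof *p); /* contents are now defined zeros */
    }
    return p;
}
```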
John Payson
|
Wed, 02 May 2018 23:11:38 UTC
[accidentally cut off end] While that would be a bit vague in the absence of a definition of "active association"...
...that still would have been much better than having compiler writers interpret 6.5p7 (or its predecessor) as applying even in cases where an lvalue had an otherwise-obvious association with an object of the correct type, with the sole exception of cases that even the most obtuse compiler writer would have to admit would otherwise be absurd [e.g. accessing s.x directly using the member-access operator].
John Payson
|
Wed, 02 May 2018 23:05:32 UTC
The biggest problem with C is that the C Standards Committee wrote a document that could usefully be interpreted as a set of guidelines and labeled it a "standard", even though it lacks the precision necessary to make it usable as one.
For example, given "struct S { int m; } s;", evaluation of "s.m" will invoke Undefined Behavior because it accesses an object of type "struct S" with an lvalue of type "int", and "int" is not one of the types via which N1570 6.5p7 would allow an object of type "struct S" to be accessed. The equivalent text in C89 had the same problem. Obviously there must be some circumstances where an lvalue can access an object of otherwise-incompatible type, but the Standard fails to say what those are, and nearly all confusion surrounding aliasing is a result of different people trying to figure out when that rule does or does not apply.
This problem could have been remedied with Defect Report #028 if the authors had noted that the rule was meant to require that the lvalue used for access have an active association with an lvalue of a proper type. While that would be a bit vague in the absence of a definition of "active association",
Dave P
|
Wed, 02 May 2018 22:06:27 UTC
I applaud you for throwing the cat amongst the pigeons! However isn't this a rant about serial execution models rather than just C?
Your idea for a fast processor (many threads, a simple memory model, etc.) would not be able to execute a standard single-threaded serial program very fast. Unfortunately, we humans like our programs as a simple list of 'do this, then do that', as it's like real life. So we like writing simple serial programs, and therefore making them go fast matters.
When I was a graduate at your esteemed institution, 17 years ago, coding in ways that would suit parallel architectures was touted as the solution to making things go faster, and I believed it back then. But now I believe that fundamentally we want to write serial programs, and parallel programs are fundamentally harder. There is only so much a language can do to abstract the complexities of simultaneous state changes.
Making serial sequences of instructions go fast necessarily requires resolving dependencies between instructions, and therefore processor complexity. You _could_ try pushing the responsibility for instruction scheduling up to the compiler or JIT level. But then you can't change the processor architecture without changing the compiler. Processor designers would be very constrained in the changes they could make without requiring all code to be recompiled (having all machine code dynamically generated would solve this, but that's another story).
I think the way forward would be to let the compiler tell the processor what the dependencies between the data are - a higher level interface than current machine code - and let the processor go figure out how to make it happen. This would save the processor having to work these dependencies out and would let the connection between machine code and actual performance be tighter. These days you could argue that machine code is not a low level language either! This would head towards the idea of dataflow/asynchronous processors that I haven't heard much about recently.
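The dependency point shows up even in a plain summation loop. The sketch below (function names are illustrative) contrasts one accumulator, where every add must wait for the previous one, with two independent accumulators that give an out-of-order core two chains to overlap. Whether the second version is actually faster depends on the compiler and processor, so treat it as an illustration of dependencies rather than a benchmark.

```c
#include <stddef.h>

/* One dependency chain: each addition needs the previous sum. */
double sum_serial(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Two independent chains that the hardware can execute side by side.
   Floating-point addition is not associative, so for general inputs
   the result may differ from sum_serial in the last bits. */
double sum_two_chains(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n) {
        s0 += a[i]; /* odd leftover element */
    }
    return s0 + s1;
}
```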
C is a language with an insane array of corner cases and quirks and has many problems, but the complexity of modern processors is largely driven by the desire to make serial code go fast. C (along with x86) and all the legacy this pair brings are mostly an implementation detail.
ErnQ Zalka (aka ern0)
|
Wed, 02 May 2018 20:15:36 UTC
How does the saying go? Lord, give me the strength to change what I can, the patience to tolerate what I cannot change, and the wisdom to distinguish between them. (Sorry, I'm not a native English speaker, and even though I'm an agnostic atheist, I hope the message is clear.)
On existing processors, we have no direct control over caching. Spectre and Meltdown suggest we have some, but it can only be used for purposes such as slowly reading forbidden memory. Okay, we can stick to basic rules, say, keeping arrays as small as we can so they fit in the cache, but that's all. Tolerate category.
But transforming an array of structs with field_a and field_b into two arrays, array_a and array_b, can be done by adding this feature to an existing language, say, C. Change category!
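The array-of-structs to struct-of-arrays transformation is easy to write by hand in C today; the comment's suggestion is that a language or compiler could do it for you. A minimal manual sketch reusing the comment's `field_a`/`field_b`/`array_a`/`array_b` names; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Array of structures: field_a and field_b are interleaved in memory. */
struct pair { float field_a; float field_b; };

/* Structure of arrays: each field is contiguous, which suits caches and
   SIMD when a loop touches only one field. */
struct pairs_soa { float *array_a; float *array_b; size_t n; };

/* Manual conversion. soa->array_a and soa->array_b must each have
   room for at least n floats. */
void aos_to_soa(const struct pair *aos, size_t n, struct pairs_soa *soa) {
    for (size_t i = 0; i < n; i++) {
        soa->array_a[i] = aos[i].field_a;
        soa->array_b[i] = aos[i].field_b;
    }
    soa->n = n;
}

/* A loop that reads only field_a now streams through one dense array. */
float sum_a(const struct pairs_soa *soa) {
    float s = 0.0f;
    for (size_t i = 0; i < soa->n; i++) {
        s += soa->array_a[i];
    }
    return s;
}
```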
Introducing intermediate layer, which compiles to final code which fits the actual target processor best: change category!
Organize non-C-ish functions into libraries that do the magic: change category. See OpenCV.
Anyway, thanks for the article, one less thing I don't need to write now :)
bill
|
Wed, 02 May 2018 20:05:03 UTC
I am made dumber by reading this tripe. A low-level language is one that targets, to some degree, the underlying processor architecture. Meltdown et al., and many of the hardware features that you discuss, are the domain of microarchitecture. The breakdown was within the contract between the microarchitecture and the architecture. C remains a "low-level" language regardless of whether the microarchitecture changes, and regardless of whether the architecture includes anything remotely like PDP-11 semantics.
Jay
|
Wed, 02 May 2018 19:13:37 UTC
Verilog is a C-like hardware description language: essentially the same syntax, plus extensions that specifically enable parallelism.
John Blake Arnold
|
Wed, 02 May 2018 17:08:31 UTC
I agree with many of the first commenters that coding standards within C are the primary defense against compiler issues concerning stack overflow. My experience with the C preprocessor is that any number of previously compiled executables can be included and recompiled in the final executable: this was the method used to include the physics engine in Unreal Engine IV, which was first programmed in MatLab or Octave and compiled (over 14 hours of compile time on a machine with 32 gigs of memory), but once compiled, that hard math was available to the cpp in a very simple program. From my perspective, there has never been a need to assume a compiler must translate between an array of structures and a structure of arrays unless the programmer specifically codes it, and that capability could be made into an executable library, just as with the physics engine cited above, so the compiler doesn't have to do the work in real time. As for the PDP-11 architecture as "proof" of hardware-specific coding without abstraction: abstraction is one of the C language's most powerful features. And who under 50 years old ever programmed on a PDP-11 rather than an IBM PC and up!? Finally, if abstraction IS allowed, then the Parallel Virtual Machine can allocate memory and be programmed using parallel C. Again, coding standards like NASA's allow many of the overflow issues cited by the author to be avoided. I love C, but even persistent bits on hardware are actually current flowing through DIP chips and transistors.
Keith
|
Wed, 02 May 2018 16:35:22 UTC
I don't think anyone who understands C ever thought of it as a low-level language. In fact, and maybe this is my generation, we always thought of it as one of the first high-level languages. I suppose working with assembly gives some perspective. I do realize that we find ourselves in a "let's tear down C" environment lately. I think it threatens millennials who think of it as low-level when compared to the training-wheel languages being used to produce web apps - the ones that allow "programmers" to keep one eye on their phones while "working".
It's basic human nature to bash to death what one does not understand. All of the potential problems noted in the article can be mitigated with good coding practices and careful consideration, concepts alien to what are referred to as "coders" today. I realize that the IoT is upon us, and while C can handle the demand most handily, there is a lot of money to be made peddling "Fisher-Price" style turnkey coding languages... Just put the phone down and learn to do it right. No offense to the Fisher-Price corporation was intended or implied. If you raised children before the smartphone, they were a godsend.
Andrew
|
Wed, 02 May 2018 16:30:47 UTC
The C language was deliberately designed to be a substitute for programming in assembly language, and assembly language is as close to the hardware as humanly possible. I view C as an abstraction of assembly language using macros. Therefore, it should be no surprise that the amount of control you have over the computer's hardware in C is practically the same as in assembly. So concluding that C was never meant to be mapped to the hardware demonstrates a lack of awareness of, or research into, the history of C. If this article had been about C++, it would have made more logical sense.
The C language is the standard by which every other language is measured. In terms of speed, readability, learning curve, and maintainability, it is number one. As a result, programs written in C will have fewer patches/bugs than those written in other languages. It is a good language, but it is said to have some vulnerabilities (such as buffer overflows) which, upon examination, tend to be problems with the C runtime and not C itself. With a few minor modifications, it could be made more mature, but there is no money in it as there is in the hyped-up, faddish languages.
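For the buffer-overflow point, the usual mitigation in C is to pass the destination size explicitly and never write past it. A minimal sketch; `copy_bounded` is a hypothetical helper in the spirit of BSD's `strlcpy`, not a standard C function.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most dst_size - 1 bytes of src into dst and always
   NUL-terminates when dst_size > 0. Returns strlen(src), so a result
   >= dst_size tells the caller the copy was truncated. */
size_t copy_bounded(char *dst, const char *src, size_t dst_size) {
    size_t src_len = strlen(src);
    if (dst_size > 0) {
        size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```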
The problem with Intel was that it added things no one asked for, at a level that no compiler or assembler had control over. The incorporation of the Minix OS into the modern Intel processor for remote management was a horribly stupid idea that compromised the security of every computer that has it. It wasn't C's fault, it was Intel's. If C is to be improved upon, it clearly won't be by Intel or by anyone who believes that C is not a low-level language.
1) https://julialang.org/benchmarks/
2) https://docentes.fct.unl.pt/sites/default/files/jmc-cunha/files/paper_6.pdf
3) http://delivery.acm.org/10.1145/3130000/3126905/p91-ray.pdf?ip=98.161.204.183&id=3126905&acc=OPEN&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1525275216_886bc6cd33e01f4b7a28c4ddf988124b
Victor Yodaiken
|
Wed, 02 May 2018 16:30:20 UTC
1) PDP11 included caches, ILP, non-flat memory, etc.
2) Processor architecture is guided by standard workloads which include, for mainstream processors, a lot of Java.
3) ILP is an optimization of execution of the instruction set, not the programming language.
4) Most GPU programming is in C, no matter what you imagine.
5) Erlang coding style does not magically solve the "too few threads" problem. That's an algorithmic problem, not a programming language problem. Some problems are suited to vectorization or GPU-style architectures and some are not.
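Point 5 can be illustrated in a few lines (function names are illustrative). The first loop's iterations are independent, so it suits vectorization or GPU-style execution; the second is a recurrence in which each step needs the previous result, so it is serial as written, in any language. Parallelizing a recurrence like this takes a different algorithm (for example, a parallel scan over the affine steps), not a different syntax.

```c
#include <stddef.h>

/* Elementwise: iteration i reads only x[i], so iterations are
   independent and can be vectorized or spread across threads. */
void scale_add(float *y, const float *x, float a, float b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + b;
    }
}

/* Recurrence: iteration i needs the value from iteration i - 1,
   forming a single dependency chain as written. */
void linear_recurrence(float *y, float y0, float a, float b, size_t n) {
    float prev = y0;
    for (size_t i = 0; i < n; i++) {
        prev = a * prev + b;
        y[i] = prev;
    }
}
```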
It is interesting to see a sort of endemic, and not terribly informed, hostility towards C on the part of LLVM developers. This hostility seems to be the design basis for a number of really badly engineered "optimizations" based on undefined behavior in the C standard.
|
Wed, 02 May 2018 16:11:03 UTC
I'm shocked. There I was thinking C was low-level... time to wake up and smell the coffee. I definitely learned something today. Thanks for sharing.
Dave | Fri, 27 Dec 2019 06:53:32 UTC