





Originally published in Queue vol. 16, no. 2





Related:

Tobias Lauinger, Abdelberi Chaabane, Christo Wilson - Thou Shalt Not Depend on Me
A look at JavaScript libraries in the wild


Robert C. Seacord - Uninitialized Reads
Understanding the proposed revisions to the C language


Carlos Baquero, Nuno Preguiça - Why Logical Clocks are Easy
Sometimes all you need is the right language.


Erik Meijer, Kevin Millikin, Gilad Bracha - Spicing Up Dart with Side Effects
A set of extensions to the Dart programming language, designed to support asynchrony and generator functions



Comments

(newest first)

Displaying the 10 most recent comments.

Anon | Thu, 24 May 2018 03:22:29 UTC

I can only assume the author is a big fan of Intel's EPIC efforts?


mlwmohawk | Mon, 14 May 2018 21:36:07 UTC

This is an excellent troll and strawman argument. The "C" programming language enforces nothing in the way of memory model, threads, or processor layout. The language itself can be applied to almost any hypothetical processor system; all that is required is some sort of sane operational model that can be programmed against. I will grant that most developers and generic libraries assume things like malloc, stacks, caches, and serial execution, but C does not. C takes an expression and translates it to a set of instructions, nothing less and nothing more.

What would be really interesting is if you could describe a programming model that would be better than C *and* not be implemented in C.


Eric S. Raymond | Thu, 10 May 2018 14:22:21 UTC

I have blogged a detailed response at

http://esr.ibiblio.org/?p=7979

In brief, I think Chisnall's critique is thought-provoking but his prescription mistaken; there are simply too many key algorithms that are what I call SICK ("Serial, Intrinsically; Cope, Kiddo") for his ideas about processor design to play well with real workloads.
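For concreteness, the canonical SICK case is a pointer-chasing loop: each load's address is the result of the previous load, so extra execution units cannot help. A minimal illustration (added here; not taken from the blog post):

    #include <stddef.h>

    struct node {
        struct node *next;
        int value;
    };

    /* Intrinsically serial: the address of each load is produced by the
     * previous load, so the iterations cannot overlap no matter how many
     * parallel execution units the processor offers. */
    long sum_list(const struct node *n)
    {
        long sum = 0;
        while (n != NULL) {
            sum += n->value;  /* uses the value just loaded */
            n = n->next;      /* next address unknown until this load completes */
        }
        return sum;
    }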


John Payson | Wed, 09 May 2018 20:58:19 UTC

Perhaps the simplest rebuttal to the author's primary point is to quote from the introduction to the published rationale for C99:

"C code can be non-portable. Although it strove to give programmers the opportunity to write truly portable programs, the C89 Committee did not want to force programmers into writing portably, to preclude the use of C as a high-level assembler: the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program."

To be sure, the C Standard does not require that implementations be suitable for low-level programming. On the other hand, it does not require that they be suitable for *any* particular purpose. The C89 Rationale notes, in 2.4.4.1, "While a deficient implementation could probably contrive a program that meets this requirement, yet still succeed in being useless, the Committee felt that such ingenuity would probably require more work than making something useful."

While the C Standard itself makes no reference to "quality", except with regard to the "randomness" produced by rand() and random(), the rationale uses the phrase "quality of implementation" a fair number of times. From Section 3 of the C99 Rationale: "The goal of adopting this categorization is to allow a certain variety among implementations which permits quality of implementation to be an active force in the marketplace as well as to allow certain popular extensions, without removing the cachet of conformance to the Standard."

For some reason, it has become fashionable to view the "ingenuity" alluded to in 2.4.4.1 of the C89 Rationale as a good thing, but the text makes it clear it isn't. The only reason the authors didn't explicitly discourage it is that they thought the effort required would be adequate deterrent. Alas, they were mistaken.


Anon | Sun, 06 May 2018 10:33:54 UTC

Holy mother of god. Let's put all the blame on C, of all things, and just pardon incompetence on all sides.


Hans Jorgensen | Sat, 05 May 2018 17:46:38 UTC

I think that most of these problems actually have to do with x86 and Unix/Windows, not C. The problems you mention - instruction-level parallelism, invisible caches, paging behavior - stem from the process model, which puts programs into monolithic units that think they have a flat memory space to themselves. GPU code, even with much better parallelism and memory usage, is still remarkably C-like (or even explicitly C in the case of Nvidia's CUDA), so I imagine that C (which is still a high-level language) could be adapted to use these alternate paradigms.

I imagine that an architecture giving better control of such a fast processor design would do the following:

- Explicit caching. Basically, instead of using the SRAM memory banks as caches that are invisible to the architecture, expose them as actual memory banks with separate pointer spaces and allocation endpoints (e.g. sbrk_L1(), sbrk_L2(), sbrk_L3(), sbrk_main(), or slightly more abstract and portable names) and let programs use them as they please. (A sketch of what this might look like follows the list.)

- Explicit access to the individual execution units, including cheap threads.

- No preemptive multitasking: since we have so many parallel execution units, we can have a few kernel threads always running and watchdogging the other threads, and kill them if they're being onerous. Preemptive multitasking was needed when there was only one processor in the system and a single bad program could bring the whole thing down.

- Instead, if a program needs an execution unit, the architecture can just give it one. It can run as long as it wants and yield to the scheduler either to be considerate or to wait for user input or for a lock or condition variable. Most execution units, however, will just terminate their programs quickly, meaning that the expense of running the scheduler is not often incurred (and if it is, it doesn't need to save as much since it can expect the program to save state).

- To avoid triggering OS syscalls too much, the OS could avoid triggering a fault on the "new execution unit" instruction unless the execution unit is not allowed to do this or no more units are available.

- Bonus thought: you could even ask for an execution unit on every function call, so there is no stack space! The program stack idea is instead implemented as a wait/spawn chain, and all functions run asynchronously with a set contract for returning values (such as writing data to a malloc_L1() pointer).

- No attempt on the processor's part at ILP: the architecture will just suck it up and run each instruction in order on the same unit. A high-level language might do execution-unit scheduling on its own in the compiler phase, but the assembly will tell you exactly how each execution unit is used.

- An ability to bypass the paging system, if we don't completely throw it out.

- If the architecture had the ability to actually read its own instruction pointer, paging would be much less useful because we could use relative addressing for everything: just transfer the instruction pointer into the accumulator and calculate any necessary jump points and memory lookups.

- Paging is still useful for memory protection, though, so we could still use it even if we expressed it in terms of physical pages without doing any address translation.
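A hypothetical sketch of the explicit-caching idea in C. The interface (sbrk_L1() and its backing bank) comes from the list above and exists on no real system; the stub definition just makes the sketch compile:

    #include <stdalign.h>
    #include <stddef.h>

    /* Hypothetical interface: expose the core-local SRAM bank as its own
     * address space with its own break pointer. The static array below
     * stands in for the bank. A real allocator would also round the
     * increment up to preserve alignment between allocations. */
    void *sbrk_L1(ptrdiff_t increment)
    {
        static alignas(max_align_t) char l1_bank[32 * 1024];
        static size_t brk;  /* current break within the bank */
        if (increment < 0 || brk + (size_t)increment > sizeof l1_bank)
            return NULL;    /* bank exhausted (shrinking omitted) */
        void *p = l1_bank + brk;
        brk += (size_t)increment;
        return p;
    }

    /* Keep a hot working set in the L1-speed bank explicitly, rather than
     * hoping a transparent cache happens to keep it resident. */
    long sum_doubled(const int *input, size_t n)
    {
        int *scratch = sbrk_L1(n * sizeof *scratch);
        if (scratch == NULL)
            return -1;      /* working set too big for the bank */
        long total = 0;
        for (size_t i = 0; i < n; i++) {
            scratch[i] = input[i] * 2;
            total += scratch[i];
        }
        return total;
    }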

As you said, old code would not run well on such an architecture, but that would be because it's old x86 and Unix/Windows code and not strictly because it's old C code. C has been adapted to lots of programming models before, and it could potentially be adapted to one like this, too.


Peter Fors | Fri, 04 May 2018 12:04:19 UTC

"In C, a read from an uninitialized variable is an unspecified value and is allowed to be any value each time it is read. This is important, because it allows behavior such as lazy recycling of pages: for example, on FreeBSD the malloc implementation informs the operating system that pages are currently unused, and the operating system uses the first write to a page as the hint that this is no longer true. A read to newly malloced memory may initially read the old value; then the operating system may reuse the underlying physical page; and then on the next write to a different location in the page replace it with a newly zeroed page. The second read from the same location will then give a zero value."

This can never be true; an OS cannot change a page after it has been accessed. That would be completely broken. Lazy recycling in an OS means that your memory allocation is validated immediately, but the page is not actually mapped into your address space until you make the first access, whether that is a read or a write. On that first access the OS gets an invalid-page exception and checks whether the read/write was to a memory address that you actually own; if it was, the OS assigns a zeroed page at that page offset (or, if the page had been swapped out, reads it back in), maps it into your memory map, and returns control to the program. In no OS that actually works will the data in memory change out from under your feet, unless you have hardware errors, the writer of the OS did something really wrong, or you are reading a hardware register; but this was about memory returned from malloc, so that does not apply.
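For context, the mechanism the quoted passage describes corresponds to the madvise(MADV_FREE) interface (FreeBSD, and Linux since kernel 4.5). A minimal sketch of the call pattern under discussion, added for illustration and taking no side in the dispute; the documented semantics are that after MADV_FREE the kernel may reclaim the page at its leisure, reads may return the old contents until it does and zeroes afterwards, and a write re-commits the page:

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED)
            return 1;

        memset(p, 0xAA, len);        /* dirty the page */
        madvise(p, len, MADV_FREE);  /* contents now disposable */

        /* Until the kernel actually reclaims the page, a read here may
         * still see 0xAA; after reclamation it sees a zero-filled page.
         * A write cancels the pending free and re-commits the page. */
        printf("%d\n", p[0]);

        munmap(p, len);
        return 0;
    }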


John Payson | Wed, 02 May 2018 23:11:38 UTC

[accidentally cut off end] While that would be a bit vague in the absence of a definition of "active association"...

...that still would have been much better than having compiler writers interpret that 6.5p7 (or its predecessor) was intended to apply even in cases where an lvalue had an otherwise-obvious association with an object of the correct type, with the sole exception of those that even the most obtuse compiler writer would have to admit would otherwise be absurd [e.g. accessing s.x directly using the member-access operator].


John Payson | Wed, 02 May 2018 23:05:32 UTC

The biggest problem with C is that the C Standards Committee wrote a document that could usefully be interpreted as a set of guidelines and labeled it a "standard", even though it lacks the precision necessary to make it usable as one.

For example, given "struct S { int m; } s;", evaluation of "s.m" will invoke Undefined Behavior because it accesses an object of type "struct S" with an lvalue of type "int", and "int" is not one of the types via which N1570 6.5p7 would allow an object of type "struct S" to be accessed. The equivalent text in C89 had the same problem. Obviously there must be some circumstances where an lvalue can access an object of otherwise-incompatible type, but the Standard fails to say what those are, and nearly all confusion surrounding aliasing is a result of different people trying to figure out when that rule does or does not apply.
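To make the hyper-literal reading concrete (illustration added here): 6.5p7 permits accessing an int through an lvalue of an aggregate type that contains an int member, but, read literally, it never permits the reverse direction, which is exactly what a member access performs.

    struct S { int m; } s;

    int get(void)
    {
        /* Reading s.m touches storage belonging to an object whose type
         * is struct S, through an lvalue of type int. The 6.5p7 list
         * (compatible type, signed/unsigned variant, qualified version,
         * containing aggregate or union, character type) never names
         * this case. Everyone agrees it must be valid; the Standard
         * never quite says why, and that gap is where the aliasing
         * arguments live. */
        return s.m;
    }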

This problem could have been remedied with Defect Report #028 if the authors had noted that the rule meant to require that the lvalue used for access must have an active association with an lvalue of a proper type. While that would be a bit vague in the absence of a definition of "active association",


Dave P | Wed, 02 May 2018 22:06:27 UTC

I applaud you for throwing the cat amongst the pigeons! However, isn't this a rant about serial execution models rather than just C? Your idea for a fast processor (many threads, simple memory model, etc.) would not be able to execute a standard single-threaded serial program very fast. Unfortunately, we humans like our programs as a simple list of 'do this then do that', as it's like real life. So we like making simple serial programs, and therefore making them go fast matters.

When I was a graduate at your esteemed institution, 17 years ago, coding in ways that would suit parallel architectures was touted as the solution to making things go faster, and I believed it back then. But now I believe that fundamentally we want to write serial programs, and parallel programs are fundamentally harder. There is only so much a language can do to abstract the complexities of simultaneous state changes.

Making serial sequences of instructions go fast necessarily requires resolving dependencies between instructions, and therefore processor complexity. You _could_ try pushing the responsibility for instruction scheduling up to the compiler or JIT level. But then you can't change processor architecture without changing the compiler. Processor designers would be very constrained in the changes they could make without requiring all code to be recompiled (having all machine code be dynamically generated would solve this, but that's another story).

I think the way forward would be to let the compiler tell the processor what the dependencies between the data are (a higher-level interface than current machine code) and let the processor go figure out how to make it happen. This would save the processor having to work these dependencies out and would let the connection between machine code and actual performance be tighter. These days you could argue that machine code is not a low-level language either! This heads towards the idea of dataflow/asynchronous processors, which I haven't heard much about recently.

C is a language with an insane array of corner cases and quirks and has many problems, but the complexity of modern processors is largely driven by the desire to make serial code go fast. C (along with x86) and all the legacy this pair brings are mostly an implementation detail.









© 2018 ACM, Inc. All Rights Reserved.