Sir, Please Step Away from the ASR-33!
To move forward with programming languages we need to break free from the tyranny of ASCII.
Poul-Henning Kamp
One of the naughty details of my Varnish software is that the configuration is written in a domain-specific language that is converted into C source code, compiled into a shared library, and executed at hardware speed. That obviously makes me a programming language syntax designer, and just as obviously I have started to think more about how we express ourselves in these syntaxes.
Rob Pike recently said some very pointed words about the Java programming language, which, if you think about it, sounded a lot like the pointed words James Gosling had for C++, and remarkably similar to what Bjarne Stroustrup said about good ol' C.
I have always admired Pike. He was already a giant in the field when I started, and his ability to foretell the future has been remarkably consistent.1 In front of me I have a tough row to hoe, but I will attempt to argue that this time Pike is merely rearranging the deckchairs of the Titanic and that he missed the next big thing by a wide margin.
Pike got fed up with C++ and Java and did what any self-respecting hacker would do: he created his own language—better than Java, better than C++, better than C—and he called it Go.
But did he go far enough?
```go
package main

import "fmt"

func main() {
	fmt.Printf("Hello, World\n")
}
```
This does not in any way look substantially different from any of the other programming languages. Fiddle a couple of glyphs here and there and you have C, C++, Java, Python, Tcl, or whatever.
Programmers are a picky bunch when it comes to syntax, and it is a sobering thought that one of the most rapidly adopted programming languages of all time, Perl, barely had one for the longest time. The funny thing is, what syntax designers are really fighting about is not so much the proper and best syntax for the expression of ideas in a machine-understandable programming language as it is the proper and most efficient use of the ASCII table real estate.
IT'S ALL ASCII TO ME...
There used to be a programming language called ALGOL, the lingua franca of computer science back in its heyday. ALGOL was standardized around 1960 and dictated about a dozen mathematical glyphs such as ×, ÷, ¬, and the very readable subscripted 10 symbol, for use in what today we call scientific notation. Back then computers were built by hand and had one-digit serial numbers. Having a teletypewriter customized for your programming language was the least of your worries.
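The subscripted ten has since made it into Unicode as U+23E8 DECIMAL EXPONENT SYMBOL (⏨), while ASCII-bound languages write the same exponent with a plain e. A small Go sketch of that mapping follows; `parseAlgolReal` is a hypothetical helper, not part of any library, which swaps the glyph for e and lets `strconv.ParseFloat` do the rest.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// algolTen is U+23E8 DECIMAL EXPONENT SYMBOL, Unicode's rendering of the
// subscripted 10 that ALGOL 60 used for scientific notation.
const algolTen = "\u23e8"

// parseAlgolReal is a hypothetical helper: it rewrites an ALGOL-style real
// literal such as "6.022⏨23" into the ASCII form "6.022e23" and parses it.
func parseAlgolReal(s string) (float64, error) {
	return strconv.ParseFloat(strings.ReplaceAll(s, algolTen, "e"), 64)
}

func main() {
	v, err := parseAlgolReal("6.022" + algolTen + "23")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(v) // prints 6.022e+23
}
```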
A couple of years later came the APL programming language, which included an extended character set containing a lot of math symbols. I am told that APL still survives in certain obscure corners of insurance and economics modeling.
Then ASCII happened around 1963, and ever since, programming languages have been trying to fit into it. (Wikipedia claims that ASCII grew the backslash [\] specifically to support ALGOL's /\ and \/ Boolean operators. No source is provided for the claim.)
The trouble probably started for real with the C programming language's need for two kinds of and and or operators. It could have used just or and bitor, but || and | saved zero and four characters, respectively, which on an ASR-33 teletype, printing 10 characters per second, amounts to nothing and 4/10 second.
It was certainly a fair tradeoff—just think about how fast you type yourself—but the price for this temporal frugality was a whole new class of hard-to-spot bugs in C code.
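The one-glyph difference is treacherous because C treats any nonzero integer as true, so if (a & b) and if (a && b) both compile and can quietly disagree. This Go sketch spells out C's truthiness with explicit != 0 tests (Go itself refuses to use an int as a condition); the helper names and the values 1 and 2 are illustrative choices, not anything from the original text.

```go
package main

import "fmt"

// bitwiseAnd mimics C's `if (a & b)`: true only when a and b share a set bit.
func bitwiseAnd(a, b int) bool {
	return a&b != 0
}

// logicalAnd mimics C's `if (a && b)`: true when both a and b are nonzero.
func logicalAnd(a, b int) bool {
	return a != 0 && b != 0
}

func main() {
	a, b := 1, 2 // both "true" in C, but with no bits in common

	fmt.Println(logicalAnd(a, b)) // true
	fmt.Println(bitwiseAnd(a, b)) // false: dropping one '&' flips the branch
}
```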
Niklaus Wirth tried to undo some of the damage in Pascal, and the bickering over begin and end would no } take.
C++ is probably the language that milks the ASCII table most by allowing templates and operator overloading. Until you have inspected your data types, you have absolutely no idea what + might do to them (which is probably why there never was enough interest to stage an International Obfuscated C++ Code Contest, parallel to the IOCCC for the C language).
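Go deliberately leaves out user-defined operator overloading, yet even there + already means different things depending on the operand types; C++ lets programmers extend that list without limit. A minimal illustration, using hypothetical helper names:

```go
package main

import "fmt"

// addInts and addStrings use the very same + glyph; the operand types alone
// decide whether it means arithmetic addition or string concatenation.
func addInts(a, b int) int { return a + b }

func addStrings(a, b string) string { return a + b }

func main() {
	fmt.Println(addInts(1, 1))        // 2
	fmt.Println(addStrings("1", "1")) // 11
}
```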
C++ stops short of allowing the programmer to create new operators. You cannot define :-: as an operator; you have to stick to the predefined set. If Bjarne Stroustrup had been more ambitious in this respect, C++ could have beaten Perl by 10 years to become the world's second write-only programming language, after APL.
How desperate the hunt for glyphs is in syntax design is exemplified by how Guido van Rossum did away with the canonical scope delimiters in Python, relying instead on indentation for this purpose. What could possibly be of such high value that a syntax designer would brave the controversy this caused? A high-value pair of matching glyphs, { and }, for other use in his syntax could. (This decision also made it impossible to write Fortran programs in Python, a laudable achievement in its own right.)
The best example of what happens if you do the opposite is John Ousterhout's Tcl programming language. Despite all its desirable properties—such as being created as a language to be embedded in tools—it has been widely spurned, often with arguments about excessive use of, or difficult-to-figure-out placement of, {} and [].
My disappointment with Rob Pike's Go language is that the rest of the world has moved on from ASCII, but he did not. Why keep trying to cram an expressive syntax into the straitjacket of the 95 glyphs of ASCII when Unicode has been the new black for most of the past decade?
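The 95 are the printable ASCII codes, 0x20 (space) through 0x7E (~); the remaining 33 of the 128 codes are control characters. A quick Go check using the standard `unicode.IsPrint`, which counts the ASCII space as printable (the helper name `printableASCII` is made up for this example):

```go
package main

import (
	"fmt"
	"unicode"
)

// printableASCII returns every printable character in the 7-bit ASCII range.
// unicode.IsPrint treats U+0020 SPACE as printable, so the result runs from
// ' ' (0x20) through '~' (0x7E).
func printableASCII() []rune {
	var glyphs []rune
	for r := rune(0); r < 128; r++ {
		if unicode.IsPrint(r) {
			glyphs = append(glyphs, r)
		}
	}
	return glyphs
}

func main() {
	g := printableASCII()
	fmt.Println(len(g))        // 95
	fmt.Println(string(g[1:])) // the 94 visible glyphs, '!' through '~'
}
```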
Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?
The most recent programming language syntax development that had anything to do with character sets apart from ASCII was when the ISO-C standard committee adopted trigraphs to make it possible to enter C source code on computers that do not even have ASCII's 95 characters available—a bold and decisive step in the wrong direction.
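For readers who never met them: a trigraph is a three-character sequence beginning with ?? that the C preprocessor replaces, in its very first translation phase, with one of nine ASCII characters (#, [, ], {, }, \, ^, |, ~). The substitution can be sketched in Go with `strings.NewReplacer`, which also works in a single left-to-right pass. `replaceTrigraphs` is a hypothetical helper written for this illustration, and it handles only trigraphs, not line splicing or any later phase.

```go
package main

import (
	"fmt"
	"strings"
)

// trigraphs maps each of the nine ISO C trigraph sequences to the ASCII
// character it stands for.
var trigraphs = strings.NewReplacer(
	"??=", "#",
	"??(", "[",
	"??/", `\`,
	"??)", "]",
	"??'", "^",
	"??<", "{",
	"??!", "|",
	"??>", "}",
	"??-", "~",
)

// replaceTrigraphs is a hypothetical helper that performs only the trigraph
// substitution, scanning the source once from left to right.
func replaceTrigraphs(src string) string {
	return trigraphs.Replace(src)
}

func main() {
	// How "a || b" and a braced block had to be typed without '|', '{', '}'.
	fmt.Println(replaceTrigraphs("if (a ??!??! b) ??< return; ??>"))
	// prints: if (a || b) { return; }
}
```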
While we are at it, have you noticed that screens are getting wider and wider these days, and that today's text processing programs have absolutely no problem with multiple columns, insert displays, and hanging enclosures being placed in that space?
But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?
And need I remind anybody that you cannot buy a monochrome screen anymore? Syntax-coloring editors are the default. Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?
For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than it is for us to be able to express our intentions clearly.
And, yes, me too: I wrote this in vi(1), which is why the article does not have all the fancy Unicode glyphs in the first place.
Reference
1. Pike, R. 2000. Systems software research is irrelevant; http://herpolhode.com/rob/utah2000.pdf.
Poul-Henning Kamp (phk@FreeBSD.org) has programmed computers for 26 years and is the inspiration behind bikeshed.org. His software has been widely adopted as "under the hood" building blocks in both open source and commercial products. His most recent project is the Varnish HTTP accelerator, which is used to speed up large Web sites such as Facebook.
© 2010 ACM 1542-7730/10/1000 $10.00
Originally published in Queue vol. 8, no. 10.
POUL-HENNING KAMP (phk@FreeBSD.org) is one of the primary developers of the FreeBSD operating system, which he has worked on from the very beginning. He is widely unknown for his MD5-based password scrambler, which protects the passwords on Cisco routers, Juniper routers, and Linux and BSD systems. Some people have noticed that he wrote a memory allocator, a device file system, and a disk encryption method that is actually usable. Kamp lives in Denmark with his wife, his son, his daughter, about a dozen FreeBSD computers, and one of the world's most precise NTP (Network Time Protocol) clocks. He makes a living as an independent contractor doing all sorts of stuff with computers and networks.


ctrucza | Tue, 26 Oct 2010 20:44:16 UTC
"Why do we still have to name variables OmegaZero..." We don't have to: #include typedef void !H; class { public: !H I2() { std::wcout << "`}L" << std::endl; } }; !H ªÀ¼() { robPike; robPike.I2(); } int main() { ªÀ¼(); }
Poul-Henning Kamp | Fri, 29 Oct 2010 13:12:14 UTC
Really? Keyboards are why we are not doing it? You are clearly not european. If you were, you would know what a pain in the ass it already is to get the usual [{|\}] characters by some contorted CTRL-ALT-STICK-SHIFT sequence on non-US keyboards. A keyboard costs USD20 these days; I simply don't buy the argument that we cannot afford to improve them for our purposes. Poul-Henning
Bradley C. Harder | Sat, 30 Oct 2010 17:52:09 UTC
Speaking of Tcl, how about this example?

```tcl
#!/usr/pkg/bin/tclsh8.6
proc გამარჯობა {} {
    # Georgian "hello", via Emacs.
    puts "Oh hi!"
}
გამარჯობა ;# I can eat glass, georgian, via http://www.columbia.edu/kermit/utf8.html
set მ "მინას ვჭამ და არა მტკივა."
puts ${მ}
```
Tim | Mon, 01 Nov 2010 11:45:04 UTC
Poul-Henning Kamp quipped: "You are clearly not european. If you were, you would know what a pain in the ass it already is to get the usual [{|\}] characters by some contorted CTRL-ALT-STICK-SHIFT sequence on non-US keyboards." I am European, use a European-layout QWERTY keyboard, and have no problem entering the characters, all without any enter/meta/alt/control/shift weirdness. Three of the keystrokes are made directly with my little finger and the others with the same finger plus Shift. Not so onerous really, is it? I can only guess that this comment, along with parts of the article, was either deliberate trolling or just very poorly thought out. I would kindly suggest that gross exaggeration to the point of utter nonsense does your argument little good. T
Hans | Tue, 02 Nov 2010 15:26:30 UTC
Why mix together the use of characters in a programming language with pure editor features like coloring certain regions of code or floating other regions above/beside the next? How exactly would Unicode characters improve a language syntax? We have the () <> [] {} already, right? How many more open/close characters can we actually introduce before they start being too similar and give rise to bugs like the ones related to || that you describe in the article? And what would be the gain? We might "free up" {} or whatever for use in some other part of the language. But, um, let's see, you seem to be the only one who thinks that we are running out of characters and that it might be a problem. You argue for using "bitor" over || and at the same time you claim that more characters would improve anything. It's quite ridiculous. But please, if you would like to submit patches for Eclipse, NetBeans, or Qt Creator that color private variables, BE MY GUEST. It probably takes as much time as writing this whole pointless article, seeing as how the syntax coloring infrastructure is already in there.
Christophe de Dinechin | Fri, 05 Nov 2010 08:25:22 UTC
I've looked up this entire page, and the word "semantics" is not written once. I've stopped counting "syntax". But the problem is not the syntax; it's the semantics that make a language more or less expressive. The steps forward in programming happened when functions, or objects, or distributed programming, or operator overloading, or exceptions, or concurrency became usable by programmers. It doesn't really change much if you describe "task" with a Chinese glyph or the four ASCII characters t-a-s-k; what matters is what it means, not how it looks. In my own programming language, XL, the syntax is all ASCII, because there's a single "built-in" operator, ->. But you can use it to invent your own notation, as in:

```
if true then X:code else Y:code -> X
if false then X:code else Y:code -> Y
```

See http://xlr.sourceforge.net/Concept%20Programming%20Presentation.pdf for a more in-depth discussion of this "concepts come first" approach, which I called, obviously enough, "concept programming".