Comments

(newest first)

queue | Fri, 23 Oct 2015 05:08:47 UTC

/* Better version of Figure 12 routine? (read in fixed-sized courier font) */

#include <string> // For std::string.
#include <vector> // For std::vector.


/******************************************************************************
PURPOSE: tokenize - Parse string into tokens using delimiter.
INPUTS:  const std::string& input          String to parse.
         const std::string& delimiter      String that separates tokens.
OUTPUTS: std::vector<std::string>& tokens  Sequence of tokens.
******************************************************************************/

void tokenize( const std::string& input,
               std::vector<std::string>& tokens,
               const std::string& delimiter = " " ) {

  const size_t input_length = input.length();
  const size_t delimiter_length = delimiter.length();

  tokens.clear();

  if ( input_length && input.compare( delimiter ) ) {

    if ( delimiter_length == 0 ) {
      tokens.push_back( input ); //x
    } else {
      size_t start_index = 0;

      do {
        const size_t end_index = input.find( delimiter, start_index );
        std::string token;

        if ( end_index == std::string::npos ) {
          token = input.substr( start_index );
          start_index = end_index; // Stop looping.
        } else {
          const size_t length = end_index - start_index;
          token = input.substr( start_index, length );
          start_index = end_index + delimiter_length;
        }

        if ( token.length() ) {
          tokens.push_back( token ); //x
        }

      } while ( start_index < input_length );
    }
  }
}

Ron Burk | Sat, 03 Mar 2012 17:37:55 UTC

Choice of IDE is what makes code "clear, maintainable, and understandable"? That's rather like saying that museum lighting is what makes the Mona Lisa a great work of art. A very puzzling claim.

Kevin Wall | Mon, 28 Nov 2011 22:23:07 UTC

Correctness is more important than code that is readable. In Example 1, it is doubtful that the code is correct. I suspect that what you intended was to allow either upper or lower case characters for each command, but in reality, you only did this for the first ('q' vs. 'Q') command. If that was not the intent, the superfluous "case" statements should be eliminated.
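
A minimal sketch of the pattern in question, assuming the intent was to accept both cases for every command. The article's Example 1 is not reproduced here, so the function name describeCommand and the command letters other than 'q'/'Q' are hypothetical:

```cpp
#include <string>

// Hypothetical command set, for illustration only.
// Every command pairs a lower-case and an upper-case label, so the
// fall-through is applied consistently rather than only to 'q'/'Q'.
std::string describeCommand( char command ) {
  switch ( command ) {
    case 'q':
    case 'Q':
      return "quit";
    case 's':
    case 'S':
      return "save";
    case 'l':
    case 'L':
      return "load";
    default:
      return "unknown";
  }
}
```

If only lower-case input is meant to be accepted, the upper-case labels would be dropped for all commands, including the first.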

Roger Ellman | Wed, 16 Nov 2011 11:57:01 UTC

How pleasurable to see that good code parallels good design. Clean, clear and user-friendly!

Roger

John Harriott | Wed, 16 Nov 2011 11:02:31 UTC

Firstly, whatever style is used, why not pick an IDE that automatically formats the code to a consistent style? Why spend valuable time inserting whitespace to align tokens, when that time could be spent writing real code?

Secondly, I suggest the authors read Robert Martin's book "Clean Code"; it's gold!!! I used to code in a style that is discouraged by the book, but after reading it I no longer write code that way. If methods are small, have a consistent level of abstraction and use intention-revealing identifiers, the need for whitespace and other formatting is irrelevant IMHO.

doug65536 | Fri, 11 Nov 2011 11:01:51 UTC

While this seems well intentioned, imposing arbitrary rules like this too strictly only serves to annoy programmers, making their work feel less like a passion and more like they are living in a dictatorship. I've had to work under code formatting dictatorship and I resented it. I felt as though non-developers were dictating that the source code look a certain way with no benefit that I could see other than to make me have to pick at the code to make it look a way that I didn't like. It causes you to lose delicate mental context, instead of being able to focus on what you're really doing, implementing an algorithm.

In my opinion, the more deterministic the style, the better. I've always hated programmers indenting parameter declarations to be aligned with the opening parenthesis. It's completely non-deterministic, and you wind up with people spending extra time picking at the whitespace and playing around fixing the indentation to some arbitrary "policy" that causes you to lose context and spend mental cycles on something pretty much irrelevant.

The ruler with which I measure a formatting style: can you write a script that could perform the formatting? Would it require a lot of special cases? Would it require you to maintain a lot of context and backtrack? If so, then it's a bad style. The simpler it is, the less it wipes your mental context.

Too much emphasis is put on making it "easier to read" by someone else. Can they read code or can't they? Some arbitrary strict whitespace policy doesn't make it enough easier to make any significant difference.

I think the only strict rule that should be imposed is not allowing excessively long lines. You should NEVER have to scroll horizontally.

Tim Comber | Fri, 11 Nov 2011 00:29:07 UTC

No mention of abbreviations? I do not let my students use abbreviations for a number of reasons:
* Cognitive load - unless an abbreviation is very familiar it takes time and effort to work out what the abbreviation means.
* How much harder is it to write 'number' than 'num'? Do three more letters really make a variable too long?
* Your abbreviation is not necessarily my abbreviation. This is especially important when teaching students as they are quite happy to write 'nuAccount', 'numbAccount', 'nAccount' etc. They do not know that 'num' is a common programming abbreviation. Why teach them new words when there is a perfectly good English word already available?
* I believe it is better to base names on the language used in the domain of interest. If accountants use Account Number when talking about accounts then the variable should be AccountNumber not AccountNum.

Ed Kimball | Thu, 10 Nov 2011 15:29:41 UTC

@fileoffset -- you may prefer to read YOUR code as a book, but how would you prefer to read someone else's code? As someone looking at the code for the first time and trying to understand it, I find figure 2 vastly inferior to figures 1 and 3.

BTW, the function getBalance in figure 6 violates the principles stated with figure 4. Since the function returns a value without changing any arguments, its name should be a noun or noun phrase, like CurrBal or CurrentBalance, according to figure 4.

Pat LaVarre | Mon, 07 Nov 2011 22:31:43 UTC

@aligned as a table:

The Opposition to spacing code out like a table includes the Python Style of http://www.python.org/dev/peps/pep-0008/ that explicitly discourages "more than one space around an assignment (or other) operator to align it with another".

As for me, when working mostly with people who won't teach their tools to maintain vertically aligned tabulation for us, I give up, write one blank instead of many, and then tabulate the code in my head on the fly as I read it.

I do remember seeing work-groups educated in the mainframe-not-mini culture of tabulating more often go and agree on everyone using editors that treated any string of two and more blanks as a division between one column of text and the next. Then when you edited the value of a cell of the table, the blank text to the right of it could shrink as far as two blanks to keep the remaining cells of the row in place.

I've not seen that parsing rule duplicated in wikitexts, instead I see people make columnar divisions explicit with a quiet | that wouldn't fit in code, or a loud /*|*/, never by making the width of whitespace significant.

paulsj | Mon, 07 Nov 2011 14:54:33 UTC

Also - statement "code is beautiful because its constant definitions are aligned as a table" is equivalent to statement "painting is nice, I like form of the frame". IMHO code just reflects beauty of thought, and is too subtle to put it in standards (compare it with sense of humour), also because it cannot be viewed outside of context.

Pat LaVarre | Mon, 07 Nov 2011 14:05:47 UTC

Love the article, thank you. Love no paywall, have retweeted with hat tip to @EmbedSys.

By zeitgeist serendipitously my own work has had me stumbling into some of your:

* Variables and classes should be nouns or noun phrases.
* Class names are like collective nouns.
* Variable names are like proper nouns.
* Procedure names should be verbs or verb phrases.
* Methods used to return a value should be nouns or noun phrases.
* Booleans should be adjectives.
* For compound names, retain conventional English syntax.
* Try to make names pronounceable.

But then I see you killed a plural here:

Message = EmergencyAlertLabels[i] // Problematic
AlertText = EmergencyLabel[i] // Preferable

I'm intrigued.

Q: Do you not agree that variables of collection types such as list (or array) and set should be named like PLURAL Proper Nouns, not singular?

For example:

AlertText = EmergencyLabels[i] // More Preferable
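
A small sketch of that convention, assuming a plural name for the collection and a singular name for one element drawn from it. The function alertTextFor and its parameters are hypothetical, not taken from the article:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the plural (emergencyLabels) names the whole
// collection, the singular (emergencyLabel) names one element of it.
std::string alertTextFor( const std::vector<std::string>& emergencyLabels,
                          std::size_t labelIndex ) {
  // at() throws std::out_of_range for a bad index instead of reading garbage.
  const std::string& emergencyLabel = emergencyLabels.at( labelIndex );
  return emergencyLabel;
}
```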

Pat LaVarre | Mon, 07 Nov 2011 13:59:42 UTC

@figure12: ... moreTokens = true;while(moreTokens) { ... moreTokens = false:

Q: Spelling {for(;;)} and {break} more creatively - feeding boolean values thru re-assignment of a variable to an engine that dynamically idiosyncratically emulates the control flow effects of standard {for(;;)} and {break} - helps the educated reader how?

I include that question in interviews when I hire a new grad to work with micro-controllers. I'm looking to gauge how much context their experience has built for them to see both sides of that longstanding dispute, not just one side or the other.
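
To make the two sides concrete, here is a sketch of the same loop written both ways. It is not the article's Figure 12; the task (counting leading positive values) and both function names are assumptions chosen only to keep the comparison short:

```cpp
#include <cstddef>
#include <vector>

// Flag-controlled form: a boolean is re-assigned to steer the loop.
std::size_t countLeadingPositivesWithFlag( const std::vector<int>& values ) {
  std::size_t count = 0;
  bool morePositives = true;
  while ( morePositives ) {
    if ( count < values.size() && values[count] > 0 ) {
      ++count;
    } else {
      morePositives = false; // Leaves the loop on the next test.
    }
  }
  return count;
}

// for(;;) and break form: the exit is stated where it happens.
std::size_t countLeadingPositivesWithBreak( const std::vector<int>& values ) {
  std::size_t count = 0;
  for ( ;; ) {
    if ( count >= values.size() || values[count] <= 0 ) {
      break;
    }
    ++count;
  }
  return count;
}
```

Both functions return the same result for any input; the dispute is only about which spelling of the exit is easier to follow.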

Pat LaVarre | Mon, 07 Nov 2011 13:55:07 UTC

@Readable code is in the eye of the beholder @Beauty cannot be standardized:

"""Saying taste is personal preference prevents disputes. And it's not true. You feel this when you start to design things""" ~ P. Graham

steprobe | Mon, 07 Nov 2011 12:45:36 UTC

I really dislike the whitespace approach recommended here and the reason is because I read code horizontally and not vertically. I don't understand why you would arrange elements together when there is no relationship, neither logically nor in how I read between them. Adding extra whitespace to tab it out in line with a completely unrelated element only causes me to have to move my eyes more than necessary. If code such as figure 2 is so bad, add some blank lines between them.

I find it odd you can recommend this spacing approach and then produce horribly squashed code such as that in figure 12.

paulsj | Mon, 07 Nov 2011 10:08:40 UTC

Problem is that beauty cannot be standardized, but main aim of ideas described in this article is to standardize things to make them understandable by others.

For example, I am not sure whether whitespace between tokens makes code more readable (actually, I consider that practice wrong, except maybe in #defines or assembler). Code readability has to be balanced with code write-ability: put in too much whitespace, and after a bunch of copy+paste you will be spending 50% of your time correcting whitespace (if it is unaligned, it really looks ugly). This isn't "agile" coding at all.

Also, why should a person put many statements and curly braces on a single line? Not to mention that it is bad when you are debugging (and need to place a breakpoint between those statements). It is also against much simpler (and more widespread) rules for how to use curly braces.

So, for me best is to make code to be easy to write (single and simple, universal standard, for example, one which is widely used for Java) and learn to read code.

One more word about code: actually, I am sure no one understands code as it is written. No one reads it line by line (maybe only beginners). Even more, no one reads it at all (do you think about your legs when walking?). Everything depends on how well the programmer understands the problem domain; code is just a small part of it (every problem has infinitely many ways to solve it; each solution is just a branch, and you need to get to the "root"). So I am not sure code beauty is of big importance when one knows the true aim of a programmer's work. Much more important is to have simple ideas behind it.

Christophe de Dinechin | Mon, 07 Nov 2011 07:59:16 UTC

Alsys, an Ada compiler company I interned at, used scientific studies on how programmers read code. And they observed that we need visual markers for code structures to scan code quickly. We need the code to guide the eye. Indentation is one such marker, code colorization is another, but comments can also help identify larger structures.

For example, in a function, you need to identify function boundaries quickly, then pick up its interface, a description of what it does, and finally, if you are interested in that particular function, how it does it. The visual markers must help you locate each of these pieces of information quickly.

In my projects, I enforce this using "block" comments, i.e. comments surrounded by ------- lines. The function signature is followed by a one-liner describing what the function does. Any additional function-wide comment follows, and then the actual body of the function. Your example 12 would look like this:

void tokenizeString(string S,
    vector<string> &tokenList,
    const string& delimiter = " ")
// -----------------------------------------------
//   Compute a list of tokens in string S
// -----------------------------------------------
//   Tokens found in S will be added to tokenList
//   This function can be called repeatedly with
//   different strings to extend the token list.
//   Complexity is O(N), N being length of S
{
....
}

In practice, I have found that this gives exactly the amount of information needed, and makes it easy to browse through a large amount of code. But there are additional benefits. If you can't explain what your function does in one line, for example, you probably have a vaguely defined function, and you need to fix that first.

Jesse | Mon, 07 Nov 2011 04:34:59 UTC

s/flare/flair. I'm interested in the article and am working on finishing it-- there's a lot to chew on here before I can make any worthwhile comment at all, but I wanted to mention that since it occurs early.

Kristiono Setyadi | Sun, 06 Nov 2011 05:49:50 UTC

A short opinion on this:

"A variable that is rarely used may deserve a long name: for example, MaxPhysicalAddr. When variable names are long, especially if there are many of them, it quickly becomes difficult to see what's going on."

I think you should reverse it to be suitable. When a variable is rarely used, e.g. in a local loop, it should use a short name, e.g. i, j, k, etc. But if a variable is commonly used in the code, you should use a more descriptive name, e.g. numOfAccount, ipAddress, etc.

I couldn't imagine finding the right line to debug when there are so many i's, j's and k's around your code without knowing the description (and hence the purpose) of each. It's such a pain in the ass :)

Anyway, keep the good work. I like your article!

Someone | Sun, 06 Nov 2011 00:27:30 UTC

A short remark on 'fileCount' vs 'CountFiles': I think both have their place. I would choose the first name in cases where the result is readily available, and the latter in cases where it would require significant (for some definition of the word) resources. In languages that support them, fileCount could be a field or a property. Also, although some languages allow e.g. var numFiles = disk.fileCount to iterate over a disk to count its files, I would strongly advise against implementing such functionality in that way.
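
One way to sketch that distinction, assuming a hypothetical Directory type (neither the type nor the function names come from the article): the noun-named member returns a value that is already at hand, while the verb-named function signals that it walks the whole tree.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type, for illustration only.
struct Directory {
  std::vector<std::string> fileNames;
  std::vector<Directory> subdirectories;

  // Noun: the answer is readily available, so it reads like a property.
  std::size_t fileCount() const { return fileNames.size(); }
};

// Verb phrase: visits every subdirectory, so the name warns of the cost.
std::size_t countFilesRecursively( const Directory& directory ) {
  std::size_t total = directory.fileCount();
  for ( const Directory& child : directory.subdirectories ) {
    total += countFilesRecursively( child );
  }
  return total;
}
```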

Mark Kornfein | Fri, 04 Nov 2011 17:44:06 UTC

In general good advice; the only item I take issue with is 8. In my experience it is always good practice to use braces on multi-line if statements. Over the course of years I have come across way too many bugs where someone later added another line which they meant to go in the if statement. In the example that could happen in the else section, and the indenting makes that type of error hard to spot.
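
A sketch of the kind of bug meant here. It is not the article's item 8; the fee calculation and both function names are made up for illustration. Some compilers warn about the misleading indentation in the first function, which is exactly the point:

```cpp
// Without braces: the indentation suggests the added line belongs to
// the else branch, but only "fee = 5;" does.
int applyFeeUnbraced( int balance, bool isPremium ) {
  int fee = 0;
  if ( isPremium )
    fee = 0;
  else
    fee = 5;
    balance -= 1; // Added later; runs for every caller, premium or not.
  return balance - fee;
}

// With braces: the added line is visibly inside the else branch.
int applyFeeBraced( int balance, bool isPremium ) {
  int fee = 0;
  if ( isPremium ) {
    fee = 0;
  } else {
    fee = 5;
    balance -= 1;
  }
  return balance - fee;
}
```

For a premium account with a balance of 100, the unbraced version returns 99 and the braced version returns 100; for a non-premium account both return 94.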

Michael Mueller | Fri, 04 Nov 2011 12:35:44 UTC

Sorry,
good intention, but horrible execution.

Code like that would get _any_ of my developers fired ... if he doesn't react to feedback.

Iian Neill | Fri, 04 Nov 2011 06:44:44 UTC

I have to confess I am puzzled why my earlier comments were removed from the commentary.  All I mentioned was that I had seen a similar use of tabular layout in assembly language listings -- which I would have thought would at least have given some historical weight to this article's argument!  Is it not legitimate to draw such an historical parallel?

Iian Neill | Fri, 04 Nov 2011 04:19:53 UTC

fileoffset,

To a certain extent the readability of your own code is a matter of taste, but I think it can be argued that the tabular layout draws your attention to the salient code elements much more quickly and lessens visual fatigue.  The problem with Figure 2 is that to get to the important information (the variable names and their data) your eye has to parse each line in a higgledy-piggledy fashion.  That's fine for English text, where the meaning unfolds unpredictably, but slow and tiring for reading code.

I agree that maintenance is practically an issue, but that's really down to your IDE.  There's absolutely no reason why the IDEs couldn't adopt alternative pretty printing conventions.

Iian

Iian Neil | Fri, 04 Nov 2011 04:08:33 UTC

I recently started adopting spacing practises very similar to Figure 3 after re-reading assembly code listings after many years.  The thing that struck me immediately was [a] how beautifully laid out and readable the code was, and, [b] how improbable this should be considering the denseness of assembly!  I then figured if this tabular/vertical alignment worked for assembly listings, why not adopt it for C#, Javascript, SQL, etc.  To my taste, at least, it has made my programs much more readable ...

fileoffset | Fri, 04 Nov 2011 03:59:04 UTC

I must say, I consider Figure 1 and Figure 3 to be inferior to Figure 2.

'Table' based formatting, using whitespace to align arbitrary elements of syntax is high maintenance and in my experience practically unmaintainable, when used in a shared environment. I would rather read my code like a book, than a spreadsheet!

Victor Noagbodji | Fri, 04 Nov 2011 03:27:31 UTC

I really liked the article. What's been said can be found in Elements of Programming Style by Brian Kernighan or The Practice of Programming, by the same author and much more.

In defense of C terseness (sputn), I would like to point out that C is an old language (30+). It's a language that came out in days where the number of characters on one terminal line was definitely important. So many liberties nowadays...

Finally, it might be me, but example 13 misses something. It's doing too much, and could have been broken up in functions. Functions for setting "transmission lines" and "loading bus" and no need for comments even because the reader doesn't need to know how it's done, just that it's being done.

Very good article. Glad to see it in ACM :)

Tim Daly | Fri, 04 Nov 2011 00:48:15 UTC

Consider using Literate Programming (ref: Knuth). Readable code is in the eye of the beholder but the issue is much more subtle. The real problem with "readable code" occurs when the original authors are no longer available, usually when the code is in "maintenance mode". In order to understand a block of code you need to know WHY it was written (the motivation), what it depends upon and what depends upon it (the setting) and the problem it is intended to solve (the context). Literate programs are written to communicate ideas, motivations, settings, and context from one human to another. The code is a side-effect. Thousands of programs exist in sourceforge abandoned by their authors and, regardless of the clarity and beauty of the code, they will never "live" because nobody really understands them.
