Computer science is both a science and an art. Its scientific aspects range from the theory of computation and algorithmic studies to code design and program architecture. Yet, when it comes time for implementation, there is a combination of artistic flair, nuanced style, and technical prowess that separates good code from great code.
Like art, code is simultaneously subjective and non-subjective. The non-subjective aspects of coding include "hard" ideas that must be followed to create good code: design patterns, project structures, the use of common libraries, and so on. Although these concepts lay the foundation for developing high-quality, maintainable code, it is the nuances of a programmer's technique and tools—alignment, naming, use of white space, use of context, syntax highlighting, and IDE choice—that truly make code clear, maintainable, and understandable, while also giving code the ability to communicate intent, function, and usage clearly.
This separation between good and great code occurs because every person has an affinity for his or her own particular coding style based on his or her own good (or bad) habits and preferences. Anyone can write code within a design pattern or using certain "hard" techniques, but it takes a great programmer to fill in the details of the code in a way that is clear, concise, and understandable. This is important because just as every person may draw a unique meaning or experience from a single piece of artwork, every developer or reader of code may infer different meanings from the code depending on naming and other conventions, despite the architecture and design of the code.
From another angle, programming may also be seen as a form of "encryption." In various ways the programmer devises a solution to a problem and then encrypts the solution in terms of a program and its support files. Months or years later, when a change is called for, a new programmer must decrypt the solution. This is usually not an enviable task, which can mainly be blamed on a failure of clear communication during the initial "encryption" of the project. Decrypting information is simple when the necessary key is present, and so is understanding old code when special attention has been paid to what the code itself communicates.
To address this issue, some works have defined a single coding standard for an entire programming language [7], while others have settled for accepting any naming conventions as long as they are consistent [6]. Beautiful code has been defined in general terms as readable, focused, testable, and elegant [1]. The more extreme case is the invention of an entire programming language built around a concrete set of ideals, such as Ruby or Python. Ruby emphasizes brevity, simplicity, flexibility, and balance [4]. The principles behind Python are clear in The Zen of Python [5], where the focus lies on beauty, simplicity, readability, and reliability.
Our approach to this issue has been to develop a system of coding guidelines (available online [3]). While these guidelines come from an educational environment, they are designed to be useful to practitioners as well. The guidelines rest on a few broad principles that capture fundamentals of communication and elevate the notion of coding conventions to a higher level. The use of these conventions will also improve the sustainability of a code base. This article looks at these underlying principles.
One area not considered here is the use of syntax highlighting or IDEs. While either one may make code more readable (because of syntax highlighting, code folding, etc.) and easier to manage (for example, quickly looking up or refactoring functions and/or variables), our guidelines have been developed to be IDE- and color-neutral. They are meant to reflect foundational principles that are important when writing code in any setting. Also, while IDEs can help improve readability and understanding in some ways, the features found in these tools are not standard (consider the different features found in Visual Studio, Eclipse, and VIM, for example). Likewise, syntax highlighting varies greatly among environments and may easily be changed to match personal preference. The goal of the following principles is to build a foundation for good programming that is independent of the programming IDE.
In a recent ACM Queue article, Poul-Henning Kamp [2] makes the fascinating point that much of the style of programming languages stems from the ASCII character set and typewriter-based terminals. Programming languages make no use of the graphical properties and options of modern devices. While code must be written with the clarity of good English grammar, it is not English text. Instead it is more like math and tables.
This is a far-reaching principle. First, it speaks directly to the use of fonts. Do not use a variable-width (proportional) font for program code, as code is not text. Fixed-width fonts (e.g., Courier and Data Gothic) look appealing and allow easy alignment of code. Proportional (variable-width) fonts prevent proper alignment, and even more importantly, do not "look like" code.
While one should continue to think of a program as a sequence of actions or as an algorithm at a high level, each section of code should also be thought of as a presentation of a chart, table, or menu. In figures 1, 2, and 3 notice the use of vertical alignment to show symmetry. This is a powerful method of communication.
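As a rough illustration of the table idea (not a reproduction of the figures), the sketch below lays out a switch statement so that each case reads like a row with aligned columns. The enum `Shape` and the function `sideCount` are hypothetical names chosen only for this example.

```cpp
// Hypothetical example: the names are illustrative, not taken from the figures.
enum class Shape { Triangle, Square, Pentagon, Hexagon };

// Each case is one "row"; the label and the action line up in columns.
int sideCount(Shape s)
{
    switch (s) {
        case Shape::Triangle: return 3;
        case Shape::Square:   return 4;
        case Shape::Pentagon: return 5;
        case Shape::Hexagon:  return 6;
    }
    return 0;   // reached only if s holds a value outside the enum
}
```

With the columns aligned, a missing or mistyped row tends to stand out at a glance, which is the symmetry the text describes.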
In the case when a long line of code spills into multiple lines, we suggest breaking and realigning the code. For example, instead of
participant newEntry = new participant (id, name, address1, address2, city,
state, zip, phone, email);
use either
participant newEntry = new participant (id, name, address1, address2,
                                        city, state, zip, phone, email);
or
participant newEntry = new participant(id, name, address1, address2, city,
                                       state, zip, phone, email);
A programmer creates a name for something with full knowledge of its use, and often many names make sense when one knows what the name represents. Thus, the programmer has this problem: creating a name based on a concept. The true challenge, however, is precisely the opposite: inferring the concept based on the name! This is the problem that the program reader has.
Consider the simple name sputn, taken from the common C++ header file <iostream.h>. An inexperienced or unfamiliar programmer may suddenly be mentally barraged with a bout of questions such as: Is it an integer? A pointer? An array or a structure? A method or a variable? Does sp stand for saved pointer? Is sput an operation to be done n times? Do you pronounce it sputn or s-putn or sput-n or s-put-n?
We advocate basing names on conventional English usage—in particular, simple, informal, abbreviated English usage. Consider the following more specific guidelines.
* Variables and classes should be nouns or noun phrases.
* Class names are like collective nouns.
* Variable names are like proper nouns.
* Procedure names should be verbs or verb phrases.
* Methods used to return a value should be nouns or noun phrases.
* Booleans should be adjectives.
* For compound names, retain conventional English syntax.
* Try to make names pronounceable.
Some examples of this broad principle are shown in figure 4.
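The naming guidelines above can be sketched in a small class. This is only one possible reading of them; `Account`, `balance`, `overdrawn`, `deposit`, and `withdraw` are hypothetical names, not taken from figure 4.

```cpp
// Hypothetical names throughout; the sketch only illustrates the grammar of naming.
class Account {                                             // class: a noun
public:
    explicit Account(double opening) : balance_(opening) {}

    double balance()   const { return balance_; }           // returns a value: noun
    bool   overdrawn() const { return balance_ < 0.0; }     // Boolean: adjective

    void deposit (double amount) { balance_ += amount; }    // procedure: verb
    void withdraw(double amount) { balance_ -= amount; }    // procedure: verb

private:
    double balance_;
};
```

A call site then reads close to informal English, for example `if (savings.overdrawn()) savings.deposit(50.0);`.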
There is an interesting but small issue when considering examples such as:
numFiles = countFiles(directory);
While countFiles is a good name, it is not an optimal name since it is a verb. Verbs should be reserved for procedure calls that have an effect on variables. For functions that have no side effects on variables, use a noun or noun phrase. One does not usually say
Y = computeSine(X); or
milesDriven = computeDistance(location1, location2);
but rather
Y = sine(X); or
milesDriven = Distance(location1, location2);
We suggest that
numFiles = fileCount(directory);
is a slight improvement. More importantly, this enforces the general rule that verbs denote procedures, and nouns or adjectives denote functions.
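To make the verb/noun distinction concrete, here is a minimal sketch under stated assumptions: `Directory` is a hypothetical stand-in (a plain list of file names, not a real file-system API), and `removeBackups` is an invented procedure added only for contrast with `fileCount`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a directory: just a list of file names.
using Directory = std::vector<std::string>;

// Function with no side effects: named with a noun phrase.
int fileCount(const Directory& directory)
{
    return static_cast<int>(directory.size());
}

// Procedure that changes its argument: named with a verb phrase.
// Removes every name that ends in '~' (assumed here to mark a backup file).
void removeBackups(Directory& directory)
{
    auto isBackup = [](const std::string& name) {
        return !name.empty() && name.back() == '~';
    };
    directory.erase(std::remove_if(directory.begin(), directory.end(), isBackup),
                    directory.end());
}
```

The `const` reference on `fileCount` reinforces what the noun already suggests: the call observes the directory and leaves it alone.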
All other things being equal, shorter programs are always better. As an example, local variables that are used as index variables may be named i, j, k, etc. An array index used on every line of a loop need not be named any more elaborately than i. Using index or elementNumber obscures the details of the computation through excessive description. A variable that is rarely used may deserve a long name: for example, MaxPhysicalAddr. When variable names are long, especially if there are many of them, it quickly becomes difficult to see what's going on. A variable name can often be shortened by relying on the context in which it is used—for example, the variable Store in a stack implementation rather than StackStore.
Major variables (objects) that are used frequently should be especially short, as seen in the examples in figure 5. For major variables that are used throughout the program, a single letter may encourage program clarity.
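A small sketch of relying on context, assuming a hypothetical integer `Stack` class (this is not the code from figure 5): the member is called `store` because the enclosing class already says "stack", and the loop index is simply `i`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stack of integers; member names lean on the class for context.
class Stack {
public:
    void push(int value) { store.push_back(value); }
    void pop()           { store.pop_back(); }       // assumes a non-empty stack
    int  top()   const   { return store.back(); }    // assumes a non-empty stack
    bool empty() const   { return store.empty(); }

    // Sum of all elements; a short index name keeps the computation visible.
    int sum() const
    {
        int total = 0;
        for (std::size_t i = 0; i < store.size(); i++)
            total += store[i];
        return total;
    }

private:
    std::vector<int> store;   // not stackStore: "Stack" is already the context
};
```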
While written and spoken communication may reach a high level of clarity, it often lacks meaning if not accompanied by the personal touch of nonverbal cues and tendencies. An individual's body language helps clarify the spoken word. In a similar sense, the programmer relies on white space (what is not said directly) in the code to communicate logic, intent, and understanding.
An example is the use of blank lines between conceptually different sections of code. Blank lines should improve readability as they separate logically different segments of the code and thus provide the literary equivalent of a section break. Appropriate places to use blank lines include:
* When changing from preprocessor directives to code
* Around class and structure declarations
* Around a function definition of some length
* Around a group of logically connected statements of some length
* Between declarations and the executable statements that follow
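The list above might be applied as in the following sketch, where blank lines separate the include directives from the code, the declarations from the executable statements, and each logical group from the next. The function `positiveMean` is a hypothetical example, not the listing from figure 6.

```cpp
#include <numeric>
#include <vector>

// Hypothetical function: mean of the positive readings, or 0.0 if there are none.
double positiveMean(const std::vector<double>& readings)
{
    std::vector<double> kept;
    double              sum = 0.0;

    for (double r : readings)
        if (r > 0.0) kept.push_back(r);

    if (kept.empty()) return 0.0;

    sum = std::accumulate(kept.begin(), kept.end(), 0.0);
    return sum / kept.size();
}
```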
Consider the code listing in figure 6. Individual blank spaces should also be used to show the logical structure within a single statement. Strategic blank spaces within a line simplify the parsing done by the human reader. At a minimum, blank spaces should be included after the commas in argument lists and around the assignment operator "=" and the redirection operators "<<" and ">>".
On the other hand, blank spaces should not separate unary operators from their operands, as with unary minus (-), address of (&), indirection (*), increment (++), and decrement (--), nor should they surround the member-access operator (.).
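The spacing rules for single statements can be sketched as follows. Both functions are hypothetical helpers written only to show the spacing; they are not drawn from the guidelines themselves.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds "label: value".
std::string labeled(const std::string& label, int value)   // space after the comma
{
    std::ostringstream out;
    out << label << ": " << value;     // spaces around <<
    return out.str();                  // no spaces around member access
}

// Hypothetical helper: returns the negative of (*p + 1) without changing *p.
int negatedNext(const int* p)
{
    int n = *p;      // spaces around =, none after unary *
    n++;             // no space before ++
    return -n;       // no space after unary minus
}
```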
Also, if it makes sense, put two to three statements on one line. This practice has the effect of simplifying the code, but it must be used with discretion and only where it is sensible to do so.
The case statement used in figure 1 brings up a general point: very simple decision statement structures can be tersely presented, showing the alternative code simply, and, if possible, without braces, as in the example in figure 7.
It is not uncommon for simple conditions to be mutually exclusive, creating a kind of generalized case statement. This, as is common practice, can be printed as a chain, as in figure 8.
Of course, it may be that the structures are truly nested, and then one must use either nested spacing or functions to indicate the alternatives. Again, the general point is to let the structure drive the layout, not the syntax of the programming language.
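A chain of mutually exclusive conditions might be laid out as in this sketch, in the spirit of figures 7 and 8 but not copied from them. The function `rating` and its thresholds are hypothetical.

```cpp
#include <string>

// Hypothetical thresholds; the point is the layout of the chain, not the values.
std::string rating(int score)
{
    if      (score >= 90) return "excellent";
    else if (score >= 75) return "good";
    else if (score >= 50) return "fair";
    else                  return "poor";
}
```

The extra spaces after the first `if` and after the final `else` are what make the conditions and results line up as two columns, so the chain reads as a generalized case statement.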
In the brace wars, we do not take a strong stand on the various preferences shown in figure 9, but we do feel strongly that the indent is vital, as it is the indent that shows the structure.
The ability to communicate clearly is an issue that is faced in all facets of the human experience. Programmers must achieve a level of clarity, continuity, and beauty when writing code. This means focusing on the code and its clarity, balance, and symmetry, not on its length or comments. While this concept does not advocate the removal of comments or negate their use and importance in appropriate situations, it does suggest that programmers must use comments wisely and judiciously. The focus should be on developing code that, for the most part, clearly communicates intent and functionality. This practice will automatically reduce the need for many comments.
Although the guidelines presented here are used in an educational setting, they also have merit in industrial environments. Students who are educated using these guidelines will most likely use them (or some variant) as they enter industry. To demonstrate this, we have developed an example that applies these guidelines to two very different styles. The first is the Unix style. It is terse, often making use of vowel deletion, and is often found in realistic applications such as operating-system code. This is not to imply that all or most system programmers use this style, only that it is not unusual. Figure 10 shows a small example of this style.
We call the second style the textbook style, as illustrated in figure 11. Again, this in no way means to imply that all or most textbooks use this style, only that the style in the example is not unusual. In this style the focus is on learning. This means that there is frequent commenting, and the code is well spread out. For the purposes of learning and understanding the details of a language, this style can be excellent. From a practical perspective or for any program of some scale, this style does not work well, as it can be overwhelming to use or to read. Moreover, this style makes it difficult to see the overall design, as if one were stuck among the trees and could not see the forest.
Figure 12 is a rework of the function in figures 10 and 11, using the guidelines discussed here to make a smooth transition between academic and practical code. This figure shows a balance of both styles, relying more directly on the code itself to communicate intent and functionality clearly. Compared with the textbook style, the resultant code is shorter and more compact while still clearly communicating meaning, intent, and functionality. When compared with the Unix style, the code is slightly longer, but the meaning, intent, and functionality are clearer than the original code.
Figure 13 illustrates the guidelines presented here in another setting. This is a function taken from a complex program (10,000 lines) related to power-system reliability and energy use regarding PHEVs (plug-in hybrid electric vehicles). The program makes numerous calculations related to the effect that such vehicles will have on the current power grid and the effect on generation and transmission systems. This program attempts to evaluate the reliability of power systems by developing a model for reliability evaluation using a Monte Carlo simulation.
While the previous examples show the merit of the guidelines presented here, one argument against such guidelines is that making changes to keep a certain coding style intact is time-consuming, particularly when a version-control system is used. In the face of a time-sensitive project or a project that most likely will not be updated or maintained in the future, the effort may not be worthwhile. Typical cases include class projects, a Ph.D. thesis, or a temporary application.
If, however, the codebase in question has a long lifespan or will be updated and maintained by others (for example, an operating system, server, interactive Web site, or other useful application), then almost any changes to improve readability are important, and the time should be taken to ensure the readability and maintainability of the code. This should be a matter of pride, as well as an essential function of one's job.
1. Heusser, M. 2005. Beautiful code. Dr. Dobb's (August); http://www.ddj.com/184407802.
2. Kamp, P-H. 2010. Sir, please step away from the ASR-33!, ACM Queue 8 (10); http://queue.acm.org/detail.cfm?id=1871406.
3. Ledgard, H. 2011. Professional coding guidelines. Unpublished report, University of Toledo; http://www.eng.utoledo.edu/eecs/faculty_web/hledgard/softe/upload/index.php?&direction=0&order=&directory=Reading%203%20Productivity-Management.
4. Molina, M. 2007. What makes code beautiful. Ruby Hoedown.
5. Peters, T. 2004. The Zen of Python. PEP (Python Enhancement Proposals) 20 (August); http://www.python.org/dev/peps/pep-0020/.
6. Reed, D. 2010. Sometimes style really does matter. Journal of Computing Sciences in Colleges 25(5): 180-187.
7. Sun Developer Network. 1999. Code conventions for the Java programming language; http://java.sun.com/docs/codeconv/.
The authors would like to thank David Marcus and Poul-Henning Kamp for their insightful comments while completing this work, as well as the software engineering students who have contributed to these guidelines over the years.
Henry Ledgard received his B.A. from Tufts University in 1964 and his Ph.D. from the Massachusetts Institute of Technology in 1969, then spent a year at the University of Oxford as a postdoctoral fellow. His first programs were on punched cards in Fortran. His master's thesis was on a program for a graphical display facility to approximate numerical data with confluent equations. His Ph.D. thesis was on an attempt to provide a generative grammar for the syntax and translation of programming languages. He has been on the faculty at Johns Hopkins University and the University of Massachusetts/Amherst. In 1977 he joined the design team that created the new programming language Ada, then began a consulting and writing practice. In 1989 he joined the faculty at the University of Toledo. Two of his current interests are creative ways to help people learn and ways to simplify interfaces to technology.
Robert Green received his bachelor's degree in computer science from Geneva College and his master's degree in computer science from Bowling Green State University, and he is pursuing his Ph.D. at the University of Toledo. He has multiple years of experience developing software across a variety of industries. One of his current research interests is writing high-quality, sustainable code.
Originally published in Queue vol. 9, no. 11.
Love the article, thank you. Love no paywall, have retweeted with hat tip to @EmbedSys.
By zeitgeist serendipitously my own work has had me stumbling into some of your:
* Variables and classes should be nouns or noun phrases.
* Class names are like collective nouns.
* Variable names are like proper nouns.
* Procedure names should be verbs or verb phrases.
* Methods used to return a value should be nouns or noun phrases.
* Booleans should be adjectives.
* For compound names, retain conventional English syntax.
* Try to make names pronounceable.
But then I see you killed a plural here:
Message = EmergencyAlertLabels[i] // Problematic
AlertText = EmergencyLabel[i] // Preferable
Q: Do you not agree that variables of collection types such as list (or array) and set should be named like PLURAL Proper Nouns, not singular?
AlertText = EmergencyLabels[i] // More Preferable
The Opposition to spacing code out like a table includes the Python Style of http://www.python.org/dev/peps/pep-0008/ that explicitly discourages "more than one space around an assignment (or other) operator to align it with another".
I think me, when working mostly with people who won't teach their tools to maintain vertically-aligned tabulation for us, I give up, write one blank instead of many, and then tabulate the code in my head on the fly as I read it.
I do remember seeing work-groups educated in the mainframe-not-mini culture of tabulating more often go and agree on everyone using editors that treated any string of two and more blanks as a division between one column of text and the next. Then when you edited the value of a cell of the table, the blank text to the right of it could shrink as far as two blanks to keep the remaining cells of the row in place.
I've not seen that parsing rule duplicated in wikitexts, instead I see people make columnar divisions explicit with a quiet | that wouldn't fit in code, or a loud /*|*/, never by making the width of whitespace significant.
BTW, the function getBalance in figure 6 violates the principles stated with figure 4. Since the function returns a value without changing any arguments, its name should be a noun or noun phrase, like CurrBal or CurrentBalance, according to figure 4.
In my opinion, the more deterministic the style, the better. I've always hated programmers indenting parameter declarations to be aligned with the opening parenthesis. It's completely non-deterministic, and you wind up with people spending extra time picking at the whitespace and playing around fixing the indentation to some arbitrary "policy" that causes you to lose context and spend mental cycles on something pretty much irrelevant.
The ruler with which I measure a formatting style: can you write a script that could perform the formatting? Would it require a lot of special cases? Would it require you to maintain a lot of context and backtrack? If so, then it's a bad style. The simpler it is, the less it wipes your mental context.
Too much emphasis is put on making it "easier to read" by someone else. Can they read code or can't they? Some arbitrary strict whitespace policy doesn't make it enough easier to make any significant difference.
I think the only strict rule that should be imposed is not allowing excessively long lines. You should NEVER have to scroll horizontally.
Secondly. I suggest the authors read Robert Martins book "Clean Code", it's gold!!! I used to code in style that is discouraged by the book. But after reading the book I no longer write code that way any more. If methods are small, have a consistent level of abstraction and use intention revealing identifiers, the need for whitespace and other formatting is irrelevant IMHO.