Last week one of our newer engineers checked in a short program to help in debugging problems in the code that we’re developing. Even though this was a test program, several people read the code and then commented on the changes they wanted to see. The code didn’t have any major problems, but it seemed to generate a lot of e-mail for what was being checked in. Eventually the comments in the thread were longer than the program itself. At some point in the thread the programmer who submitted the code said, “Look, I’ve checked in the code; you can paint the bike shed any color you want now,” and then refused to make any more changes to the code. I can understand his frustration at nitpicking comments, but what did he mean about painting the bike shed?
Maybe I’d Like Green
A bike shed is a shed that you park your bike in, and since all such sheds need to be painted to protect them from the weather, they must be painted in some color. Color is very important to some people. OK, that’s not what you’re really asking.
What you witnessed was, unfortunately, a typical reaction to simple changes in a codebase. Have you ever noticed that when someone checks in some complex and, oftentimes, horrific piece of code, the check-in is greeted with an almost deafening silence? That is often because the people who should be reviewing such code just don’t have the time. Unfortunately, lots of people do have the time to review a 10- or 50-line change, and since they feel guilty for not having checked the bigger pieces of code, they nitpick the small pieces.
The explanation for why this occurs was first given by C. Northcote Parkinson, who wrote a book about management (Parkinson’s Law and Other Studies in Administration, Ballantine Books, 1969). He stated that if you were building something complex, then few people would argue with you because few people could understand what you were doing. If you were building something simple—say, a bike shed—which most anyone could build, then everyone would have an opinion. Unfortunately, as you learned, it’s not enough just to have an opinion, but most people feel they should express that opinion.
The engineer who wrote the original code clearly figured out that he was trapped in a pointless loop and decided to break out of it by telling people to paint the shed any color they liked. He or she was probably pretty sure that no one would actually change the color of the shed, but by not engaging in the pointless loop was able to let people get back to ignoring more important parts of the code, which is what they had been doing previously.
P.S. For another version of bike-shed story go to: http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/misc.html#BIKESHED-PAINTING.
P.P.S. I like my sheds to be blue.
I recently got a dressing-down by my boss because I made a large change to our system in one large chunk. I didn’t think it was too large, only about 20 files and a couple of thousand lines of code, but I was forced to back out my change and then commit it in smaller bits to our repository. While I understand that people don’t like large changes because they can be potentially destabilizing, all of this code was interrelated and really couldn’t be intelligently broken up into smaller pieces of work. In your experience, what’s the optimal size for a single check-in?
Lot o’ Lines
First, I rather doubt that you got a dressing-down by your boss. I am sure HR would frown on undressing engineers, or anyone else, in the workplace.
More seriously, the question of how large a check-in should be allowed to a source base is not one to which there is a hard and fast answer. Some features, or bug fixes, require widespread changes to a codebase.
I think many less experienced engineers—and many managers—suffer under the false belief that bug fixes usually require a small number of lines to be changed and that features require a large number of lines. If most bugs were caused by one error or other hard-to-find but easy-to-fix one-liners, then that would be true, but it’s not. Often a bug will be pervasive, infecting an entire codebase. A pervasive bug is usually the result of a false assumption being coded into an entire system, with the result that the fix touches many, if not most, files. When such a fix is required, it makes sense to check in all the changes in one fell swoop. It hardly makes any sense not to fix the same bug, at the same time, in all places.
Where many programmers run into trouble is when they have run-on code. Run-on code is like a run-on sentence. You’re working on a feature, but then you find a bug and then you realize that there is another related bug and you fix the second bug, which leads you to a third problem and you fix that as well, all the while your project lead is asking for the feature that you started implementing when you found the first bug, so you go back to feature development but haven’t checked in any of your fixes because you need to have the feature complete to test the fixes, and at the end you’re left, like our readers, out of breath and unable to remember where the whole thing started.
Part of being an engineer is taking large systems, or problems, and breaking them down into chunks that are small enough for you, and the majority of your co-conspirators—I mean colleagues—to understand and digest. No one wants to read 2,000 lines of code spread over 20 files in one shot. It probably took you more than a week to develop that feature and fix those bugs, and you should not expect people to be able to come up to speed on what you’ve done in only a few minutes by reading your, I am sure, lengthy expository commit message.
One final reason to break up large commits is so that if someone finds a bug in some part of what you’ve written, it might be possible to roll back a smaller change and preserve the rest of the work that you’ve done. The alternative, which is to commit a large chunk of work, then patch patch patch, is sloppy and unsatisfying.
My guidelines here are simple. If you change an API, then change it and all its consumers at once. If you fix three bugs and add a feature, then you need to make four distinct check-ins.
KODE VICIOUS, known to mere mortals as George V. Neville-Neil, works on networking and operating system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are code spelunking, operating systems, and rewriting your bad code (OK, maybe not that last one). He earned his bachelor’s degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. He is an avid bicyclist and traveler who currently lives in New York City.
© 2009 ACM 1542-7730/09/0600 $10.00
Originally published in Queue vol. 7, no. 5—
see this item in the ACM Digital Library
Follow Kode Vicious on Twitter
Have a question for Kode Vicious? E-mail him at email@example.com. If your question appears in his column, we'll send you a rare piece of authentic Queue memorabilia. We edit e-mails for style, length, and clarity.
Ivar Jacobson, Ian Spence, Ed Seidewitz - Industrial Scale Agile - from Craft to Engineering
Essence is instrumental in moving software development toward a true engineering discipline.
Andre Medeiros - Dynamics of Change: Why Reactivity Matters
Tame the dynamics of change by centralizing each concern in its own module.
Brendan Gregg - The Flame Graph
This visualization of software execution is a new necessity for performance profiling and debugging.
Ivar Jacobson, Ian Spence, Brian Kerr - Use-Case 2.0
The Hub of Software Development
(newest first)Dear KV,
please stop calling programmers "engineers". It is unfair to both.
Tony (programmer since 1961 and a "real", i.e. licensed engineer).