The Bike Shed

Development


A Generation Lost in the Bazaar

Quality happens only when someone is responsible for it.


Poul-Henning Kamp


Thirteen years ago, Eric Raymond's book The Cathedral and the Bazaar (O'Reilly Media, 2001) redefined our vocabulary and all but promised an end to the waterfall model and big software companies, thanks to the new grass-roots open source software development movement. I found the book thought provoking, but it did not convince me. On the other hand, being deeply involved in open source, I couldn't help but think that it would be nice if he were right.

The book I brought to the beach house this summer is also thought provoking, much more so than Raymond's (which it even mentions rather positively): Frederick P. Brooks's The Design of Design (Addison-Wesley Professional, 2010). As much as I find myself nodding in agreement and as much as I enjoy Brooks's command of language and subject matter, the book also makes me sad and disappointed.

Thirteen years ago also marks the apogee of the dot-com euphoria, where every teenager was a Web programmer and every college dropout had a Web startup. I had genuine fun trying to teach some of those greenhorns about the good old-fashioned tricks of the trade—test-restoring backups, scripting operating-system installs, version control, etc. Hindsight, of course, is 20/20 (i.e., events may have been less fun than you remember), and there is no escaping that the entire dot-com era was a disaster for IT/CS in general and for software quality and Unix in particular.

I have not seen any competent analysis of how much bigger the IT industry became during the dot-com years. My own estimate is that—counted in the kinds of jobs that would until then have been behind the locked steel doors of the IT department—our trade grew by two orders of magnitude, or if you prefer, by more than 10,000 percent.

Getting hooked on computers is easy—almost anybody can make a program work, just as almost anybody can nail two pieces of wood together in a few tries. The trouble is that the market for two pieces of wood nailed together—inexpertly—is fairly small outside of the "proud grandfather" segment, and getting from there to a decent set of chairs or fitted cupboards takes talent, practice, and education. The extra 9,900 percent had neither practice nor education when they arrived in our trade, and before they ever had the chance to acquire it, the party was over and most of them were out of a job. I will charitably assume that those who managed to hang on were the most talented and most skilled, but even then there is no escaping that as IT professionals they mostly sucked because of their lack of ballast.

The bazaar meme advocated by Raymond, "Just hack it," as opposed to the carefully designed cathedrals of the pre-dot-com years, unfortunately did not die with the dot-com madness, and today Unix is rapidly sinking under its weight.

I updated my laptop. I have been running the development version of FreeBSD for 18 years straight now, and compiling even my Spartan work environment from source code takes a full day, because it involves trying to make sense and architecture out of Raymond's anarchistic software bazaar.

At the top level, the FreeBSD ports collection is an attempt to create a map of the bazaar that makes it easy for FreeBSD users to find what they need. In practice this map consists, right now, of 22,198 files that give a summary description of each stall in the bazaar—a couple of lines telling you roughly what that stall offers and where you can read more about it. Also included are 23,214 Makefiles that tell you what to do with the software you find in each stall. These Makefiles also try to inform you of the choices you should consider, which options to choose, and what would be sensible defaults for them. The map also conveniently comes with 24,400 patch files to smooth over the lack of craftsmanship of many of the wares offered, but, generally, it is lack of portability that creates a need for these patch files.
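The map is also searchable. A sketch of what that looks like, assuming a ports tree checked out in /usr/ports and a fetched INDEX file (fetchindex and search are the stock targets of the ports infrastructure):

$ cd /usr/ports
$ make fetchindex             # fetch the prebuilt INDEX for this branch
$ make search name=firefox    # grep the map: prints the path, one-line
                              # description, and dependencies of each
                              # stall whose name matches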

Finally, the map helpfully tells you that if you want to have www/firefox, you will first need to get devel/nspr, security/nss, databases/sqlite3, and so on. Once you look up those in the map and find their dependencies, and recursively look up their dependencies, you will have a shopping list of the 122 packages you will need before you can get to www/firefox.
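You do not have to walk the map by hand; the ports infrastructure will do the recursion for you. A sketch, again assuming the stall's Makefile is where the map says it is:

$ cd /usr/ports/www/firefox
$ make all-depends-list | wc -l   # the complete recursive shopping list,
                                  # one /usr/ports/... path per line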

Modularity and code reuse are, of course, A Good Thing. Even in the most trivially simple case, however, the CS/IT dogma of code reuse is totally foreign in the bazaar: the software in the FreeBSD ports collection contains at least 1,342 copied and pasted cryptographic algorithms.

If that resistance/ignorance of code reuse had resulted in self-contained and independent packages of software, the price of the code duplication might actually have been a good tradeoff for ease of package management. But that was not the case: the packages form a tangled web of haphazard dependencies that results in much code duplication and waste.

Here is one example of an ironic piece of waste: Sam Leffler's graphics/libtiff is one of the 122 packages on the road to www/firefox, yet the resulting Firefox browser does not render TIFF images. For reasons I have not tried to uncover, 10 of the 122 packages need Perl and seven need Python; one of them, devel/glib20, needs both languages for reasons I cannot even imagine.

Further down the shopping list are repeated applications of the Peter Principle, the idea that in an organization where promotion is based on achievement, success, and merit, that organization's members will eventually be promoted beyond their level of ability. The principle is commonly phrased, "Employees tend to rise to their level of incompetence." Applying the principle to software, you will find that you need three different versions of the make program, a macroprocessor, an assembler, and many other interesting packages. At the bottom of the food chain, so to speak, is libtool, which tries to hide the fact that there is no standardized way to build a shared library in Unix. Instead of standardizing how to do that across all Unixen—something that would take just a single flag to the ld(1) command—the Peter Principle was applied and made it libtool's job instead (a sketch below shows how small that job really is).

The Peter Principle is indeed strong in this case—the source code for devel/libtool weighs in at 414,740 lines. Half that line count is test cases, which in principle is commendable, but in practice it is just the Peter Principle at work: the tests elaborately explore the functionality of the complex solution for a problem that should not exist in the first place. Even more maddening is that 31,085 of those lines are in a single unreadably ugly shell script called configure. The idea is that the configure script performs approximately 200 automated tests, so that the user is not burdened with configuring libtool manually. This is a horribly bad idea, already much criticized back in the 1980s when it appeared, as it allows source code to pretend to be portable behind the veneer of the configure script, rather than actually having the quality of portability to begin with. It is a travesty that the configure idea survived.
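To put the "single flag" remark in proportion, here is a minimal sketch of what libtool is wrapping on a modern ELF Unix; the file names are hypothetical, and the flags are the de facto cc(1) spellings rather than raw ld(1) ones:

$ cc -fPIC -c foo.c                 # compile to position-independent code
$ cc -shared -o libfoo.so.1 foo.o   # one flag turns the link into a
                                    # shared library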

The 1980s saw very different Unix implementations: Cray-1s with their 24-bit pointers, Amdahl UTS mainframe Unix, a multitude of more or less competently executed SysV+BSD mashups from the minicomputer makers, the almost—but not quite—Unix shims from vendors such as Data General, and even the genuine Unix clone Coherent from the paint company Mark Williams.

The configure scripts back then were written by hand and did things like figure out if this was most like a BSD- or a SysV-style Unix, and then copied one or the other Makefile and maybe also a .h file into place. Later the configure scripts became more ambitious, and as an almost predictable application of the Peter Principle, rather than standardize Unix to eliminate the need for them, somebody wrote a program, autoconf, to write the configure scripts.
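Such a hand-written configure script could be the whole story in a dozen lines. A sketch, with illustrative heuristics and hypothetical file names:

#!/bin/sh
# guess which flavor of Unix this is, copy the right Makefile into place
if grep BSD /usr/include/sys/param.h >/dev/null 2>&1; then
    echo "Looks like a BSD-style system"
    cp Makefile.bsd Makefile
else
    echo "Assuming SysV"
    cp Makefile.sysv Makefile
fi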

Today's Unix/Posix-like operating systems, even including IBM's z/OS mainframe version, as seen with 1980 eyes are identical; yet the 31,085 lines of configure for libtool still check if <sys/stat.h> and <stdlib.h> exist, even though the Unixen, which lacked them, had neither sufficient memory to execute libtool nor disks big enough for its 16-MB source code.
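Stripped of autoconf's caching and quoting machinery, each of those header checks amounts to nothing more than this sketch:

# does <stdlib.h> exist? compile a one-liner and see
$ cat > conftest.c <<'EOF'
#include <stdlib.h>
int main(void) { return 0; }
EOF
$ cc -c conftest.c >/dev/null 2>&1 && echo yes || echo no
$ rm -f conftest.c conftest.o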

How did that happen?

Well, autoconf, for reasons that have never made sense, was written in the obscure M4 macro language, which means that the actual tests look like this:

## Whether `make' supports order-only prerequisites.
AC_CACHE_CHECK([whether ${MAKE-make} supports order-only prerequisites],
  [lt_cv_make_order_only],
  [mkdir conftest.dir
   cd conftest.dir
   touch b
   touch a
cat >confmk << 'END'
a: b | c
a b c:
       touch $[]@
END
  touch c
  if ${MAKE-make} -s -q -f confmk >/dev/null 2>&1; then
    lt_cv_make_order_only=yes
  else
    lt_cv_make_order_only=no
  fi
  cd ..
  rm -rf conftest.dir
])
if test $lt_cv_make_order_only = yes; then
  ORDER='|'
else
  ORDER=''
fi
AC_SUBST([ORDER])

Needless to say, this is more than most programmers would ever want to put up with, even if they had the skill, so the input files for autoconf happen by copy and paste, often hiding behind increasingly bloated standard macros covering "standard tests" such as those mentioned earlier, which look for compatibility problems not seen in the past 20 years.

This is probably also why libtool's configure probes no fewer than 26 different names for the Fortran compiler my system does not have, and then spends another 26 tests to find out if each of these nonexistent Fortran compilers supports the -g option.
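In shell terms that probe is just a loop over candidate names; a sketch (the names here are abridged and from memory, not libtool's actual list):

# take the first Fortran compiler found on $PATH...
for fc in g77 xlf f77 frt pgf77 fort77 af77 f90 f95 gfortran ifort; do
    if command -v "$fc" >/dev/null 2>&1; then
        FC=$fc
        break
    fi
done
# ...then check whether it accepts -g, given some trivial conftest.f
test -n "$FC" && $FC -g -c conftest.f >/dev/null 2>&1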

That is the sorry reality of the bazaar Raymond praised in his book: a pile of old festering hacks, endlessly copied and pasted by a clueless generation of IT "professionals" who wouldn't recognize sound IT architecture if you hit them over the head with it. It is hard to believe today, but under this embarrassing mess lies the ruins of the beautiful cathedral of Unix, deservedly famous for its simplicity of design, its economy of features, and its elegance of execution. (Sic transit gloria mundi, etc.)

One of Brooks's many excellent points is that quality happens only if somebody has the responsibility for it, and that "somebody" can be no more than one single person—with an exception for a dynamic duo. I am surprised that Brooks does not cite Unix as an example of this claim, since we can pinpoint with almost surgical precision the moment that Unix started to fragment: in the early 1990s when AT&T spun off Unix to commercialize it, thereby robbing it of its architects.

More than once in recent years, others have reached the same conclusion as Brooks. Some have tried to impose a kind of sanity, or even to lay down the law formally in the form of technical standards, hoping to bring order and structure to the bazaar. So far they have all failed spectacularly, because the generation of lost dot-com wunderkinder in the bazaar has never seen a cathedral and therefore cannot even imagine why you would want one in the first place, much less what it should look like. It is a sad irony, indeed, that those who most need to read it may find The Design of Design entirely incomprehensible. But to anyone who has ever wondered whether using m4 macros to configure autoconf to write a shell script to look for 26 Fortran compilers in order to build a Web browser was a bit of a detour, Brooks offers well-reasoned hope that there can be a better way.

LOVE IT, HATE IT? LET US KNOW

feedback@queue.acm.org

Poul-Henning Kamp (phk@FreeBSD.org) has programmed computers for 26 years and is the inspiration behind bikeshed.org. His software has been widely adopted as under-the-hood building blocks in both open source and commercial products. His most recent project is the Varnish HTTP accelerator, which is used to speed up large Web sites such as Facebook.

© 2012 ACM 1542-7730/12/0800 $10.00

Originally published in Queue vol. 10, no. 8



Comments


M. Simon | Wed, 26 Dec 2012 06:08:40 UTC

The problem is "C". But people are so used to it that they don't even see it.

E. Sarmas | Wed, 26 Dec 2012 12:52:51 UTC

I totally agree with the above comment. The problem is in the use of "C". We need, at last, a new start (maybe Java, or a cleaner C with Java influences, but certainly not C++).

M. Simon | Wed, 26 Dec 2012 22:11:51 UTC

Let me add that I'm currently working with a guy who managed 5,000 programmers at AT&T and is a crack programmer himself, and he agrees with me. If all you have is a 75-tool pocket knife and what you really need is a hammer....

David Ormand | Wed, 02 Jan 2013 19:03:16 UTC

Good article, and controversial. The takeaway I'm getting is that the root problem is the lack of a "proper" education, even from "accredited" public universities and tech schools, for CS/IT people. Well, I'm not CS/IT, I'm just a relic who graduated from the TI-99/4A to doing embedded code (assembly, C, C++, Ada) in defense products (which for the most part do NOT have any formal software quality requirements!), and I'm very aware of my lack of training. Given the shortcomings of regular school-based education (especially for those with full-time employment) and the difficulty of getting mentoring from old-timers, any suggestions for self-teaching? Like a preferred reading list; I see suggestions for McConnell's "Code Complete", and I've heard of "The Mythical Man-Month" (and see its regular application - in the wrong way - in my company), and I've got the K&R books and Humphrey's "Personal Software Process" book (my company is a CMM level 5 outfit, so I grok process) and some textbook for "Design Patterns" that is heavy on UML. What I think would be sweet is a website with recommended reading lists for various software-related disciplines, such as system programming, network, crypto, embedded, etc.

Dan Cross | Thu, 14 Feb 2013 21:22:03 UTC

My list is here: http://pub.gajendra.net/2012/09/stone_age

Dave Wyland | Mon, 01 Apr 2013 18:55:21 UTC

Great discussion. One responder proposed an interesting idea: that Moore's law and advancing computer hardware performance indirectly allowed sloppy coding and software anarchy. Moore's law tended to improve performance by ~30%/year (CPU clock rate), year after year, inexorably. Hard to adjust to on a continuing basis. At some point, a new programmer can just assume that hardware performance is infinite, and that we should just go play in the new digital ocean. But that has quietly changed. Moore's law went into the ICU in 2004. I had a 3.5-GHz Pentium 4 in 2004; there has never been a 4-GHz Pentium, to this day. Moore's law hit the scaling wall. Before 2004, each shrink meant cheaper silicon, lower power, and higher speed. After 2004 (~90 nm), each shrink meant cheaper silicon, *higher* power, and *lower* speed. Intel and others struggle with magic chemicals to make 2.5-GHz processors today, and no proven near-term solution to this problem has emerged. I think this means we are in the cleanup phase, where we get better results by cleaning up, reducing, and streamlining code rather than adding it. This will be interesting to watch, whichever model you subscribe to, cathedral or bazaar.

ZenBowman | Wed, 08 May 2013 01:31:53 UTC

Great discussion, and it reminds me of an Alan Kay interview:

Binstock: You once referred to computing as pop culture.

Kay: It is. Complete pop culture. I'm not against pop culture.... I consider jazz to be a developed part of high culture. Anything that's been worked on and developed and you [can] go to the next couple levels.

Binstock: One thing about jazz aficionados is that they take deep pleasure in knowing the history of jazz.

Kay: Yes! Classical music is like that, too. But pop culture holds a disdain for history. Pop culture is all about identity and feeling like you're participating. It has nothing to do with cooperation, the past or the future; it's living in the present. I think the same is true of most people who write code for money. They have no idea where [their culture came from], and the Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free? The Web, in comparison, is a joke. The Web was done by amateurs.

I'm a relative newbie, at the age of 30, but I found the vast majority of my CS curriculum (apart from a few courses on compilers, operating systems, and AI) very superficial. This past year I studied SICP and received more illumination from those video lectures than I ever did in any coursework done during my PhD. I was lucky enough to work at a really good lab (USC/ISI) during my PhD, and the one thing I'm very thankful for is that I met some genuine programming wizards there who instilled in me an appreciation for the history of our field. Without an appreciation for our history, we will just continue to loop in cycles. The author is spot-on.

ksym | Thu, 26 Sep 2013 21:25:48 UTC

I am a FreeBSD user and I fully agree that the FreeBSD ports collection is a chaotic mess full of traps, ready to blow up in sysadmins' faces. Wanna upgrade your software today? Oh, please do read the UPDATING document very carefully, to avoid failing builds AND to make sure the ports do not screw up your system (as updating Perl often does).

ksym | Thu, 26 Sep 2013 21:28:31 UTC

I could argue that the only free Unix-like system that has no problems with software maintenance is Debian with the Stable repository. But then again, Debian is the only Linux I have ever used, since it is relatively simple at its core yet has a working package manager.

darkfader | Sat, 27 Sep 2014 15:05:31 UTC

Hi, I couldn't agree more. I had read and believed the tales about the bazaar, having been a Linux user at first. Then I started working with commercial, ugly, bloated "real" Unix and surprisingly found my Linux to be utter shit in comparison to something that was designed with thought and care. Those days are gone and the bullshit is still there. Countless years of people's lifetimes and endless money are spent working around useless bugs in useless frameworks that work no better than the buggy software before them, e.g., HAL, which still doesn't get you closer to "startx" with reliably working input than your CRT modeline hacking did in 1998. All you get is one more "stall" with cheap crap that doesn't do its job yet tries to do five more jobs.