You're Doing It Wrong
Think you've mastered the art of server performance? Think again.
Poul-Henning Kamp
Would you believe me if I claimed that an algorithm that has been on the books as "optimal" for 46 years, which has been analyzed in excruciating detail by geniuses like Knuth and taught in all computer science courses in the world, can be optimized to run 10 times faster?
A couple of years ago, I fell into some interesting company and became the author of an open source HTTP accelerator called Varnish, basically an HTTP cache to put in front of slow Web servers. Today Varnish is used by Web sites of all sorts, from Facebook, Wikia, and Slashdot to obscure sites you have surely never heard of.
Having spent 15 years as a lead developer of the FreeBSD kernel, I arrived in user land with a detailed knowledge of what happens under the system calls. One of the main reasons I accepted the Varnish proposal was to show how to write a high-performance server program.
Because, not to mince words, the majority of you are doing that wrong. Not just wrong as in not perfect, but wrong as in wasting half, or more, of your performance.
The first user of Varnish, the large Norwegian newspaper VG, replaced 12 machines running Squid with three machines running Varnish. The Squid machines were flat-out 100 percent busy, while the Varnish machines had 90 percent of their CPU available for twiddling their digital thumbs.a
The really short version of the story is that Varnish knows it is not running on the bare metal but under an operating system that provides a virtual-memory-based abstract machine. For example, Varnish does not ignore the fact that memory is virtual; it actively exploits it. A 300-GB backing store, memory mapped on a machine with no more than 16 GB of RAM, is quite typical. The user paid for 64 bits of address space, and I am not afraid to use it.
One particular task, inside Varnish, is expiring objects from the cache when their virtual lifetimers run out of sand. This calls for a data structure that can efficiently deliver the smallest keyed object from the total set.
A quick browse of the mental catalog flipped up the binary-heap card, which not only sports an O(log2(n)) transaction performance, but also has a meta-data overhead of only a pointer to each object—which is important if you have 10 million+ objects.
Careful rereading of Knuth confirmed that this was the sensible choice, and the implementation was trivial: "Ponto facto, Cæsar transit," etc.
On a recent trip by night-train to Amsterdam, my mind wandered, and it struck me that Knuth might be terribly misleading on the performance of the binary heap, possibly even by an order of magnitude. On the way home, also on the train, I wrote a simulation that proved my hunch right.
Before any fundamentalist CS theoreticians choke on their coffees: don't panic! The P vs. NP situation is unchanged, and I have not found a systematic flaw in the quality of Knuth et al.'s reasoning. The findings of CS, as we know it, are still correct. They are just a lot less relevant and useful than you think—at least with respect to performance.
The oldest reference to the binary heap I have located, in a computer context, is J.W.J. Williams' article published in the June 1964 issue of Communications of the ACM, titled "Algorithm number 232 - Heapsort."2,b The trouble is, Williams was already out of touch, and his algorithmic analysis was outdated even before it was published.
In an article in the April 1961 of CACM, J. Fotheringham documented how the Atlas Computer at Manchester University separated the concept of an address from a memory location, which for all practical purposes marks the invention of VM (virtual memory).1 It took quite some time before virtual memory took hold, but today all general-purpose, most embedded, and many specialist operating systems use VM to present a standardized virtual machine model (i.e., POSIX) to the processes they herd.
It would of course be unjust and unreasonable to blame Williams for not realizing that Atlas had invalidated one of the tacit assumptions of his algorithm: only hindsight makes that observation possible. The fact is, however, 46 years later most CS-educated professionals still ignore VM as a matter of routine. This is an embarrassment for CS as a discipline and profession, not to mention wasting enormous amounts of hardware and electricity.
Performance simulation
Enough talk. Let me put some simulated facts on the table. The plot in figure 1 shows the runtime of the binary heap and of my new B-heap version for 1 million items on a 64-bit machine.c (My esteemed FreeBSD colleague Colin Percival helpfully pointed out that the change I have made to the binary heap is very much parallel to the change from binary tree to B-tree, so I have adopted his suggestion and named my new variant a B-heap.d)

The x-axis is VM pressure, measured in the amount of address space not resident in primary memory, because the kernel paged it out to secondary storage. The left y-axis is runtime in seconds (log-scale), and the right Y-axis shows the ratio of the two runtimes: (binary heap / B-heap).
Let's get my "order of magnitude" claim out of the way. When we zoom in on the left side (figure 2), we see that there is indeed a factor 10 difference in the time the two algorithms take when running under almost total VM pressure: only 8 to 10 pages of the 1,954 pages allocated are in primary memory at the same time.

Did you just decide that my order of magnitude claim was bogus, because it is based on only an extreme corner case? If so, you are doing it wrong, because this is pretty much the real-world behavior seen.
Creating and expiring objects in Varnish are relatively infrequent actions. Once created, objects are often cached for weeks if not months, and therefore the binary heap may not be updated even once per minute; on some sites not even once per hour.
In the meantime, we deliver gigabytes of objects to clients' browsers, and since all these objects compete for space in the primary memory, the VM pages containing the binheap that are not accessed get paged out. In the worst case of only nine pages resident, the binary heap averages 11.5 page transfers per operation, while the B-heap needs only 1.14 page transfers. If your server has SSD (solid state drive) disks, that is the difference between each operation taking 11 or 1.1 milliseconds. If you still have rotating platters, it is the difference between 110 and 11 milliseconds.
At this point, is it wrong to think, "If it runs only once per minute, who cares, even if it takes a full second?"
We do, in fact, care because the 10 extra pages needed once per minute loiter in RAM for a while, doing nothing for their keep—until the kernel pages them back out again, at which point they get to pile on top of the already frantic disk activity, typically seen on a system under this heavy VM pressure.e
Next, let us zoom in on the other end of the plot (figure 3). If there is no VM pressure, the B-heap algorithm needs more comparisons than the binary sort, and the simple parent-to-child / child-to-parent index calculation is a tad more involved: so, instead of a runtime of 4.55 seconds, it takes 5.92 seconds, a whopping 30 percent slower—almost 350 nanoseconds slower per operation.

So, yes, Knuth and all the other CS dudes had their math figured out right.
If, however, we move left on the curve, then we find, at a VM pressure of four missing pages (= 0.2 percent) the B-heap catches up, because of fewer VM page faults; and it gradually gets better and better, until as we saw earlier, it peaks at 10 times faster.
That was assuming you were using an SSD disk, which can do a page operation in 1 millisecond—pretty optimistic, in particular for the writes. If we simulate a mechanical disk by setting the I/O time to a still-optimistic 10 milliseconds instead (figure 4), then B-heap is 10 percent faster as soon as the kernel steals just a single page from our 1,954-page working set and 37 percent faster when four pages are missing.

So what is a B-heap, anyway?
The only difference between a binary heap and a B-heap is the formula for finding the parent from the child, or vice versa.
The traditional n -> {2n, 2n+1} formula leaves us with a heap built of virtual pages stacked one over the next, which causes (almost) all vertical traversals to hit a different VM page for each step up or down in the tree, as shown in figure 5, with eight items per page. (The numbers show the order in which objects are allocated, not the key values.)

The B-heap builds the tree by filling pages vertically, to match the direction we traverse the heap (figure 6). This rearrangement increases the average number of comparison/swap operations required to keep the tree invariant true, but ensures that most of those operations happen inside a single VM page and thus reduces the VM footprint and, consequently, VM page faults.

Two details are worth noting:
* Once we leave a VM page through the bottom, it is important for performance that both child nodes live in the same VM page, because we are going to compare them both with their parent.
* Because of this, the tree fails to expand for one generation every time it enters a new VM page in order to use the first two elements in the page productively.
In our simulated example, failure to do so would require five pages more.
If that seems unimportant to you, then you are doing it wrong: try shifting the B-heap line 20 KB to the right in figures 2 and 3, and think about the implications.
The parameters of my simulation are chosen to represent what happens in real life in Varnish, and I have not attempted to comprehensively characterize or analyze the performance of the B-heap for all possible parameters. Likewise, I will not rule out that there are smarter ways to add VM-clue to a binary heap, but I am not inclined to buy a ticket on the Trans-Siberian Railway in order to find time to work it out.
The order of magnitude of difference obviously originates with the number of levels of heap inside each VM page, so the ultimate speedup will be on machines with small pointer sizes and big page sizes. This is a pertinent observation, as operating system kernels start to use superpages to keep up with increased I/O throughput.
So why are you, and I, still doing it wrong?
There used to be an (in)famous debate, "Quicksort vs. Heapsort," centering on the fact that the worst-case behavior of the former is terrible, whereas the latter has worse average performance but no such "bad spots." Depending on your application, that can be a very important difference.
We lack a similar inquiry into algorithm selection in the face of the anisotropic memory access delay caused by virtual memory, CPU caches, write buffers, and other facts of modern hardware.
Whatever book you learned programming from, it probably had a figure within the first five pages diagramming a computer much like that shown in figure 7. That is where it all went pear-shaped: that model is totally bogus today.

It is, amazingly, the only conceptual model used in computer education, despite the fact that it has next to nothing to do with the execution environment on a modern computer. And just for the record: by modern, I mean VAX 11/780 or later.
The past 30 or 40 years of hardware and operating-systems development seems to have only marginally impinged on the agenda in CS departments' algorithmic analysis sections, and as far as my anecdotal evidence, it has totally failed to register in the education they provide.
The speed disparity between primary and secondary storage on the Atlas Computer was on the order of 1:1,000. The Atlas drum took 2 milliseconds to deliver a sector; instructions took approximately 2 microseconds to execute. You lost around 1,000 instructions for each VM page fault.
On a modern multi-issue CPU, running at some gigahertz clock frequency, the worst-case loss is almost 10 million instructions per VM page fault. If you are running with a rotating disk, the number is more like 100 million instructions.f
What good is an O(log2(n)) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n^2) algorithm, which avoids page faults, will run circles around it.
Performance analysis of algorithms will always be a cornerstone achievement of computer science, and like all of you, I really cherish the foldout chart with the tape-sorts in volume 3 of The Art of Computer Programming. But the results coming out of the CS department would be so much more interesting and useful if they applied to real computers and not just toys like ZX81, C64, and TRS-80.
Q
Notes
a. This pun is included specifically to inspire Stan Kelly-Bootle.
b. How wonderful must it have been to live and program back then, when all algorithms in the world could be enumerated in an 8-bit byte?
c. Page size is 4 KB, each holding 512 pointers of 64 bits. The VM system is simulated with dirty tracking and perfect LRU page replacement. Paging operations set to 1 millisecond. Object key values are produced by random(3). The test inserts 1 million objects, then alternately removes and inserts objects 1 million times, and finally removes the remaining 1 million objects from the heap. Source code is at http://phk.freebsd.dk/B-Heap.
d. Does CACM still enumerate algorithms, and is eight bits still enough?
e. Please don't take my word for it: applying queuing theory to this situation is a very educational experience.
f. And below the waterline there are the flushing of pipelines, now useless and in the way, cache content, page-table updates, lookaside buffer invalidations, page-table loads, etc. It is not atypical to find instructions in the "for operating system programmers" section of the CPU data book, which take hundreds or even thousands of clock cycles, before everything is said and done.
References
1. Fotheringham, J. 1961. Dynamic storage allocation in the Atlas Computer, including an automatic use of a backing store. Communications of the ACM 4(10): 435-436; http://doi.acm.org/10.1145/366786.366800.
2. Williams, J. W. J. 1964. Algorithm 232 - Heapsort. Communications of the ACM 7(6): 347-348.
LOVE IT, HATE IT? LET US KNOW
Poul-Henning Kamp (phk@FreeBSD.org) has programmed computers for 26 years and is the inspiration behind bikeshed.org. His software has been widely adopted as "under the hood" building blocks in both open source and commercial products. His most recent project is the Varnish HTTP-accelerator, which is used to speed up large Web sites such as Facebook.
© 2010 ACM 1542-7730/10/0600 $10.00
![]()
Originally published in Queue vol. 8, no. 6—
see this item in the ACM Digital Library
POUL-HENNING KAMP (phk@FreeBSD.org) is one of the primary developers of the FreeBSD operating system, which he has worked on from the very beginning. He is widely unknown for his MD5-based password scrambler, which protects the passwords on Cisco routers, Juniper routers, and Linux and BSD systems. Some people have noticed that he wrote a memory allocator, a device file system, and a disk encryption method that is actually usable. Kamp lives in Denmark with his wife, his son, his daughter, about a dozen FreeBSD computers, and one of the world's most precise NTP (Network Time Protocol) clocks. He makes a living as an independent contractor doing all sorts of stuff with computers and networks.
For additional information see the ACM Digital Library Author Page for: Poul-Henning Kamp


dave | Sat, 12 Jun 2010 17:48:04 UTC
Bruno Martinez | Sat, 12 Jun 2010 20:17:16 UTC
Shriphani Palakodety | Sat, 12 Jun 2010 22:25:37 UTC
This is a very interesting article and the FreeBSD programmers are obviously one of the top programmers in the community. I wanted to point out that the traditional binary heap doesn't go n -> {2n, 2n + 1}. It goes n -> {2n + 1, 2n + 2}. If n were 0 (the root of the tree), its children should be at positions 1 and 2 in the array used to represent the heap, not 0 and 1. Other than that, this is a very nice article and thanks for sharing it with us.Tom Limoncelli | Sat, 12 Jun 2010 23:18:35 UTC
Joseph Kern | Sun, 13 Jun 2010 00:02:10 UTC
sac | Sun, 13 Jun 2010 00:50:05 UTC
Hatem Nassrat | Sun, 13 Jun 2010 00:54:14 UTC
Paulo Silveira | Sun, 13 Jun 2010 04:45:18 UTC
Itai Lahan | Mon, 14 Jun 2010 19:28:04 UTC
Poul-Henning Kamp | Mon, 14 Jun 2010 20:33:03 UTC
me | Mon, 14 Jun 2010 22:51:16 UTC
ransak | Tue, 15 Jun 2010 00:28:31 UTC
Martin T | Tue, 15 Jun 2010 00:31:35 UTC
Craig | Tue, 15 Jun 2010 00:57:33 UTC
Noah Mendelsohn | Tue, 15 Jun 2010 01:08:22 UTC
Noah Mendelsohn | Tue, 15 Jun 2010 01:09:04 UTC
Chris Hamilton | Tue, 15 Jun 2010 01:20:50 UTC
Gary | Tue, 15 Jun 2010 02:16:44 UTC
evanh | Tue, 15 Jun 2010 02:19:18 UTC
Totti | Tue, 15 Jun 2010 02:38:32 UTC
Antek | Tue, 15 Jun 2010 02:41:57 UTC
Alan Tam | Tue, 15 Jun 2010 03:27:00 UTC
BigJeff | Tue, 15 Jun 2010 03:39:42 UTC
Mark Rustad | Tue, 15 Jun 2010 05:53:55 UTC
Anand Vidwansa | Tue, 15 Jun 2010 08:07:21 UTC
bob | Tue, 15 Jun 2010 09:02:46 UTC
Cellar | Tue, 15 Jun 2010 09:07:41 UTC
apurv | Tue, 15 Jun 2010 10:03:19 UTC
James | Tue, 15 Jun 2010 10:13:01 UTC
Donny Viszneki | Tue, 15 Jun 2010 10:18:36 UTC
none | Tue, 15 Jun 2010 10:20:41 UTC
Edward Cullen | Tue, 15 Jun 2010 10:21:10 UTC
Poul-Henning Kamp | Tue, 15 Jun 2010 10:30:28 UTC
myne | Tue, 15 Jun 2010 11:14:56 UTC
Colin Guthrie | Tue, 15 Jun 2010 12:04:53 UTC
Leonard Norrgård | Tue, 15 Jun 2010 12:18:34 UTC
Christoph Bartoschek | Tue, 15 Jun 2010 12:54:07 UTC
Brian Krupp | Tue, 15 Jun 2010 13:32:22 UTC
Angelo Schneider | Tue, 15 Jun 2010 15:23:41 UTC
Darron | Tue, 15 Jun 2010 16:16:33 UTC
Nick Murphy | Tue, 15 Jun 2010 18:22:11 UTC
Sadly, like so many others, this article takes a valid and useful technical point ("being oblivious to the underlying VM architecture can sometimes cost you dearly in performance") and sullies it in utterly tangential social commentary delivered in an unhelpfully condescending tone (in this case an unjustified and unjustifiable indictment of CS education). Any good OS course (and most of them are, in fact, good) teaches students to think about these kinds of issues. But students have to first learn a) how to program at all, and b) how to do algorithmic analysis at all, and condemning those prerequisites for making simplifying assumptions like uniform memory access is like condemning a parent trying to teach their child to ride a bike for not including a rudimentary discussion of gear differentials. The world would be grateful if PHK would stick to things he knows, like kernel hacking and server performance analysis, rather than making ignorant comments about things he doesn't know, like pedagogy and university curricula.Joshua Drake | Tue, 15 Jun 2010 18:23:53 UTC
dave wade | Tue, 15 Jun 2010 18:33:56 UTC
Bill | Tue, 15 Jun 2010 18:37:10 UTC
john slimick | Tue, 15 Jun 2010 19:10:07 UTC
Chad Brewbaker | Tue, 15 Jun 2010 19:51:47 UTC
lizardb0y | Tue, 15 Jun 2010 20:10:41 UTC
Vladimir G. Ivanovic | Tue, 15 Jun 2010 20:12:13 UTC
Jon-Pierre Gentil | Tue, 15 Jun 2010 22:11:58 UTC
Donald Haase | Tue, 15 Jun 2010 23:44:45 UTC
Daemeon Reiydelle | Wed, 16 Jun 2010 00:09:14 UTC
pduey | Wed, 16 Jun 2010 03:26:35 UTC
Fini Alring | Wed, 16 Jun 2010 09:28:37 UTC
GSK | Wed, 16 Jun 2010 21:06:39 UTC
G Roelofs | Wed, 16 Jun 2010 21:51:36 UTC
Ger Hobbelt | Thu, 17 Jun 2010 02:14:33 UTC
[continued...] Calling the fig.7 model 'outdated' is simply showing you got it wrong. It was never outdated; all you did was recall a very basic model which was taught early on, and it had its usefulness then, as it still has today, in CS101. Yes, it's probably not a good model once you enter CS201, but heck, weren't classes meant to increase in difficulty, yet persist in memory beyond the test? ;-) ** It's comparable to calculus as taught on primary schools: the first model you learn is simple and it only considers positive integers. I was angry at my teachers for 'lying' to me regarding the question '7-5=2. Right. Now what's 5-7?': "Son, that's impossible." Turned out it wasn't. Same with Knuth, Fig.7 and all the rest: They haven't been lying to us; they are correct. Within the bounds of the model they defined up front. O(f(n)) only turns out to be a white lie/fallacy when you adjust/enhance your model to /more closely/ mimic actual reality. That's what this is all about. It's generally accessible knowledge since the 1990's at least which, very broadly speaking, says that B-tree-like organisations/layouts significantly benefit algorithm performance in light of a multi-level memory access model. That's what you just realized and applied to your problem. That shows insight. There's always been register banks, CPU caches, and other 'memory levels' since the dawn of computer time. If you seek the 'ultimate' in performance, or at least something that surpasses your current system, you need to model these aspects and [re]consider what you're doing. A lot of researchers have been doing that, all the time. All you and I got to do, as a minimum, is make sure we find access to their work and read up. Being a member of the ACM gives you an edge. It's up to you to use that edge. ** For those who seek to learn even after leaving school: sites like CiteSeer (and google academics) provide fast and easy access to ongoing research. Including articles which translate fundamental research to practice. It takes time and some perseverance to keep up, but wasn't that part of the teachings of your education? It is for me. ** All in all, nice work. A thorough article which satisfies both technically and (thanks to slashdot headline editors?) as a wake-up call for some as well. Thanks for writing it, in the style it was written. Direct and sure hitting a nerve. If anything, it's thought-provoking on several levels, which is a highly meritable achievement in itself. ** ** PS: I am not an educator (have sometimes been a part-time one, though). My only regrettable bias is that I tend to meet quite a few colleagues who seem to have forgotten that the ability to learn algorithms (or anything math related) is not just a useful nerve-ending to help get you your degree in higher ed.Ger Hobbelt | Thu, 17 Jun 2010 02:15:21 UTC
Sigh. I regret this will be a partial or, for those who are able to read between lines, complete rehash of what has been said above, but I'll say it anyway. ** First off, kuddo's for developing some good thoughts on the night train. Chapeau. ** Second, a bit of criticism that many seem to need reminding: education /prepares/ you for the rest of your life (by offering you a plethora of more or less basic concepts to use during the remainder of your (professional) life. Education does /not/ provide you with all the knowledge you're going to need then. In other words: [higher] education should achieve three things: (a) teach you the basics (in CS101 and onward), (b) train you to grok and maybe even develop new concepts, most of whom are building on those basic blocks you were taught at school, and (c) the awareness that your education does not stop once you receive that diploma or title, but that you should /seek/ out new information and attempt to comprehend it so you can employ it your own work. ** I'm 41, and my personal impression is that education achieves (a) completely (or you'd fail the tests), more or less achieves (b) and your personal attitude is a strong determinant for the rate of success regarding (c). /Lack/ of sustained curiousity kills the cat, at long term. ** The point is made before, more or less eloquently, that you show, as happens with so many CS/IT professionals, to have neglected to stay abreast of current research in your field. For those that moan here about information accessibility; CiteSeer and other sites have a very large number of freely downloadable papers, which report on cache conscious, cache oblivious and other hardware architecture aware algorithms, which expand on the work of Donald Knuth and others. This may read as harsh criticism; if you experience it like that, I'm sorry but so be it. I like the direct approach you took in your text; I hope I got this response down in kind, as it isn't much of a criticism but rather a 'hear! hear!' with a few side notes. I am glad this was written, in the way it was done. Nothing wrong with that. ** The criticism is this: The basic architectural model in fig. 7 is not outdated and still a good training model for /CS101/. /CS201/ (or whichever comes after CS101) would then discuss the shortcomings of said model and expand on it, ending up with a model where CPU caches (L1, L2, L3, ...), [D]RAM and HDD storage, their use and their characteristics become part of the picture, leading to a reconsideration of the previously taught algorithms in light of the 'new' model and the impact on scale(N) vs. performance (O() and C). That's where the cache-conscious algorithms (like your interesting reinvention of the B-tree-similar binary heap), 'cache oblivious', etc. ones show on the radar. A simple 'googling' for cache conscious binary heaps will deliver several papers more or less on subject; the key here is that you need to know that phrase 'cache conscious' in order to quickly find them; that's what the 'continued interest' a.k.a. education goal (c) should have delivered to you at this or an earlier time. Staying abreast doesn't mean leafing through your favorite IT magazines is good enough. That's the whole bloody difference between education and higher education; you're expected to be able to find and read published papers. That ability will help you to consider new technology for your projects, as it appears [publicly]. Or at least the choice to use or /not/ use it, given development time&cost restraints vs. performance requirements. [...continued]Neil Conway | Thu, 17 Jun 2010 07:46:58 UTC
Pinco | Thu, 17 Jun 2010 10:49:56 UTC
Pinco | Thu, 17 Jun 2010 11:05:50 UTC
Ashwin Jayaprakash | Fri, 18 Jun 2010 06:38:29 UTC
xxx | Fri, 18 Jun 2010 11:47:17 UTC
Poul-Henning Kamp | Fri, 18 Jun 2010 11:57:16 UTC
xxx | Fri, 18 Jun 2010 12:02:59 UTC
James Cook | Sat, 19 Jun 2010 07:06:27 UTC
James Cook | Sat, 19 Jun 2010 07:08:56 UTC
Pinco Pallino | Sat, 19 Jun 2010 09:41:48 UTC
Robert | Sat, 19 Jun 2010 12:25:07 UTC
Michael Eager | Mon, 21 Jun 2010 04:41:19 UTC
Cellar | Sun, 27 Jun 2010 17:49:32 UTC
Wayne | Sat, 10 Jul 2010 19:58:23 UTC
Kashyap | Wed, 04 Aug 2010 07:50:41 UTC
James | Fri, 29 Oct 2010 10:57:47 UTC
Gene Ressler | Tue, 28 Dec 2010 17:44:03 UTC
This work is good, correct, and clever. But the criticism of CS as a discipline is uninformed and self-serving ("look at me; I'm smarter than Knuth"). In the CS education that I was privileged to receive, first at the undergrad and then again in graduate courses spread across 15 years or so (not contiguous), I was exposed to the importance of memory hierarchies in algorithm design several times in different courses. As has been mentioned, the CS literature is full of both theoretical and practical approaches to hierarchy-aware algorithm design. One quickly Googled collection is http://books.google.com/books?id=8B1Mx6dJOWsC&lpg=PA194&ots=2oGWbMphtl&dq=memory%20hierarchy%20algorithm%20design&pg=PA75#v=onepage&q=memory%20hierarchy%20algorithm%20design&f=false If no one happened to focus specifically on priority queues, okay, it's terrific that someone got this out of the way. The approach and order-of-maginutude speedup are commonplace. The last I recall reading about was in garbage collection. There are probably many others. (The cite above talks about graph search, for example.) If there is a valid criticism here, it's that CS reference works and textbooks don't keep up very well with details of the art such as this. Rank-and-file teachers who know only the texts tend to deliver less than optimal instruction for this reason. No surprise there. Writing references is an incredible amount of work, not much valued as scholarly activity for academic types because it doesn't generate new knowledge or pull in research dollars, and it pays very poor royalties. I've known people who accuse Knuth of "wasting his talent" by producing his volumes, not to mention a typesetting system to publish them! This author could begin to right the wrong he perceives by devoting himself to writing a reference rather than stopping with a snarky one-off blog post.Tom | Wed, 05 Jan 2011 19:56:36 UTC
Sepp | Fri, 25 Mar 2011 13:34:37 UTC
Dutch Uncle | Thu, 30 Jun 2011 16:02:25 UTC
iitg | Mon, 12 Dec 2011 09:44:22 UTC