The Kollected Kode Vicious

Kode Vicious - @kode_vicious


Sizing Your System

A koder with attitude, KV answers your questions. Miss Manners he ain’t.

Dear KV,

I’m working on a network server that gets into the livelock situation you described in a previous response (Queue, May/June 2008). The problem is that the system has only a fixed amount of memory in which to receive network data, and it is frequently overwhelmed and unable to make progress. When I ask our application engineers how much data they expect, the only answer I get is “a lot,” which isn’t much help. How can I figure out how to size our systems appropriately?

Memory Not Unlimited

Dear Memory,

Wait, doesn’t your company roll out all of its servers with a minimum of four or eight gig of RAM? Doesn’t everyone do that now? How can you be out of memory? I just do not understand; it is all too much for my little brain to comprehend.

Actually, it’s not too hard for me to comprehend, though there are days when I consider going into my favorite bar until the only thing I can comprehend is that the big bright ball in the sky means it’s time not to go home but to work. Avoid KV on days when he goes from the bar to work, trust me.

There are ways to handle people who don’t want to size their applications properly, but most of them are not allowed under the Geneva Conventions, even if you have a hall pass from an administration official. You can, of course, trick the application engineers. It turns out that these tricks are still allowed under international law.

My favorite legally usable trick is basic recording. Your systems surely have a way to record the overall resource usage of the operating system and the applications running on it. Run the application in a lab with a small amount of memory—perhaps 512 megabytes—and see when it crashes. Double the memory and try again. Each time, record the usage pattern and see if something jumps out at you. Does the application use up memory slowly or quickly? Are there spikes under particular conditions or inputs?
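Something along the following lines is all the machinery you need. This is a minimal sketch, not a finished tool: it assumes the third-party psutil Python package, a process named “appserver” (a hypothetical stand-in for your application), and a five-second sampling interval you should tune to taste.

    import csv
    import time

    import psutil  # third-party; pip install psutil

    def find_process(name):
        """Return the first running process with the given name."""
        for proc in psutil.process_iter(["name"]):
            if proc.info["name"] == name:
                return proc
        raise RuntimeError("no process named %r found" % name)

    def record_memory(name, interval, outfile):
        """Sample the process's resident set size until it exits."""
        proc = find_process(name)
        with open(outfile, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp", "rss_bytes"])
            while True:
                try:
                    rss = proc.memory_info().rss
                except psutil.NoSuchProcess:
                    break  # the application exited (or crashed)
                writer.writerow([time.time(), rss])
                f.flush()
                time.sleep(interval)

    if __name__ == "__main__":
        # "appserver" is hypothetical; substitute your application.
        record_memory("appserver", interval=5.0, outfile="memlog.csv")

Run it alongside each lab trial, then compare the memlog.csv files between runs.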

Those spikes might actually be bugs. Check whether they are, and if so, report them and get the engineers to fix them. Maybe the application simply has a memory leak, but with the usual four or more gigabytes of RAM it runs for so long that the leak takes forever to show up. Memory leaks are still bugs in KV’s book, so report, fix, etc. If you don’t have a lab, then do the recording on the live system. Recording on a live system has its own problems, however; in particular, it can affect performance, so sample at infrequent intervals to keep the sampling from stealing too much time from the application.
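If you suspect a slow leak, you don’t need anything fancy to spot it: fit a line to the samples you just recorded and look at the slope. Here’s a sketch that reads the memlog.csv produced above; the one-kilobyte-per-second threshold is an arbitrary assumption you’ll want to tune for your application.

    import csv

    def rss_slope(logfile):
        """Least-squares slope of RSS over time, in bytes per second.
        Assumes the log holds at least two distinct samples."""
        ts, rss = [], []
        with open(logfile) as f:
            for row in csv.DictReader(f):
                ts.append(float(row["timestamp"]))
                rss.append(float(row["rss_bytes"]))
        mean_t = sum(ts) / len(ts)
        mean_r = sum(rss) / len(rss)
        num = sum((t - mean_t) * (r - mean_r) for t, r in zip(ts, rss))
        den = sum((t - mean_t) ** 2 for t in ts)
        return num / den

    if __name__ == "__main__":
        slope = rss_slope("memlog.csv")
        if slope > 1024:  # steady growth over 1 KB/sec: suspicious
            print("possible leak: RSS growing at %.0f bytes/sec" % slope)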

The same advice holds for any other resource an application uses: CPU, interrupt load, input/output, disk space, and all the rest are amenable to measurement. The only way to correct a problem is to understand it, and the best way to understand it is to measure it.
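The same few lines of psutil, pointed at the whole system rather than one process, will record most of those. A sketch, with a deliberately long default interval so the recorder stays out of the application’s way:

    import time

    import psutil  # third-party; pip install psutil

    def sample_system(interval=60.0):
        """Print one line of system-wide counters per interval."""
        while True:
            cpu = psutil.cpu_percent(interval=1)   # percent, over 1 sec
            mem = psutil.virtual_memory().percent  # percent of RAM used
            io = psutil.disk_io_counters()         # cumulative since boot
            print("%.0f cpu=%s%% mem=%s%% read=%d written=%d"
                  % (time.time(), cpu, mem, io.read_bytes, io.write_bytes))
            time.sleep(interval)

    if __name__ == "__main__":
        sample_system()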

Yes, it would help if people actually planned and knew how much they were using in the way of system resources, but the fact is that sometimes that’s not the case, and you can’t just throw up your hands—or just throw up—and walk away, much as you might want to.

KV

Dear KV,

I’m working for a banking firm on some of its larger trading applications. By law we’re required to record a whole slew of data, much of which we never use or see again. On occasion I have actually had to go and find this data, and each and every time it requires a bit of programming to do so. Others on my team and I seem to write these throw-away programs on a quarterly basis. Before you ask: yes, we store these programs in a source-code control system. It’s not that we lose the code; it’s that the data we need changes, and the original system design did not account for getting directly at all the data being recorded. What’s the right way to make sure we can always get to the data we need?

Query on Queries

Dear QQ,

Let’s get one thing straight: there are no “throw-away” programs. I don’t mean that in the “there are no stupid questions” kind of way (in fact, there are many stupid questions). What I mean is that if you’re writing code you intend to throw away, then you are knowingly wasting your time, and the time of your team.

To your original point about a system that records data it doesn’t expose: well, that I just don’t get. How do the engineers at your company even know they are recording the data correctly? If this is for regulatory purposes, what happens when some auditor comes around and asks, “Have you been recording all records of type X?” and then follows up with, “Well, then please show them to me”? It would seem that your company is following the letter but not the spirit of the law, and that can lead only to trouble.

KV’s simple rules of data collection: if you don’t need it, don’t keep it; if you do need it, keep it safe, and keep it accessible. Keeping data around because you were told to, but for no other purpose, is like collecting figurines: they give pleasure to only one or two slightly obsessive individuals, and they are the first things to go in the trash after those individuals die.
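If you want a concrete starting point: land the records in something queryable as they’re written, and the next auditor’s question becomes a query instead of a program. A minimal sketch using SQLite, which ships with Python; the “records” table and its columns are hypothetical, so shape them to your actual data:

    import sqlite3

    conn = sqlite3.connect("records.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS records (
            ts   REAL NOT NULL,  -- when the event happened
            kind TEXT NOT NULL,  -- the record type the regulators name
            body TEXT NOT NULL   -- the raw record, kept verbatim
        )
    """)

    # Written once, at recording time...
    conn.execute("INSERT INTO records VALUES (?, ?, ?)",
                 (1214870400.0, "X", "the raw record goes here"))
    conn.commit()

    # ...and the auditor's question is now one line, not one program.
    for row in conn.execute(
            "SELECT ts, body FROM records WHERE kind = 'X'"):
        print(row)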

KV

KODE VICIOUS, known to mere mortals as George V. Neville-Neil, works on networking and operating system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are code spelunking, operating systems, and rewriting your bad code (OK, maybe not that last one). He earned his bachelor’s degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. He is an avid bicyclist and traveler who currently resides in New York City.


Originally published in Queue vol. 6, no. 4









