The Kollected Kode Vicious

Kode Vicious - @kode_vicious

When Should a Black Box Be Transparent?

When is a replacement not a replacement?

Dear KV,

We've been working with a third-party vendor that supplies a critical component of one of our systems. Because of supply chain issues, they are trying to "upgrade" us to a newer version of this component, and they say it's a drop-in replacement for the old one. They keep saying that this component should be seen as a black box, but in our testing, we found many differences between the original and the updated part. These aren't just simple bugs but significant technology changes that underlie the system. It would be nice to treat this component as a drop-in replacement and not worry my pretty little head about this, but what I've seen thus far doesn't inspire confidence. I do see their point that the API is the same, but I somehow don't think this is sufficient. When is a component truly drop-in and when should I be more paranoid?

Dropped In and Out

Dear Dropped,

Your letter brings up two thoughts: one about current events and one about the eternal question of, "When should a black box be transparent?"

While we all know that the pandemic has caused incredible amounts of death and destruction to the planet, and the past two years have brought unprecedented attention on the formerly very boring area of supply chains, the sun comes up and the world still spins—which is to say that the world has not ended, yet. Honestly, if it did, it would be a nice break for me. Supply chain issues are both real and the world's latest excuse for everything. If I had kids (and let's all be thankful that I do not) I would expect them to be telling their teachers, "The supply chain ate my homework."

At this point, KV is quite skeptical when a vendor's first excuse is supply chain issues. Of course, that skepticism won't help unless you have a second supplier for whatever you're buying, which you can use to bludgeon your errant vendor.

The eternal question of, "When is a replacement not a replacement?" is one that will plague us in technology forever. The number of people who believe they can treat whatever they're providing as an opaque box with a fixed API is, unfortunately, legion. This belief comes from the physical world, in which a box is a box, and a brick is a brick, and why would you care if your brick is made from a different material anyway?

Here you see the problem: The metaphor breaks down in the physical world as quickly as it would in the realm of software and hardware. Two bricks may both be red, and therefore present an identical look and feel to the external user, but if they're made of different materials, then they have different qualities—for example, in strength, but let's also consider something less obvious, like their weight. The number of bricks that can be stacked on top of each other to build a wall depends on their weight, as well as their strength. If you use heavy but weak bricks, well, you can imagine how this goes, and if you can't, try it—just don't tell your health insurance plan that KV suggested this. And let's say you don't build the wall out of weak and heavy bricks, but years later you replace some damaged bricks with newer, heavier, and weaker bricks. The key here is you wouldn't want to stand near that wall.

A topic KV keeps coming back to, one that may be driving him to drink, is the malleability of software. I keep coming back to this because it is this malleability that often results in the catastrophic failures of software and systems engineering. You mentioned that you saw timing problems with the new component. I can imagine few situations more treacherous than a change in the timing of a critical component. Timing bugs are already some of the hardest to track down and fix, and if the timing is off in a critical component, that's likely to affect the system, so good luck debugging that. May I recommend three measures of gin, one of vodka, a splash of Kina Lillet, shaken over ice, with a slice of lemon? You'll thank me, as you'll be saying evening prayers from now until your ship date slips into infinity. Those who wish to stand on the "API as a contract" quicksand are welcome to do so, but I'm not about to throw them a rope.
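A timing change is easier to argue about with the vendor once it has numbers attached. The following is a minimal Python sketch, not a definitive benchmark: `latency_profile` and `timing_shifts` are hypothetical helper names, the 25 percent tolerance is an arbitrary assumption, and the component is assumed to be callable as a plain function on a sample input. It reports the median, the 99th percentile, and the worst case, because a replacement part can keep the same average while its tail moves.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def latency_profile(call: Callable[[int], object],
                    inputs: Sequence[int],
                    repeats: int = 200) -> Dict[str, float]:
    """Time every call and summarize the distribution, not only the mean."""
    samples: List[float] = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            call(x)
            samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "median": statistics.median(samples),
        "p99": samples[int(0.99 * (len(samples) - 1))],
        "max": samples[-1],
    }


def timing_shifts(old: Dict[str, float],
                  new: Dict[str, float],
                  tolerance: float = 1.25) -> List[str]:
    """List the metrics that moved by more than `tolerance` in either direction.

    A part that got faster is flagged too: code that quietly depended on the
    old delay can break just as badly as code that now has to wait longer.
    """
    return [
        name for name in old
        if new[name] > old[name] * tolerance or new[name] * tolerance < old[name]
    ]
```

In use, you would profile the old and new parts with the same inputs on the same hardware, for example `timing_shifts(latency_profile(old_part, inputs), latency_profile(new_part, inputs))`, where `old_part` and `new_part` stand in for whatever wrapper drives each component. Any metric that comes back is a question to put to the vendor.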

The right answer in these cases is to ask the vendor for as much information as possible to reduce the risk in accepting this so-called replacement. First, ask for the test plans and test output so you can understand whether they tested the component in a way that relates to your use case. Just because they tested the thing doesn't mean they tested all the parts your product cares about. In fact, it's unlikely they did. They may have tested just the parts that connect back to the API, rather than the edge cases that would come up when a component is changed in your system.
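Whatever the vendor sends back, you can also run your own edge cases through both parts and compare the results. Here is a minimal sketch of such a differential test in Python, under loud assumptions: `old_part` and `new_part` are invented stand-ins, not anyone's real component, and `behavioral_diffs` is a hypothetical helper. The two stand-ins share a signature, which is all "the API is the same" promises, yet they disagree once the input leaves the comfortable middle of the range.

```python
from typing import Callable, Iterable, List, Tuple

Outcome = Tuple[str, object]


def outcome(component: Callable[[int], int], x: int) -> Outcome:
    """Run one case and capture either the result or the kind of failure."""
    try:
        return ("ok", component(x))
    except Exception as exc:
        return ("error", type(exc).__name__)


def behavioral_diffs(old: Callable[[int], int],
                     new: Callable[[int], int],
                     cases: Iterable[int]) -> List[Tuple[int, Outcome, Outcome]]:
    """Return every case where the two parts disagree, with both outcomes."""
    diffs = []
    for x in cases:
        before, after = outcome(old, x), outcome(new, x)
        if before != after:
            diffs.append((x, before, after))
    return diffs


# Hypothetical stand-ins: same signature, different behavior at the edges.
def old_part(n: int) -> int:
    """Old part: doubles the input and saturates at 255."""
    return min(n * 2, 255)


def new_part(n: int) -> int:
    """New part: doubles the input but wraps around at 256."""
    return (n * 2) % 256


# Inputs chosen to include the values near the limit that your product hits.
cases = [0, 1, 100, 127, 128, 200]
report = behavioral_diffs(old_part, new_part, cases)
```

With these stand-ins the two parts agree on every input up to 127 and diverge at 128 and 200, which is the kind of difference a test plan built around the API alone would be unlikely to surface. The case list should come from how your product actually drives the component, not from the vendor's examples.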

Second, ask for a complete readout of the differences between the old and new parts. For hardware, this means the underlying technology (e.g., the old part was 90nm and the new one is 45nm), and any voltage changes, as well as the internals. I've seen replacement parts that put whole CPU cores into what were once fixed-function pieces of digital electronics, which is utterly insane, but someone, somewhere, is getting praised for adding "flexibility" to the product rather than being beaten with a rubber truncheon for increasing risk.

Lastly, make sure you have a second supplier for any component you deem critical. This ought to go without saying, but the fact that I'm saying it tells you it has been an issue for a lot of people, many of whom I've seen looking like the walking wounded after an upgrade completely destroyed their product.

Oh, and you did ask when to be paranoid. I mean, clearly the answer is, always.

KV

Kode Vicious, known to mere mortals as George V. Neville-Neil, works on networking and operating system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are computer security, operating systems, networking, time protocols, and the care and feeding of large code bases. He is the author of The Kollected Kode Vicious and co-author with Marshall Kirk McKusick and Robert N. M. Watson of The Design and Implementation of the FreeBSD Operating System. Since 2014 he has been an Industrial Visitor at the University of Cambridge, where he is involved in several projects relating to computer security. He earned his bachelor's degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. His software not only runs on Earth but has been deployed, as part of VxWorks, in NASA's missions to Mars. He is an avid bicyclist and traveler who currently lives in New York City.

Copyright © 2022 held by owner/author. Publication rights licensed to ACM.

acmqueue

Originally published in Queue vol. 20, no. 2

© ACM, Inc. All Rights Reserved.