Dear KV,
After more than a year of hearing people talk about AI and co-pilots, I finally tried one on a small project. I even paid for the privilege of doing so, figuring that the paid version would be superior to the free one. But what I have found confuses me, and I'm wondering if you too have tried any of these tools. From your previous articles, it seems you might not be focused on the latest tools in our industry. So, maybe you've just continued to use vim and Makefiles. Have you tried these things, and do you have any words of wisdom for the rest of us who are looking at them now?
Co-Piloted
Dear Co-Piloted,
It may shock KV's readers to learn I'm a bit of a tools dweeb, and in fact whenever some new tool comes out that supposedly will help me to create or understand software better, I have been willing to try it. This applies not just to tools but also techniques. I'm even a certified Scrum Master, but that's a story for another time.
I have tried many editors, several IDEs, various debuggers, and all manner of new and interesting tools in my career, and continue to do so. Like you, I had held off on trying the tools based on LLMs and even now continue to throw up in my mouth whenever I'm forced, in conversation, to refer to these as AI. Being able to spit out passable marketing doggerel has about as much in common with intelligence as does an American Presidential election. In fact, those two are clearly, deeply related.
Before trying to use these tools, you need to understand what they do, at least on the surface, since even their creators freely admit they do not understand how they work deep down in the bowels of all the statistics and text that have been scraped from the current Internet. The trick of an LLM is to use a little randomness and a lot of text to Gauss the next word in a sentence. Seems kind of trivial, really, and certainly not a measure of intelligence that anyone who understands the term might use. But it's a clever trick and does have some applications.
If you're typing a suicide note, for example (and the corpus of text contains thousands of these), it's quite possible the code will be able to guess which word you might use next since thousands of people who came (and went) before you typed it as well.
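To make the trick concrete, here is a toy sketch of next-word guessing. It is a plain bigram table, not how any real LLM is built; the names `bigrams`, `train`, and `next`, and the tiny corpus, are all made up for illustration. It only shows the shape of the idea: count what followed each word, then roll the dice among the candidates.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// bigrams maps a word to every word that followed it in the training text.
// Repeats are kept, so frequent followers are drawn more often.
type bigrams map[string][]string

// train builds the follower table from whitespace-separated text.
func train(text string) bigrams {
	words := strings.Fields(strings.ToLower(text))
	table := make(bigrams)
	for i := 0; i+1 < len(words); i++ {
		table[words[i]] = append(table[words[i]], words[i+1])
	}
	return table
}

// next picks one follower of prev. The pick function supplies the
// randomness: given n candidates it returns an index in [0, n).
// It reports false when prev was never seen with a follower.
func (b bigrams) next(prev string, pick func(n int) int) (string, bool) {
	followers := b[strings.ToLower(prev)]
	if len(followers) == 0 {
		return "", false
	}
	return followers[pick(len(followers))], true
}

func main() {
	// A made-up corpus; a real system scrapes rather more than this.
	corpus := "lock the cache then read the cache then unlock the cache"
	model := train(corpus)
	rng := rand.New(rand.NewSource(1)) // fixed seed so runs repeat

	word := "lock"
	out := []string{word}
	for i := 0; i < 6; i++ {
		w, ok := model.next(word, rng.Intn)
		if !ok {
			break
		}
		out = append(out, w)
		word = w
	}
	fmt.Println(strings.Join(out, " "))
}
```

Nothing in the table knows what a cache or a lock is. It knows only which word tended to come next.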
Code is an even more constrained environment than prose, in a way, because code must be run through a process that has a strict syntax, one that's far stricter than any human language. It's thought this narrowness facilitates the process of guiding the creation of code, with templating being cited as an early use case for these technologies. And who would not want help with such drudgery? Many pieces of code that are written, especially for the visual web, are just copy-and-pasted versions of other pages, and the same might be said for other areas of coding.
While help with proper code syntax is a boon to productivity (consider IDEs that highlight syntactical errors before you find them via a compilation), it is a far cry from SEMANTIC knowledge of a piece of code. Note that it is semantic knowledge that allows you to create correct programs, where correctness means the code actually does what the developer originally intended. KV can show many examples of programs that are syntactically, but not semantically, correct. In fact, this is the root of nearly every security problem in deployed software. Semantics remains far beyond the abilities of the current AI fad, as is evidenced by the number of developers who are now turning down these technologies for their own work.
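One small, invented illustration of the gap between syntax and semantics follows. The `entry` type and both functions are hypothetical, not taken from any real code base. The compiler accepts the two checks equally happily, but only one of them does what the developer intended.

```go
package main

import "fmt"

// entry is a hypothetical cache record; expiresAt is in Unix seconds.
type entry struct {
	value     string
	expiresAt int64
}

// isLiveWrong is meant to report whether e has NOT yet expired.
// It compiles cleanly, but the comparison is inverted, so it reports
// expired entries as live and live entries as expired.
func isLiveWrong(e entry, now int64) bool {
	return now > e.expiresAt
}

// isLive does what was intended: an entry is live strictly before
// its expiry time.
func isLive(e entry, now int64) bool {
	return now < e.expiresAt
}

func main() {
	e := entry{value: "session-token", expiresAt: 100}
	// At time 50 the entry has not expired yet.
	fmt.Println(isLiveWrong(e, 50), isLive(e, 50)) // prints "false true"
}
```

If the record guarded by that check is a session token, the inverted version rejects valid sessions and accepts stale ones. No syntax checker has an opinion on which of the two you meant.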
Guessing the next word used by a cohort of morons, which is what co-pilots actually do, leads to incredibly incisive text such as
server.mtx.Lock() // Lock the cache
Yes, thank you, that's the mutex Lock method. But WHY do we lock the cache? And what do we do about it later? This is akin to a comment such as
i++ // Increase i by 1
The only reason anyone is impressed by this is that it's written in a form that is more palatable to those who wish to anthropomorphize their machines, something Dijkstra warned about in the 1960s.
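For contrast, here is a sketch of comments that try to answer the two questions asked above: why the lock is taken, and what happens to it later. Only the `mtx` field and its `Lock` call come from the snippet; the `cacheServer` type, its methods, and the stated reason (request handlers running concurrently) are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheServer is a hypothetical server holding an in-memory cache.
type cacheServer struct {
	mtx   sync.Mutex
	cache map[string]string
}

// get returns the cached value for key, if any.
func (s *cacheServer) get(key string) (string, bool) {
	// Assumed here: handlers run on separate goroutines, and a Go map
	// must not be read while another goroutine writes to it. Every
	// access to s.cache therefore happens with mtx held.
	s.mtx.Lock()
	// Release on every return path, including a panic, so one failed
	// request cannot leave the cache locked for everyone else.
	defer s.mtx.Unlock()

	v, ok := s.cache[key]
	return v, ok
}

// set stores value under key, under the same lock that get takes.
func (s *cacheServer) set(key, value string) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	s.cache[key] = value
}

func main() {
	s := &cacheServer{cache: make(map[string]string)}
	s.set("greeting", "hello")
	v, ok := s.get("greeting")
	fmt.Println(v, ok) // prints "hello true"
}
```

The comments describe intent and consequences, which is exactly the part that cannot be read off the method name.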
Another classic from our new Robot Master:
// Get retrieves the value for a given key if it exists and is not expired.
// Parameters:
// - ctx: context for the request.
// - request: contains the key to retrieve.
Wow! Really? If a cursory glance at the code wasn't enough to tell me this, I shouldn't be here at all.
Finally, my favorite feature of co-pilot programs is the abject plagiarism. We already know that the text and code being typed out by these things comes from scanning billions of lines of text and source code available in GitHub, but they can even be helpful in unintended ways. A colleague who was taking a night class in distributed systems showed me what happened when his professors (foolishly) suggested the students "use the new tools" in order to become more modern developers. As he accepted more and more of the co-pilot's suggestions, he noticed a pattern: It was as if someone was typing in another file from somewhere else. The coding style itself was one of the clues, but eventually the co-pilot gave itself away completely by saying, "You know there is a file just like this over in this other repo?" In a way, this makes sense. But as part of a homework exercise, it's just hilarious.
The more I've used these tools in my projects, the more I've realized that co-pilots are nothing more than drunken plagiarists, sitting behind you and your code with their hot, gin-soaked breath whispering semantic nothings in your ear. They are not a boon to your work, they are a rubber crutch, one that will cruelly let you down when you need it most. Now, we all just need to get real work done while we wait for this latest hype cycle to die a justified and fiery death.
KV
George V. Neville-Neil works on networking and operating-system code for fun and profit. He also teaches courses on various subjects related to programming. His areas of interest are computer security, operating systems, networking, time protocols, and the care and feeding of large codebases. He is the author of The Kollected Kode Vicious and co-author with Marshall Kirk McKusick and Robert N. M. Watson of The Design and Implementation of the FreeBSD Operating System. For nearly 20 years, he has been the columnist better known as Kode Vicious. Since 2014, he has been an industrial visitor at the University of Cambridge, where he is involved in several projects relating to computer security. He earned his bachelor's degree in computer science at Northeastern University in Boston, Massachusetts, and is a member of ACM, the Usenix Association, and IEEE. His software not only runs on Earth, but also has been deployed as part of VxWorks in NASA's missions to Mars. He is an avid bicyclist and traveler who currently lives in New York City.
Copyright © 2024 held by owner/author. Publication rights licensed to ACM.
Originally published in Queue vol. 22, no. 6—