
You Don't Know Jack About AI

And ChatGPT probably doesn't either

Sonja Johnson-Yu and Sanket Shah

For a long time, it was hard to pin down what exactly AI was. A few years back, such discussions would devolve into hours-long sessions of sketching out Venn diagrams and trying to map out the different subfields of AI. Fast-forward to 2024, and we all now know exactly what AI is. AI = ChatGPT.

Or not.

For most people, AI doesn't really have a definition. Like the U.S. Supreme Court's definition of pornography, it's more, "You know it when you see it." And this "eyeballing" intuition remarkably recapitulates the logic put forth by Alan Turing in his 1950 article, "Computing Machinery and Intelligence," where he proposes "the imitation game" in which a human judge attempts to discriminate between a human and a computer on the basis of their responses to the judge's questions. This is a behavioral test for intelligence commonly known as the Turing test ("if it looks like human intelligence, then it is intelligence").

There are several ways to break the Turing test—for example, by creating a huge lookup table with all possible questions and all possible answers, and then fishing out one of those answers when the question is asked. This algorithm (if you can call it that) is a simple GET query, yet it exhibits the intelligence of the humans involved in the creation of the dataset. ChatGPT isn't exactly creating a huge lookup table, but it is similarly parroting back the work of human intelligence.
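To make the absurdity concrete, here's a minimal sketch of that lookup-table approach (the questions and answers are invented for illustration):

```python
# A minimal sketch of the lookup-table approach: "intelligence" by retrieval.
# The questions and answers below are invented for illustration.
ANSWERS = {
    "what is the capital of france?": "Paris.",
    "how do you bake a cake?": "Mix flour, sugar, eggs, and butter; bake at 350 degrees for 30 minutes.",
}

def answer(question: str) -> str:
    # The whole "algorithm" is a single GET-style lookup on the verbatim question.
    return ANSWERS.get(question.strip().lower(), "I have no idea.")

print(answer("How do you bake a cake?"))           # hits the table
print(answer("How do you bake a cake outdoors?"))  # any rephrasing misses entirely
```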

 

Remind Me, How Does ChatGPT Work?

Imagine a seasoned chef. During her training, she encounters thousands of recipes and cooks countless dishes, learning what exactly ratatouille is and developing an intuition about what ingredients go well together. Now, when you ask the chef to prepare ratatouille with a Sichuan flavor profile, she can draw on her experience to create a dish that suits your taste. The chef can also improve by preparing variants of the same dish and then asking for feedback about which one is best.

Similarly, ChatGPT is a neural network trained by looking at trillions of articles. While going through this data, the language model learns which words tend to follow others ("Once upon a..." is usually followed by "time"), much like the chef learning that a combination of sesame oil and white pepper is likely to be complemented by soy sauce. The model learns to prepare a customized answer by producing several responses to a given prompt and receiving feedback from a human, who tells the model which responses are better than the others. This process is repeated over and over again until the model can reliably produce good customized responses to most prompts.
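To make the "which words tend to follow others" idea concrete, here's a toy sketch that learns next-word statistics simply by counting word pairs in a tiny, made-up corpus. Real language models learn vastly richer representations than this, but the spirit of "predict the next word from what came before" is the same:

```python
from collections import Counter, defaultdict

# A toy corpus, invented for illustration; a real model trains on trillions of words.
corpus = "once upon a time there was a chef . once upon a time there was a model ."

# Count, for every word, which words tend to follow it (bigram statistics).
follows = defaultdict(Counter)
words = corpus.split()
for prev, nxt in zip(words, words[1:]):
    follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    # Return the continuation seen most often during "training."
    return follows[word].most_common(1)[0][0]

print(predict_next("a"))      # 'time' -- "once upon a..." is usually followed by "time"
print(predict_next("there"))  # 'was'
```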

This all sounds fine, but if we're going to be dealing with such big datasets anyway, why spend the GDP of a small island nation to train a model instead of simply using the GET query to look up the answer? The issue with the GET-query approach has to do with creating a database that contains the answers to all the possible questions a user might ask. This is especially hard given that even small changes to the query might necessitate a whole new set of answers (e.g., "How do you bake a cake?" vs. "How do you bake a cake outdoors?"). As a result, this approach requires you either to restrict the set of allowable questions or to create a database with all the answers to Life, the Universe, and Everything.

The key challenge, then, is to find a way to compress the mind-bogglingly large space of questions and answers. One possibility is to develop a system that uses a set of rules to parse and interpret user questions, such as the expert systems developed in the 1970s. For example, "How do you bake a cake outdoors?" would be broken down into the action "bake," the object "cake," and "outdoors" (which might trigger a rule related to outdoor equipment). This type of system could provide specific and accurate responses, but it would need constant updating to deal with new queries. Clearly, this isn't a scalable solution. We need to find another way to extract knowledge that doesn't require writing down every rule by hand. What if there was a way of automatically learning rules from the data already at our disposal?
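For a taste of what the rule-based flavor looks like, here's a rough sketch. The rules and keywords are invented, and a real 1970s expert system was far more elaborate, but the failure mode is the same: every new kind of question means writing another rule by hand.

```python
# A hand-written rule base in the spirit of a 1970s expert system.
# The rules and keywords are invented for illustration.
RULES = [
    (lambda q: "bake" in q and "cake" in q and "outdoors" in q,
     "Use a Dutch oven over hot coals and rotate it every 10 minutes."),
    (lambda q: "bake" in q and "cake" in q,
     "Preheat the oven to 350 degrees and bake for about 30 minutes."),
]

def expert_answer(question: str) -> str:
    q = question.lower()
    for condition, response in RULES:
        if condition(q):
            return response
    return "No rule matches -- time to write another one by hand."

print(expert_answer("How do you bake a cake outdoors?"))
print(expert_answer("How do you grill a pizza?"))  # falls through: nobody wrote that rule
```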

 

Finding Patterns

The real power of AI lies in pattern identification—that is, finding patterns (often unseen) in large volumes of data. This is a broad approach since patterns can relate to the "likelihood of the next word" learned by language models, or they can be filters capable of identifying ears and snouts in image data. To answer questions such as "How many cans of Coca-Cola is it OK to drink in a day?", "Is this car insurance claim for $1 billion an anomaly?", or "Does this image need a fig leaf to keep things PG?", the AI system must have a systematic way to extract meaningful insights from the data available.

For this reason, modern AI is best characterized as representation learning, where the goal is to identify a set of "features" that captures the most important patterns and variations in the data. This is like finding a way to convert an hour-long cooking video into the 10-line recipe you actually need. Instead of a summary written in words, however, the AI's summary is packed into an array of numbers that captures some semantic information about the data, which we call a feature representation.

One way to summarize the data is by finding "building blocks" that can be combined to describe the dataset. This frequently is done in data science by way of PCA (principal component analysis). PCA finds the most important "directions" in which a dataset varies and then uses these directions to reframe the data. It's like tasting a variety of foods to identify the key characteristics by which the foods vary, such as sweetness, spiciness, and texture. By calling out these elements, PCA provides us with a vocabulary for describing how certain foods are similar or different.
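As a small, hypothetical illustration (the "tasting notes" below are made up), a few lines of scikit-learn are enough to find these directions:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical tasting notes: each row is a dish, each column a raw measurement
# [sweetness, spiciness, crunchiness] on a 0-10 scale. The numbers are made up.
dishes = np.array([
    [8.0, 1.0, 2.0],   # dessert-like
    [7.5, 0.5, 3.0],
    [1.0, 8.0, 6.0],   # spicy and crunchy
    [2.0, 9.0, 7.0],
    [5.0, 5.0, 5.0],
])

pca = PCA(n_components=2)
coords = pca.fit_transform(dishes)

print(pca.components_)                # the "directions" of variation (e.g., sweet vs. spicy)
print(pca.explained_variance_ratio_)  # how much of the variation each direction captures
print(coords[0])                      # the first dish, re-described in the new vocabulary
```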

This approach works deliciously when the patterns in the data are simple enough, such as when there's a linear relationship between the inputs and the components. Sometimes, however, the patterns are not easily extractable from the data. For example, in image recognition, there are complex, nonlinear relationships between pixels and objects that are difficult to distill.

This is where more advanced representation-learning techniques come into play. Methods such as deep learning with neural networks aim to capture the complex patterns in these datasets. Often these methods learn to capture patterns by learning a representation (i.e., a "transformation" of the data) that is useful for a particular task.

The way methods learn this representation is typically through trial and error. Imagine you want to be able to taste the difference between 1 percent milk and 2 percent milk. You might blindly pour a glass of milk, make a guess—based on richness and sweetness, say—as to whether it's 1 percent or 2 percent, and then look at the milk carton to see if you are correct. Then you could repeat this many times over and refine your representation. With enough taste testing and a strong lactose tolerance, you should gain the ability to distinguish between 1 percent and 2 percent milk, likely by way of a heightened awareness of milk mouthfeel.

This is essentially how neural networks are trained. We might want a neural network to be able to perform a certain task, such as distinguishing between pictures of dogs and cats. Through trial and error, the neural network learns to find patterns that set the two animals apart. For example, the network might learn to identify the shape of the ears, with cat ears often being more pointy and dog ears being more floppy. No one tells the model ahead of time to look at the ears. Yet, through trial and error, the model will start to identify these as distinguishing features. Then, by adjusting its internal representation over time to minimize prediction errors, the neural network will refine its ability to differentiate between the two animals, much as you did with all those milk taste tests.
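Here's a minimal sketch of that trial-and-error loop in PyTorch. The data is made up and uses precomputed "features" standing in for raw pixels, so treat it as a cartoon of training rather than a real cat/dog classifier:

```python
import torch
import torch.nn as nn

# Stand-in data, invented for illustration: two features per animal (say, ear
# pointiness and snout length), with label 0 = cat and 1 = dog. A real classifier
# would learn such features on its own from raw pixels.
X = torch.tensor([[0.9, 0.2], [0.8, 0.3], [0.2, 0.9], [0.1, 0.8]])
y = torch.tensor([[0.0], [0.0], [1.0], [1.0]])

model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(500):
    optimizer.zero_grad()
    prediction = model(X)          # make a guess
    loss = loss_fn(prediction, y)  # compare the guess with the label (peek at the carton)
    loss.backward()                # work out how the internal representation should change
    optimizer.step()               # adjust it, then repeat

print(torch.sigmoid(model(X)).detach().round())  # should recover the cat/dog labels
```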

The type of representation learned will be specific to the task that the model has been trained to perform. If the task is super specific (kind of like tasting the difference between 1 percent and 2 percent milk), the learned representation will probably be good for accomplishing that task—but not necessarily for doing other things (e.g., models that are trained to distinguish between cats and dogs may not be able to tell you whether a given cat is happy or sad). Still, while models sometimes don't transfer well between specific tasks, there are certain tasks that are general enough that the representation learned for them proves useful for many different things. For example, next-word prediction, which is at the heart of LLMs (large language models), is an incredibly broad task.

 

 

Training large models on the task of next-word prediction using all the text on the Internet allows LLMs to learn incredibly general representations that can be flexibly used for a variety of tasks. In lieu of a more concrete definition of artificial intelligence, we believe that representation learning might serve as a more specific concept that's central to the current AI hype cycle.

 

Main Takeaway #1: Right now, it seems like just about everything is AI, but no one has a satisfying answer to what AI actually is. We hope the next time you think about AI, the first thing that comes to mind is representations, and not ChatGPT.

So... representations are all you need, right? Unfortunately, not necessarily.

 

Alignment

Things are fine and dandy when all you want to do is to sort cat images into "cute," "grumpy," or "sleepy." However, these patterns are also being used to accomplish all sorts of other important tasks such as determining bail or loan worthiness. In these instances, the consequences of being wrong are far more dire than what comes from posting the wrong cat meme. This brings us to a crucial question in these high-stakes scenarios: What makes a model "good"?

From a computer's perspective, a model is good if its behavior comes close to the ideal behavior for a given task. Specifically, the model is trained to minimize a loss function, also known as an objective function. This function guides the model's learning, directing it to make adjustments that reduce errors and improve performance. And it's up to humans to design the right objective.

Defining this objective can be a nuanced matter. When evaluating images of possible skin cancer, is it better to err on the side of caution, flagging borderline images as "potentially cancerous," or to err on the lax side? False positives and false negatives have different implications in different scenarios, so the objective function needs to reflect these tradeoffs appropriately.
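One common way to encode such a tradeoff is to weight the two kinds of errors differently in the objective. A minimal sketch, assuming a binary "cancerous or not" classifier; the tenfold weighting is an arbitrary number chosen for illustration, not a clinical recommendation:

```python
import torch
import torch.nn as nn

# Penalize missing a cancer (false negative) ten times more heavily than raising
# a false alarm (false positive). The 10x weight is illustrative, not clinical advice.
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([10.0]))

logits = torch.tensor([[-2.0], [1.5]])  # raw model outputs for two skin images
labels = torch.tensor([[1.0], [0.0]])   # the first image really is cancerous
print(loss_fn(logits, labels))          # the missed positive dominates the loss
```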

Defining good performance becomes even harder when the task is more complex. Say a social media giant is trying to increase engagement on its platform, so it trains an algorithm to maximize clicks on articles. Fast-forward, and you might see high traffic on sensationalist links ("Unseen Footage: Alien Life Confirmed by NASA?"), followed by a general erosion of engagement caused by the propagation of crappy advice. This suggests that pushing for user engagement at all costs might not be the best objective function (cough, cough... targeted advertising).

So, how do you make sure your model is performing the way you want it to? This is what people mean by alignment: how well your goals and your model's outcomes line up. Right now, whack-a-mole is the most commonly employed strategy for aligning our algorithms: figure out where errors typically occur, then try to patch that problem. In the social media case, keeping track of metrics on long-term user satisfaction and content quality may have raised flags that the "maximize clicks" algorithm had some issues. In finance applications, this might involve audits into whether the learned patterns have similar error rates for different demographic groups, followed by an update to the model with a new objective. In ChatGPT, this happens by way of RLHF (reinforcement learning from human feedback), which is to say the new sweatshop labor in which people rank different ChatGPT outputs from best to worst.

However, these all amount to Band-Aids rather than systematic solutions. One potential direction toward aligning our models lies in understanding the patterns these models have gleaned. If we are able to interpret the patterns learned by AI and describe them clearly, we can then engage in discourse to decide whether these patterns are indeed aligned with the desired behavior.

 

Understanding Patterns

Imagine creating a dataset by launching differently weighted rubber ducks with a slingshot and then measuring how quickly they accelerate through the air. From that, an AI model can be created that will accurately predict the acceleration of a 100-kg rubber duck (though we don't want to be the ones who conduct that experiment!). The fact that we have an accurate prediction model, however, doesn't mean we have a true understanding of physics. What we want to be able to learn from such a model is Newton's second law of motion.
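Here's a sketch of that gap with simulated slingshot data (the numbers are made up): a simple regression predicts acceleration just fine, but reading its coefficient as a force, and the relationship as a = F/m, is an interpretation we have to supply ourselves.

```python
import numpy as np

# Simulated slingshot data; the numbers are made up. Suppose the slingshot always
# applies roughly 50 N of force, and our measurements are a little noisy.
rng = np.random.default_rng(0)
masses = np.array([0.1, 0.2, 0.5, 1.0, 2.0])                    # kg
accelerations = 50.0 / masses + rng.normal(0, 1, masses.shape)  # m/s^2 (a = F/m)

# Fit acceleration as a linear function of 1/mass.
slope, intercept = np.polyfit(1.0 / masses, accelerations, deg=1)

print(slope)                      # ~50: the force -- Newton's second law wearing a disguise
print(slope / 100.0 + intercept)  # predicted acceleration of the 100-kg rubber duck
```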

Central to this is understanding what "representations" the model has learned. While it's most intuitive to think about representations in terms of concepts such as "the mouthfeel of 1 percent milk," these representations are actually encoded as extremely large matrices of numbers. The matrices transform the input into features (i.e., other numbers) that assist in making predictions. Usually these are not easily understandable to humans, but they do somehow help the model accomplish the task. As a result, while we have a high-level behavioral understanding of the model's behavior ("it predicts the correct thing 92 percent of the time") and a low-level mechanistic understanding of what matrices a model is learning, we lack an explanation of the model's behavior at the appropriate level of abstraction, which lies somewhere in between.

Nailing this is a tremendously difficult task, and one that remains largely unsolved. As of today, there are a variety of techniques aimed at interpreting models, but none that can recover something as clean as the second law of motion from a trained model, let alone explain more complex systems.

So far, we've assumed that training large models on the Internet will allow us to predict everything we want. But what happens when we don't have enough data to find the patterns we're looking for?

 

The Importance of Data

The dominant AI paradigm is to "find patterns in a lot of data," but in areas where data is scarce we can't expect a silver bullet. The Internet can't eradicate malaria, for example. This mammoth task would require meticulous experimentation and data collection by natural and social scientists. Ironically, the painstaking procurement of data has long been undervalued by AI, a field that eats data for dinner. AI can aid in these scientific efforts, but only so much.

Moreover, intentionally gathering data is hard, and you get what you pay for. Carefully collected data is higher quality than incidentally collected data, though the latter is the workhorse of most large AI systems. This incidental data (sometimes referred to as "data exhaust"), which includes location data, search queries, and social media interactions, comes as a byproduct of general online activity. AI actually can use this unintentionally created data quite effectively, but there can be consequences, such as copyright infringements or privacy violations. Conversely, sometimes a lack of data exhaust—as in "low-resource" languages such as Twi or Khmer, for example—can lead to less-capable models in those languages.

Learning patterns from data is important if AI is to reach human-level performance. But how do we move beyond simply copying existing behaviors?

 

Beyond Patterns

At the moment, people are working hard to create AI systems that take a "requirements spec" as input and then spit out the appropriate code. What such systems will not be able to do, however, is to create that requirements spec in the first place. If you ask current AI to do that, you'll likely end up with a generic cookie-cutter spec that doesn't actually capture your use case or reason about your design tradeoffs. The ultimate goal is to be able to use AI to make smarter decisions. To do this, however, we'll first need to figure out what needs to be done.

Making smart decisions often requires context that is currently unavailable to AI systems. How to break a big problem into smaller chunks, how to prioritize tasks effectively, how to manage delays in a project... answering questions such as these requires context on personnel, resources, and organizational goals. Often, the nuances of this context are not captured in available data. We want to build systems that embrace the principle of "know what you don't know" and explicitly encode uncertainty and limitations—something that AI systems are currently unable to do.
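A minimal sketch of the "know what you don't know" principle: instead of always answering, a system can abstain when its confidence is low. The threshold and probabilities here are illustrative, and calibrating real models is much harder than this, but it conveys the flavor:

```python
import numpy as np

def decide(probabilities: np.ndarray, labels: list, threshold: float = 0.8):
    """Act only when the model is confident; otherwise defer to a human."""
    best = int(np.argmax(probabilities))
    if probabilities[best] < threshold:
        return "not confident enough -- defer to a human"
    return labels[best]

options = ["approve loan", "deny loan"]
print(decide(np.array([0.95, 0.05]), options))  # confident: act
print(decide(np.array([0.55, 0.45]), options))  # uncertain: abstain
```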

Moreover, as people begin to rely more heavily on AI systems for decision-making, weird things can happen. For example, if everyone takes the "fastest route" on Google Maps, then the "fastest route" is no longer the fastest. The future is a function of both your actions and others' actions, and if everyone relies more heavily on the same AI decision-making tools, then the optimal strategy will shift. Current AI systems do not account for these sorts of downstream effects.
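A toy simulation of that feedback loop (the travel-time model is invented for illustration): the more drivers the tool sends down the nominally fastest road, the slower that road gets.

```python
# A toy congestion model, invented for illustration. Two routes: the highway is
# faster when empty but slows down with every extra car; the back road barely does.
def highway_minutes(cars: int) -> float:
    return 10 + 0.5 * cars

def back_road_minutes(cars: int) -> float:
    return 25 + 0.1 * cars

drivers = 100

# If every driver's app recommends the route that is fastest on an empty road...
print(highway_minutes(drivers))  # 60 minutes: the "fastest route" no longer is
print(back_road_minutes(0))      # 25 minutes: the ignored route would have won

# Splitting the drivers leaves everyone better off than all piling onto the highway.
print(highway_minutes(40), back_road_minutes(60))  # 30 vs. 31 minutes
```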

To summarize, for all the doomsday proselytizing we hear, the sci-fi image of AI is far removed from the current reality. AI is not going to take your job anytime soon unless your role is devoid of human judgment calls. AGI (artificial general intelligence) on par with human intelligence is still a faraway, mythical milestone. There's still so much work to be done on defining our objectives, understanding just what the heck is going on in these crazy machines we've built, and getting past the habit of thinking about AI as nothing more than a prediction machine.

 

Main Takeaway #2: AI can do things but only as well as we can define the goals we want our AI to achieve. And right now, figuring out "what needs to be done" is best left to the humans.

Advisory: this article was mostly human-written™, with no more than 15 percent of the content having been produced by ChatGPT.

 


 

Translation Guide for Talking to LLM Bros

Machine learning — Field of study concerned with learning patterns from data. Really just statistics with a facelift and now rebranded as "AI."

Reinforcement learning — Focused on training "agents" to accomplish certain tasks by rewarding them, much like you would train your dog to bite the postman.

Deep learning — Machine Learning, but with bigger models and a sexy new name.

Transformers — The hot new architecture for sequential data that made language models wildly successful. Now the favorite toy of every '90s kid.

Narrow AI — AI that works for a problem of interest (e.g., image captioning). This is already beyond the current scope of what models can do.

AGI (artificial general intelligence) — AI that generalizes across tasks, on the level of human intelligence. The Silicon Valley pipe dream.

ASI (artificial super intelligence) — Think Skynet, the antagonist of the Terminator film franchise. It's apparently just around the corner from AGI. Beyond human intelligence!

Compute — Amount of time and resources used to train a model. Usually measured in the number of FLOPs (floating point operations) or the number of Nvidia GPUs you burned through.

Soylent — A "meal in a bottle" to maximize the time you spend coding and minimize the time you spend enjoying life. Also a measure of engineering effort.

AI safety — Subfield that studies how to prevent the robots from taking over.

Alignment — Ensuring that the AI system's goals are aligned with our intentions, so that it's less Skynet and more WALL-E.

Mechanistic interpretability — Trying to reverse engineer which features neural networks learn (to figure out what the heck they're thinking).

Singularity — A hypothetical future point where tech/AI growth becomes uncontrollable. This is what leads to ASI, and then, eventually, doom.

 

Sonja Johnson-Yu is a fourth-year computer science PhD student at Harvard, advised by Kanaka Rajan. She uses a cocktail of reinforcement learning and other machine learning techniques to study how animals use signals to achieve their goals and coordinate group behaviors.

Sanket Shah is a fifth-year PhD student at Harvard University. His research focuses on Decision-Focused Learning, a way to improve the performance of machine learning models in algorithmic decision-making by tailoring them to their downstream decision-making use cases.

Copyright © 2024 held by owner/author. Publication rights licensed to ACM.

acmqueue

Originally published in Queue vol. 22, no. 5




