
Echoes of Intelligence

Textual interpretation and large language models

Alvaro Videla

What counts is not what people actually know, but what people believe that everyone knows and which is thus taken as a common background.
— Patrizia Violi

The rising popularity of AI systems over the past few months has been remarkable. Where large language models (LLMs) were once confined to being curiosities of AI labs or the talk of research papers, companies have now deployed these models in the public sphere, putting them front and center of public opinion in the form of various chat-like applications.

In many cases, users of these AI-powered applications are now greeted by an easy-to-use and amicable interface that allows a person to send prompts to the LLM and get a response back. The text produced by these recent models is of impressive quality compared with attempts of the past; in many cases, it's almost impossible to tell whether the result was written by a human or by AI. This has led the tech sector, and the public in general, to speculate about possible uses of AI, from generating poetry, fiction, and restaurant recommendations, to extracting questions and answers from a text corpus.

With all this in mind, a question arises: How does an LLM work?

An LLM functions, in a generic way, as a "plausibility machine." It considers some input sent by the user (the prompt) and generates the text that would most plausibly follow from that input. (For a deep dive into the technical aspects of how an LLM works, see Stephen Wolfram's "What Is ChatGPT Doing... and Why Does It Work?"18)
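
To make the "plausibility machine" idea concrete, here is a minimal, purely illustrative sketch: a toy model that, given the last token, picks a likely continuation and repeats the process. The probability table and tokens are invented for this example; a real LLM conditions on the entire prompt with a neural network rather than a lookup table.

```python
import random

# Toy stand-in for a language model: probabilities of the next token given the
# previous one. Invented for illustration only; real LLMs learn these
# distributions from huge text corpora and condition on the whole prompt.
NEXT_TOKEN_PROBS = {
    "once":  {"upon": 0.9, "more": 0.1},
    "upon":  {"a": 1.0},
    "a":     {"time": 0.7, "princess": 0.3},
    "time":  {"there": 0.8, ".": 0.2},
    "there": {"was": 1.0},
    "was":   {"a": 0.6, "very": 0.4},
}

def generate(prompt: list[str], max_tokens: int = 8) -> list[str]:
    """Extend the prompt by repeatedly sampling a plausible next token."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not candidates:
            break  # no known plausible continuation
        words, weights = zip(*candidates.items())
        tokens.append(random.choices(words, weights=weights, k=1)[0])
    return tokens

print(" ".join(generate(["once"])))  # e.g., "once upon a time there was a princess"
```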

As explained in the paper "On the Dangers of Stochastic Parrots" by Emily Bender, et al.,1 since the LLM is trained on signs alone, it cannot arrive at understanding. Meaning lies beyond words and their syntactical arrangement alone. Meaning is communal, produced and agreed upon by language users16—something to which the LLM has no access.

In the 2017 book Language in Our Brain, Angela Friederici explains:

 

"[...]for each word there are two types of semantic representations: a linguistic-semantic representation on the one hand and a conceptual-semantic representation on the other hand."10

 

She goes on to explain the richness of a conceptual-semantic representation compared with an underspecified linguistic one. Take, for example, the following sentence: He picked a rose. Having a linguistic representation of a rose—a certain type of flower—is more than enough to process the sentence on a linguistic level. On a conceptual-semantic level, however, the human brain can link the word rose to its aroma, to the rose in the centerpiece at the restaurant table on Valentine's Day, or to the pain felt as a kid when trying to grab one from a garden, unaware of the thorns. It's clear there's a second layer of meaning that goes beyond matching words to a dictionary.

Whereas modern apps used to demand attention, it can now be said that LLMs demand interpretation. When presented with information, people tend to assign it some meaning. The Stochastic Parrots authors present this issue:

 

"[...] the tendency of human interlocutors to impute meaning where there is none can mislead both NLP [natural language processing] researchers and the general public into taking synthetic text as meaningful."1

 

What we want to understand here is why a human interlocutor imputes meaning to synthetic text. How does that happen? Let's echo literary theorist Terry Eagleton when he asks: What is involved in the act of reading?5

 

Of Dogs and Escalators

In his book Literary Theory: An Introduction, Eagleton proposes the following situation: Imagine you see a sign in the London Underground system that says Dogs must be carried on the escalator. While the sentence might sound simple, Eagleton asks these questions about it:

• Does it mean that you must carry a dog on the escalator?

• Are you going to be banned from the escalator unless you find a stray dog to carry?

• Is "carried" meant to be taken metaphorically to help dogs get through life?

• How do you know this isn't a decoration?

Also, you are expected to understand this about the sign:

• The sign has been placed there by some authority.

• "Escalator" means this escalator and not some escalator in Paraguay.

• "Must be" means "must be now."

This example illustrates how a simple sentence lends itself to multiple interpretations. Humans understand the multiple codes that are in place to perform the correct reading of the sign: If you bring a dog to the London Underground, carry it while you use the escalator.

This brings us to the idea of the two levels of interpretation, as described by Umberto Eco in his book The Limits of Interpretation.8 (Since this article borrows from Eco's work, the key terms found there are italicized.)

On the first level, there's a semantic interpretation, which is the process in which, as readers go through the linear manifestation of the text, they fill it up with meaning. On the second level, there's a critical interpretation. Here, the goal is to describe, from a metalinguistic point of view, the reasons why a text produces a certain response among its readers.

Let's take a look at some of the codes used by readers to insert meaning into a text. (For the full detailed discussion, see The Role of the Reader, Introduction, section 0.6 Discursive structures.6)

 

The Role of the LLM Reader

In The Role of the Reader,6,9 Eco presents a framework explaining a series of codes used by readers as they transform the text expression into content. These codes are built on top of the text itself. English, with its dictionary and its syntactic rules, is but one example of a code. Traffic signals, with their red, yellow, and green lights, are another code used to signify who has the right of way at an intersection. The layout of a book—with a chapter title at the top of the page, the text separated into paragraphs, footnotes at the bottom, and page numbers above or below—is also a code that humans have learned so they know how to read a book. Readers don't necessarily read the chapter title at every page turn, despite it appearing at the top of every odd page, because they understand the code presented by a book's layout and typography.

When they see text such as, "Once upon a time there was a young princess called Snow White. She was very pretty," readers, according to Eco, first use a basic dictionary to detect the most basic properties of the words. For example, since Snow White is a princess, she's probably a woman. Woman activates ideas like human, having certain body parts, and so on. At this stage, readers are unaware of which of those properties must be actualized as they continue reading the text—that is, which of them are relevant to what the text is about. Would it be important to know that a human body can get severely ill if it ingests some sort of potion?

Then there are rules of co-reference. In the Snow White example, readers can decide that the she mentioned in the second sentence refers to the princess from the first one. Again, none of these instructions are explicit in the text; the connections are made by the readers.

The next set of codes are related to contextual and circumstantial selections. When people understand that the escalator from the initial example refers to the escalator from the current Tube station, then, as Eco says, they're making a circumstantial selection that connects the act of utterance with the extraverbal environment. The same sign hung in a bedroom has a completely different meaning.

With contextual selections, readers are expected to go from a basic dictionary understanding of each word to that of an encyclopedia. While the word princess might appear in many contexts, readers are expected to understand that in a children's story, a lot of information that pertains to princesses isn't relevant to the story, unless the author explicitly makes it so. A real-world princess might be part of a monarchy, with all its implications, while a fairy-tale princess is not. More importantly, an encyclopedia moves interpretation from the matching rules offered by a code like the dictionary to a "system of possible inferences," which introduces interpretative freedom.16

Thanks to the readers' own encyclopedic competence, they might know what a princess could be, in the whole sense of the concept, but that's not necessarily what the text needs it to be. Everything that the text doesn't mention is left as a possibility that could be actualized later or could well be left as is.

As mentioned earlier, a princess might activate the idea of a woman, and therefore a human. Neither the text nor the author might tell readers which properties of being a human or a princess are relevant for the rest of the story—whether it is having organs or dressing in a particular manner—but because of encyclopedic knowledge, these properties remain latent from the moment they are read, and they might become relevant once the princess in the fairy tale is poisoned. Why would poison affect a fictional princess? Because fictional worlds are parasites of the real world: If alternate properties aren't spelled out, then readers assume those of the real world (i.e., poison harms a princess).7

Now, you might ask: Why does the previous paragraph refer to a fairy tale? Nowhere in the Snow White example does the text explicitly talk about a children's story. This brings us to the next code: rhetorical and stylistic overcoding.

In this scenario, "once upon a time" is a figure of speech that tells readers to expect a fictional account of events that don't relate to the real world, and that this story is most likely targeted at kids, since that's a literary convention of fairy tales. Many of those types of expressions in daily life help people contextualize the remainder of the text. Think of when someone addresses a group of people at the start of a speech as "Ladies and gentlemen," regardless of the presence of ladies and gentlemen in the crowd, or if the speaker considers them such. The meaning of such expressions is taken from a code that interprets the figures of speech as a whole, instead of word by word.

Another form of literary convention is when a reader understands that the I in many stories isn't necessarily the empirical author of the book. When author Jorge Luis Borges starts the short story Funes the Memorious with

"My first recollection of Funes is quite clear, I see him at dusk, sometime in March or February of the year '84. That year, my father had taken me to spend the summer at Fray Bentos."3

readers know, or are expected to know, that the I doesn't refer to Borges, even though it isn't improbable for Borges to have traveled with his father to Fray Bentos, Uruguay, a city almost across the river from Borges's own Buenos Aires. (Borges was born in 1899, so he clearly cannot be the I from that story either. His short essay "Borges and I" illustrates the idea of the author as detached from the text and highlights how the author's voice is the first character the reader meets in any fictional book.) Also, due to literary conventions, the reader understands that the him from the text refers to Funes, since usually a story speaks about the character for which it is named.6 You can see how the readers are doing a lot of work for a text to function. Let's see the last code, intertextuality.

Literary critic Julia Kristeva said that "any text is constructed as a mosaic of quotations; any text is the absorption and transformation of another," introducing the notion of intertextuality into European semiotics.12 Eco said that by performing inferences by intertextual frames, readers bring in other texts to actualize the one they're reading. As an example, in Don Quixote, author Miguel de Cervantes expected his readers to know about chivalric romances of his time so they would understand the irony of the adventures of the unlikely hero, Alonso Quijano.

Sometimes authors aren't as explicit as Cervantes, but they latch onto archetypes, as in "rags to riches," "voyage and return," "the quest," and others as described by author Christopher Booker in his book The Seven Basic Plots;2 other times, they want you to have read every book, even those that don't exist, as is the case with Borges. A text is a dialogue between texts. (While literary theory usually concerns itself with books, exposure to different platforms fills in this intertextual knowledge, today more than ever—from social media with its memes, to streamed TV shows, to more classical media such as newspapers.)

Without being exhaustive, these are some of the codes that are put into action every time readers are confronted with the task of interpreting a text; it's no wonder, then, that for humans, text produced by an LLM seems to make so much sense. Besides having the knowledge of a basic dictionary, and understanding the codes of textual coherence, humans have access to a semantic encyclopedia to match the words of the produced text, plus a real world from which to borrow properties that haven't been spelled out in the synthetic text. Additionally, intertextual knowledge also kicks in and recognizes genre motifs, even letting readers predict how the text will develop.

 

Model Authors and Model Readers

One difference between human-written text and synthetic text generated by an LLM is that the former is produced by an author with intentions. Whether it's a serious essay or ironic prose on an Internet forum, authors have intentions, and these intentions condition the text they produce. From the language chosen to express their message, to the type of encyclopedic knowledge they expect from their readers, an author makes many decisions to match the semantic characteristics of the message to the capacities of its intended receivers.15

In The Role of the Reader, Eco refers to this ideal reader as the "Model Reader," with the counterpart being the "Model Author." These terms don't refer to the empirical author or reader, mind you, but they're taken as textual strategies employed by both with the goal of having a successful interpretation of the text. Eco presents these two concepts as a way to describe the cooperation between empirical author and empirical reader. Because, as he puts it:

 

"A text is a lazy (or economic) mechanism that lives on the surplus value of meaning introduced by the recipient [...]"

 

To return to the initial example of a dog on an escalator, based on the Model Author image, a reader knows that a sign hung on the wall of the London Underground, with a specific color and typography, must have been placed there by a certain authority that will have the capacity to enforce what the sign says.

When interpreting text produced by an LLM, however, who is the Model Author? What semantic and encyclopedic competence does this author have? Are there intentions behind the LLM's generated text?

Some of these questions can be answered by looking at how the models have been trained. In the paper "Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus" by Jesse Dodge, et al., the authors discuss what type of text corpora companies use for training some of the LLMs that have been deployed to the public today via chat apps, search engines, and similar applications.4 They explain how, from a corpus of unfiltered English text, many passes are made to remove text, ranging from filtering out data that doesn't resemble English, to removing text that contains tokens from a banned word list. The authors explain that this type of filtering "disproportionately removes documents in dialects of English associated with minority identities." Additionally, filters remove documents that contain foul language. While, depending on the use case, some of these filters might be deemed appropriate, it's in the public interest to be aware of this type of filtering, because despite the current fascination with LLM-generated text, users need as many clues as possible to frame synthetic text in an adequate way.
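
As a rough illustration of the kind of filtering passes Dodge, et al. describe, consider the sketch below. The heuristics here (an ASCII-ratio test standing in for language identification and a tiny banned-word list) are invented placeholders, not the actual cleaning rules used for the Colossal Clean Crawled Corpus; the point is only to show how each pass silently decides which text makes it into the model's training data.

```python
# Illustrative corpus-cleaning passes; thresholds and word lists are made up for
# this example and do NOT reproduce the real C4 filtering pipeline.
BANNED_WORDS = {"badword1", "badword2"}   # placeholder blocklist
MIN_ASCII_RATIO = 0.9                     # crude proxy for "resembles English"

def looks_like_english(doc: str) -> bool:
    ascii_chars = sum(1 for ch in doc if ord(ch) < 128)
    return len(doc) > 0 and ascii_chars / len(doc) >= MIN_ASCII_RATIO

def contains_banned_token(doc: str) -> bool:
    return any(token.lower() in BANNED_WORDS for token in doc.split())

def clean_corpus(raw_docs: list[str]) -> list[str]:
    kept = []
    for doc in raw_docs:
        if not looks_like_english(doc):
            continue  # pass 1: drop text that "doesn't resemble English"
        if contains_banned_token(doc):
            continue  # pass 2: drop documents containing banned tokens
        kept.append(doc)
    return kept
```

Every continue statement above drops a document, and with it the dialect or community that produced it; that is the disproportionate removal the authors warn about.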

Earlier, this article addressed encyclopedic competence. How does it differ from semantic competence, and why does it matter for LLMs, their programmers, and their users? Patrizia Violi explains:17

 

"[...] there are facts that, when ignored, denote a scarce or insufficient cultural knowledge but do not have any consequence upon our linguistic ability, and that there are facts, the ignorance of which, demonstrates a lack of linguistic competence."

 

Semantic competence allows people to become users of a language, while encyclopedic competence shows that those users belong to a particular culture. Across the Spanish-speaking world this is quite common: There's just one dictionary of Spanish, but because of cultural differences, two speakers from different Latin American countries might understand each other very well on a lexical level yet fail to grasp what certain words mean in a specific context.

The encyclopedia, then, is an intersubjective concept that helps define a culture. This intersubjective agreement regulates what things can possibly mean, but it's an agreement that must be verified from time to time, as with the case of Spanish speakers across different countries, so it cannot be taken for granted.17 Since the encyclopedia regulates meaning, then work like the one from Dodge, et al., becomes crucial, as it documents how LLMs build their encyclopedias.

 

What Game Are You Playing, LLM?

The Austrian philosopher Ludwig Wittgenstein introduced the idea of language games to describe the way people talk to each other. He posited that in the same way that there are rules for playing a game of chess, utterances can be defined according to the rules that specify how they should be used.13 So, explicit or not, every conversation seems to carry rules.

In his book The Postmodern Condition, Jean-François Lyotard explains that these rules, which aren't necessarily explicit or known by the players, can break communication if they are modified or ignored.13 "See you tomorrow" as said between friends after school doesn't have the same meaning as the same sentence said by the school principal to one of the students after a disciplinary speech, and it isn't the same as if said by one of the friends as they board a plane for a six-month exchange trip.

The first one is a phatic expression (used to maintain social relationships), the second one an order, and the last one a goodbye joke. Whether the friends see each other the next day doesn't matter, but the principal will be concerned if the student doesn't show up at the office the next day. If you look at these exchanges as language games, you can see how they set up certain expectations for each player.

So, what are the rules of a language game played with an LLM? What are the intentions of the synthetic text? Should the reader put all of their encyclopedic knowledge into play to help the synthetic text work?

Here, the challenge falls on the applications that expose LLMs to the public: how to make the language game as clear as possible. In the absence of explicit rules, it seems that humans will end up making their own, with a tendency to humanize the interlocutor, as described by the ELIZA effect, named after the ELIZA chatbot created by Joseph Weizenbaum at MIT in 1966. Scholar Douglas Hofstadter defines it as "the susceptibility of people to read far more understanding than is warranted into strings of symbols—especially words—strung together by computers."11

 

Building the LLM Reader

In the paper "Model Cards for Model Reporting," Mitchell, et al., presented the idea of "model cards" as a way to attach information to machine-learning models, indicating their training details, performance, and so on.14 In a similar fashion, applications that bring content produced by LLMs to the public should provide enough clues, by way of tags and other user-interface features, for users to understand the provenance of the information presented to them.
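
As a sketch of what such adjunct information might look like to an application developer, here is a hypothetical, simplified model-card record. The field names are invented for illustration and do not reproduce the exact schema proposed by Mitchell, et al.

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Hypothetical, simplified model card attached to a deployed LLM."""
    model_name: str
    training_corpora: list[str]       # which text corpora were used
    filtering_applied: list[str]      # e.g., language filters, blocklists
    rlhf_details: str                 # who gave feedback, in which languages
    intended_use: str
    known_limitations: list[str] = field(default_factory=list)

card = ModelCard(
    model_name="example-llm",
    training_corpora=["filtered web crawl", "public-domain books"],
    filtering_applied=["non-English text removed", "banned-word blocklist"],
    rlhf_details="feedback from a small group of English-speaking raters",
    intended_use="general-purpose chat assistant",
    known_limitations=["dialects removed by filtering", "encyclopedia frozen at crawl date"],
)
```

A chat application could surface a summary of such a card alongside every conversation, giving readers some of the clues they need to frame the synthetic text.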

If a text is a "syntactic-semantic-pragmatic" device whose foreseen interpretation is part of its generative process, as Eco says,6 then applications that present LLM-generated text to the user should aid the pragmatic aspect of interpretation.

Think about a book cover and how it helps a reader contextualize the text. Usually, a book cover or its spine provides clues about the type of book: fiction, textbook, etc. The back cover, or jacket, helps identify the author and lets readers place the book in a given time period.

A newspaper has a certain shape and typography that clearly identify it as a news publication. A similar identification should occur when readers are presented with text generated by an LLM, not only to be fair to humans, but also to facilitate the interpretation of the generated text and to keep it from being taken, among other things, as an oracle-like response about the world.

While we're still far from having a set of industry-accepted guidelines for LLMs and the applications that use them, a good starting point would be to expect these applications to disclose the text corpora used to train the model. Additionally, details of the RLHF (reinforcement learning from human feedback) process should be known, such as the diversity of the human group that provided feedback, or which languages they speak.

At the risk of drawing a parallel between LLMs and humans, it would be beneficial for users to understand which encyclopedia underlies the Model Author they project onto an LLM. Whenever you read, you project an ideal author with certain knowledge, a collection of pieces the author has written before, and so on; based on those expectations, you form a strategy that helps you understand the text and what the author might have meant. For example, it's impossible for a 14th-century Italian to know about the American continent, since Europeans learned about it much later, so a reader wouldn't expect Dante Alighieri to include it in his Divina Commedia. On the other hand, if a present-day Italian author claims there's nothing beyond the Atlantic Ocean, you might think it's a joke. While these examples may sound contrived, they make it clear that before any interpretation effort, it is important to be aware of the freshness of the encyclopedia available to any author, let alone an LLM producing synthetic text.

There are many ways in which a text can help a reader build the necessary context for its interpretation. In the case of LLM-generated text, providing citations for the generated response, together with information labeling the external systems consulted, aids the pragmatic response to the text. Having an LLM generate a response to a specific question is not the same as having an LLM parse a user's prompt as a question and then produce an answer by summarizing the articles returned by a web search. In the first case, the answer is generated by the LLM—remember, an LLM generates the next most probable token.18 In the second case, it's a summary of human-produced sources. The presentation and labeling of the text should be clear enough for the user to tell which is which.
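
One way an application could make that difference visible is to attach an explicit provenance label, and a list of consulted sources, to every response it shows. The sketch below is hypothetical; the labels, fields, and rendering are invented for illustration rather than taken from any existing product.

```python
from dataclasses import dataclass, field
from enum import Enum

class Provenance(Enum):
    MODEL_GENERATED = "generated by the language model alone"
    SEARCH_SUMMARY = "model-written summary of human-written sources"

@dataclass
class LabeledResponse:
    text: str
    provenance: Provenance
    citations: list[str] = field(default_factory=list)  # URLs of consulted sources

    def render(self) -> str:
        header = f"[{self.provenance.value}]"
        sources = "\n".join(f"  source: {url}" for url in self.citations)
        return "\n".join(part for part in (header, self.text, sources) if part)

# A purely generated answer carries no sources to cite.
print(LabeledResponse("Paris is the capital of France.",
                      Provenance.MODEL_GENERATED).render())

# A search-backed summary lists the human-produced sources it drew from.
print(LabeledResponse("Recent articles report...",
                      Provenance.SEARCH_SUMMARY,
                      citations=["https://example.org/article"]).render())
```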

 

Conclusion

In this article we've shown how much a reader contributes to making the lazy mechanism that is a text produce its meaning. An author, in turn, should build their reader, so that both can meet in the text's interpretation. We are now in the presence of a new medium disguised as good old text, but that text has been generated by an LLM, without authorial intention—an aspect that, if known beforehand, completely changes the expectations and response a human should have to a piece of text. Should our interpretation capabilities be engaged? If so, under what conditions? The rules of the language game should be spelled out; they should not be passed over in silence.

 

Acknowledgments

I'm indebted to the following for their valuable feedback and input while producing this article: Silvana F., Daniel P., and Sergio S. Your discussions greatly helped me shape this text.

 

References

1. Bender, E. M., Gebru, T., McMillan-Major, A., Shmitchell, S. 2021. On the dangers of stochastic parrots: can language models be too big? In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 610–623; https://dl.acm.org/doi/10.1145/3442188.3445922.

2. Booker, C. 2004. The Seven Basic Plots. NY: Bloomsbury Continuum.

3. Borges, J. L., Hurley, A. 1962. Funes the Memorious. In Ficciones. Grove Press.

4. Dodge, J., et al. 2021. Documenting large webtext corpora: a case study on the colossal clean crawled corpus. In Proceedings of the Conference on Empirical Methods in Natural Language Processing; https://doi.org/10.18653/v1/2021.emnlp-main.98.

5. Eagleton, T. 2015. Literary Theory: An Introduction. Hoboken, NJ: Blackwell Publishing.

6. Eco, U. 1979. Introduction: The Role of the Reader. In The Role of the Reader. Bloomington, IN: Indiana University Press.

7. Eco, U. 1990. Small worlds. In The Limits of Interpretation, 74–75. Bloomington, IN: Indiana University Press.

8. Eco, U. 1990. Two levels of interpretation. In The Limits of Interpretation, 54–55. Bloomington, IN: Indiana University Press.

9. Eco, U. 2016. Lector in Fabula: La Cooperazione Interpretativa nei Testi Narrativi. Milan, Italy: Bompiani.

10. Friederici, A. D. 2017. Language as a specific cognitive system. In Language in Our Brain: The Origins of a Uniquely Human Capacity, 3–4. Cambridge, MA: MIT Press.

11. Hofstadter, D. 1995. Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought. New York, NY: Basic Books.

12. Kristeva, J. 1980. Word, dialogue and novel. In Desire in Language, 66. New York, NY: Columbia University Press.

13. Lyotard, J.-F. 1979. The method: language games. In The Postmodern Condition: A Report on Knowledge. Paris, France: Les Éditions de Minuit.

14. Mitchell, M., et al. 2019. Model cards for model reporting. In FAT* '19: Conference on Fairness, Accountability, and Transparency; https://doi.org/10.1145/3287560.3287596.

15. Shannon, C. E., Weaver, W. 1998. The interrelationship of the three levels of communication problems. In The Mathematical Theory of Communication. Urbana: University of Illinois Press.

16. Violi, P. 1998. Individual and communal encyclopedias. In Umberto Eco's Alternative: The Politics of Culture and the Ambiguities of Interpretation, 0–33. New York, NY: Peter Lang.

17. Violi, P. 2001. Encyclopedic competence and semantic competence. In Meaning and Experience, translated by Jeremy Carden, 159–164. Bloomington, IN: Indiana University Press.

18. Wolfram, S. 2023. What is ChatGPT doing... and why does it work? Stephen Wolfram Writings; https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work.

 

Alvaro Videla is a developer advocate at Microsoft. He is the coauthor of RabbitMQ in Action and has written for ACM. He is on Twitter as @old_sound.

Copyright © 2023 held by owner/author. Publication rights licensed to ACM.

Originally published in Queue vol. 21, no. 3




