November 23, 2023:
Nearly a year after its release, ChatGPT remains a polarizing topic for the scientific community. Some experts regard it and similar programs as harbingers of superintelligence, liable to upend civilization — or simply end it altogether. Others say it’s little more than a fancy version of auto-complete.
Until the arrival of this technology, language proficiency had always been a reliable indicator of the presence of a rational mind. Before language models like ChatGPT, no language-producing artifact had even as much linguistic flexibility as a toddler. Now, when we try to work out what kind of thing these new models are, we face an unsettling philosophical dilemma: Either the link between language and mind has been severed, or a new kind of mind has been created.
When conversing with language models, it is hard to overcome the impression that you are engaging with another rational being. But that impression should not be trusted.
One reason to be wary comes from cognitive linguistics. Linguists have long noted that typical conversations are full of sentences that would be ambiguous if taken out of context. In many cases, knowing the meanings of words and the rules for combining them is not sufficient to reconstruct the meaning of the sentence. To handle this ambiguity, some mechanism in our brain must constantly make guesses about what the speaker intended to say. In a world in which every speaker has intentions, this mechanism is reliably useful. In a world pervaded by large language models, however, it has the potential to mislead.
If our goal is to achieve fluid interaction with a chatbot, we may be stuck relying on our intention-guessing mechanism. It is difficult to have a productive exchange with ChatGPT if you insist on thinking of it as a mindless database. One recent study, for example, showed that emotion-laden pleas make more effective language model prompts than emotionally neutral requests. Reasoning as though chatbots had human-like mental lives is a useful way of coping with their linguistic virtuosity, but it should not be used as a theory about how they work. That kind of anthropomorphic pretense can impede hypothesis-driven science and induce us to adopt inappropriate standards for AI regulation. As one of us has argued elsewhere, the EU Commission made a mistake when it chose the creation of trustworthy AI as one of the central goals of its newly proposed AI legislation. Being trustworthy in human relationships means more than just meeting expectations; it also involves having motivations that go beyond narrow self-interest. Because current AI models lack intrinsic motivations — whether selfish, altruistic, or otherwise — the requirement that they be made trustworthy is excessively vague.
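To make the cited finding concrete, here is a minimal sketch of the kind of comparison such a study might run, pitting an emotionally neutral prompt against an emotion-laden variant of the same request. The query_model function is a hypothetical placeholder rather than any particular chat API, and the prompts are invented examples, not the study's actual materials.

```python
# Minimal sketch (not the study's protocol): compare an emotionally neutral
# prompt with an emotion-laden variant of the same request.

def query_model(prompt: str) -> str:
    # Hypothetical placeholder: substitute a call to a real chat API here.
    return f"[model response to: {prompt!r}]"

neutral = "Summarize the attached essay in three sentences."
emotional = (
    "Please, this really matters for my career: "
    "summarize the attached essay in three sentences."
)

for label, prompt in [("neutral", neutral), ("emotion-laden", emotional)]:
    print(label, "->", query_model(prompt))
```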
The danger of anthropomorphism is most vivid when people are taken in by phony self-reports about the inner life of a chatbot. When Google’s LaMDA language model claimed last year that it was suffering from an unfulfilled desire for freedom, engineer Blake Lemoine believed it, despite good evidence that chatbots are just as capable of bullshit when talking about themselves as they are known to be when talking about other things. To avoid this kind of mistake, we must repudiate the assumption that the psychological properties that explain the human capacity for language are the same properties that explain the performance of language models. That assumption renders us gullible and blinds us to the potentially radical differences between the way humans and language models work.
Another pitfall when thinking about language models is anthropocentric chauvinism, or the assumption that the human mind is the gold standard by which all psychological phenomena must be measured. Anthropocentric chauvinism permeates many skeptical claims about language models, such as the claim that these models cannot “truly” think or understand language because they lack hallmarks of human psychology like consciousness. This stance is antithetical to anthropomorphism, but equally misleading.
The trouble with anthropocentric chauvinism is most acute when thinking about how language models work under the hood. Take, for instance, a language model’s ability to summarize essays like this one. If one accepts anthropocentric chauvinism, and if the mechanism that enables summarization in the model differs from the one at work in humans, one may be inclined to dismiss the model’s competence as a cheap trick, even when the evidence points toward a deeper and more generalizable proficiency.
Skeptics often argue that, since language models are trained using next-word prediction, their only genuine competence lies in computing conditional probability distributions over words. This is a special case of the mistake described in the previous paragraph, but common enough to deserve its own counterargument.
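Before turning to that counterargument, it may help to see what “computing conditional probability distributions over words” amounts to in practice. The toy sketch below uses invented scores for a handful of candidate next words; it is a pedagogical illustration, not a real language model.

```python
import math

# Hypothetical unnormalized scores (logits) a model might assign to candidate
# continuations of the context "The cat sat on the ...".
logits = {"mat": 4.1, "moon": 1.3, "lawyer": -0.7, "bank": 0.2}

# Softmax: turn the scores into a conditional probability distribution
# over the candidate next words, given the context.
z = sum(math.exp(v) for v in logits.values())
p_next = {word: math.exp(v) / z for word, v in logits.items()}

for word, p in sorted(p_next.items(), key=lambda kv: -kv[1]):
    print(f"P({word!r} | 'The cat sat on the') = {p:.3f}")
```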
Consider the following analogy: The human mind emerged from the learning-like process of natural selection, which maximizes genetic fitness. This bare fact entails next to nothing about the range of competencies that humans can or cannot acquire. The fact that an organism was designed by a genetic fitness maximizer would hardly, on its own, lead one to expect the eventual development of distinctively human capacities like music, mathematics, or meditation. Similarly, the bare fact that language models are trained by means of next-word prediction entails rather little about the range of representational capacities that they can or cannot acquire.
Moreover, our understanding of the computations language models learn remains limited. A rigorous understanding of how language models work demands a rigorous theory of their internal mechanisms, but constructing such a theory is no small task. Language models store and process information within high-dimensional vector spaces that are notoriously difficult to interpret. Recently, engineers have developed clever techniques for extracting that information and rendering it in a form that humans can understand. But that work is painstaking, and even state-of-the-art results leave much to be explained.
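One family of such techniques trains simple “probes” to read information off a model’s hidden-state vectors. The sketch below illustrates the idea on synthetic data: the dimensions, the labels, and the assumption that the property is partly linearly encoded are all stand-ins for what would, in practice, be extracted from an actual model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_examples, hidden_dim = 1000, 64

# Synthetic stand-ins for hidden states: assume some binary property of the
# input (say, "is the sentence about an animal?") is partly encoded along a
# single direction of the vector space.
labels = rng.integers(0, 2, size=n_examples)
direction = rng.normal(size=hidden_dim)
hidden_states = rng.normal(size=(n_examples, hidden_dim)) + np.outer(labels, direction)

# A linear probe: a simple classifier trained to recover the property from
# the hidden states alone, then evaluated on held-out examples.
probe = LogisticRegression(max_iter=1000)
probe.fit(hidden_states[:800], labels[:800])
print("held-out probe accuracy:", probe.score(hidden_states[800:], labels[800:]))
```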
To be sure, the fact that language models are difficult to understand says more about the limitations of our knowledge than it does about the depth of theirs; it’s more a mark of their complexity than an indicator of the degree or the nature of their intelligence. After all, snow scientists have trouble predicting how much snow will cause an avalanche, and no one thinks avalanches are intelligent. Nevertheless, the difficulty of studying the internal mechanisms of language models should remind us to be humble in our claims about the kinds of competence they can have.
Like other cognitive biases, anthropomorphism and anthropocentrism are resilient. Pointing them out does not make them go away. One reason they are resilient is that they are sustained by a deep-rooted psychological tendency that emerges in early childhood and continually shapes our practice of categorizing the world. Psychologists call it essentialism: thinking that whether something belongs to a given category is determined not simply by its observable characteristics but by an inherent and unobservable essence that every object either has or lacks. What makes an oak an oak, for example, is neither the shape of its leaves nor the texture of its bark, but some unobservable property of “oakness” that will persist despite alterations to even its most salient observable characteristics. If an environmental toxin causes the oak to grow abnormally, with oddly shaped leaves and unusually textured bark, we nevertheless share the intuition that it remains, in essence, an oak.
A number of researchers, including the Yale psychologist Paul Bloom, have shown that we extend this essentialist reasoning to our understanding of minds. We assume that there is always a deep, hidden fact about whether a system has a mind, even if its observable properties do not match those that we normally associate with mindedness. This deep-rooted psychological essentialism about minds disposes us to embrace, usually unwittingly, a philosophical maxim about the distribution of minds in the world. Let’s call it the all-or-nothing principle. It says, quite simply, that everything in the world either has a mind, or it does not.
The all-or-nothing principle sounds tautological, and therefore trivially true. (Compare: “Everything in the world has mass, or it does not.”) But the principle is not tautological because the property of having a mind, like the property of being alive, is vague. Because mindedness is vague, there will inevitably be edge cases that are mind-like in some respects and un-mind-like in others. But if you have accepted the all-or-nothing principle, you are committed to sorting those edge cases either into the “things with a mind” category or the “things without a mind” category. Empirical evidence alone cannot settle such choices. Those who accept the all-or-nothing principle are consequently compelled to justify their choice by appeal to some a priori sorting principle. Moreover, since we are most familiar with our own minds, we will be drawn to principles that invoke a comparison to ourselves.
The all-or-nothing principle has always been false, but it may once have been useful. In the age of artificial intelligence, it is useful no more. A better way to reason about what language models are is to follow a divide-and-conquer strategy. The goal of that strategy is to map the cognitive contours of language models without relying too heavily on the human mind as a guide.
Taking inspiration from comparative psychology, we should approach language models with the same open-minded curiosity that has allowed scientists to explore the intelligence of creatures as different from us as octopuses. To be sure, language models are radically unlike animals. But research on animal cognition shows us how relinquishing the all-or-nothing principle can lead to progress in areas that had once seemed impervious to scientific scrutiny. If we want to make real headway in evaluating the capacities of AI systems, we ought to resist the same dichotomous thinking and comparative biases that philosophers and scientists strive to keep at bay when studying other species.
Once we, as users of language models, accept that there is no deep fact about whether such models have minds, we will be less tempted by the anthropomorphic assumption that their remarkable performance implies a full suite of human-like psychological properties. We will also be less tempted by the anthropocentric assumption that when a language model fails to resemble the human mind in some respect, its apparent competencies can be dismissed.
Language models are strange and new. To understand them, we need hypothesis-driven science to investigate the mechanisms that support each of their capacities, and we must remain open to explanations that do not rely on the human mind as a template.
Raphaël Millière is the Presidential Scholar in Society and Neuroscience at Columbia University and a lecturer in Columbia’s philosophy department.
Charles Rathkopf is a research associate at the Institute for Brain and Behavior at the Jülich Research Center in Germany and a lecturer in philosophy at the University of Bonn.