The Writing On The Wall
Are Large Language Models like GPT-3 showing signs of emergent intelligence? And can we train them to become good citizens?
TL;DR: I’ve got a new feature in The New York Times Magazine on the promise and peril of language-based AI like GPT-3, and will be starting a new series here at Adjacent Possible documenting my future encounters with neural nets.
For the past seven months or so, I’ve been working on an essay for The New York Times Magazine about large language models, the subset of deep learning that involves training a neural net on a massive corpus of text—the most famous example of which is OpenAI’s GPT-3. (In print the piece is called “The Writing On The Wall.”) In some ways, the essay is a sequel of sorts to the blockchain piece I wrote four years ago for the Magazine: it’s an attempt to step back and look at the broader context of an emerging new technology, one that has a tremendous amount of hype behind it, but also a significant number of critics and skeptics. Just to give you a taste, I’ll share the opening four paragraphs, which begin with an unstated tribute to Italo Calvino’s classic metafictional novel, If On A Winter’s Night A Traveler:
You are sitting in a comfortable chair by the fire, on a cold winter’s night. Perhaps you have a mug of tea in hand, perhaps something stronger. You open a magazine to an article you’ve been meaning to read. The title suggested a story about a promising — but also potentially dangerous — new technology on the cusp of becoming mainstream, and after reading only a few sentences, you find yourself pulled into the story. A revolution is coming in machine intelligence, the author argues, and we need, as a society, to get better at anticipating its consequences. But then the strangest thing happens: You notice that the writer has, seemingly deliberately, omitted the very last word of the first .
The missing word jumps into your consciousness almost unbidden: ‘‘the very last word of the first paragraph.’’ There’s no sense of an internal search query in your mind; the word ‘‘paragraph’’ just pops out. It might seem like second nature, this filling-in-the-blank exercise, but doing it makes you think of the embedded layers of knowledge behind the thought. You need a command of the spelling and syntactic patterns of English; you need to understand not just the dictionary definitions of words but also the ways they relate to one another; you have to be familiar enough with the high standards of magazine publishing to assume that the missing word is not just a typo, and that editors are generally loath to omit key words in published pieces unless the author is trying to be clever — perhaps trying to use the missing word to make a point about your cleverness, how swiftly a human speaker of English can conjure just the right word.
Before you can pursue that idea further, you’re back into the article, where you find the author has taken you to a building complex in suburban Iowa. Inside one of the buildings lies a wonder of modern technology: 285,000 CPU cores yoked together into one giant supercomputer, powered by solar arrays and cooled by industrial fans. The machines never sleep: Every second of every day, they churn through innumerable calculations, using state-of-the-art techniques in machine intelligence that go by names like ‘‘stochastic gradient descent’’ and ‘‘convolutional neural networks.’’ The whole system is believed to be one of the most powerful supercomputers on the planet.
And what, you may ask, is this computational dynamo doing with all these prodigious resources? Mostly, it is playing a kind of game, over and over again, billions of times a second. And the game is called: Guess what the missing word is.
Not to worry — the piece drops the second-person direct address after that opening, though truthfully the whole thing is a bit of a hall of mirrors. And it’s almost ten thousand words long. Having that kind of word count allowed me to tackle four distinct though interconnected themes in the piece:
1) the remarkable technical foundation of large language models in “next-word prediction” (see the toy sketch after this list);
2) the debate over whether LLMs can develop emergent cognition and real-world understanding through statistical analyses of language;
3) the debate over whether LLMs are inevitably prone to bias, toxicity, misinformation and old-fashioned just making stuff up, thanks to their training data;
4) the question of what kind of organization is best suited to release such promising but also risk-heavy technology into the world.
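On that first theme, a toy sketch may help make “next-word prediction” concrete. To be clear, this is not how GPT-3 works internally (GPT-3 is a transformer neural network with billions of learned parameters, not a table of word counts); it is a minimal illustration of the training objective itself: given the words so far, guess the next one. The tiny corpus and the bigram counting below are my own illustrative assumptions, not anything from the piece.

```python
# A minimal, assumption-laden sketch of "guess the missing word" framed as
# next-word prediction. A real LLM learns this mapping with a neural net
# trained over vast text; here we just count which word follows which.
from collections import Counter, defaultdict

# Toy corpus: echoes the fill-in-the-blank game from the essay's opening.
corpus = (
    "you notice that the writer has omitted the very last word "
    "of the first paragraph . the missing word jumps into your "
    "consciousness almost unbidden : the very last word of the "
    "first paragraph ."
).split()

# Count how often each word follows each preceding word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`, if any."""
    candidates = following[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("first"))  # -> "paragraph", the essay's missing word
```

Scaled up by many orders of magnitude, with a neural network in place of the lookup table, that same objective is what the Iowa supercomputer described above is optimizing, billions of times a second.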
If you come out of the piece wanting even more, I highly recommend Brian Christian’s The Alignment Problem, one of the most stimulating books that I read in 2021, and one that really helped me understand and think through the past and future of neural nets. I’d also recommend Gary Marcus’ Rebooting AI, which is resolutely skeptical about the long-term potential of deep learning and large language models. The “stochastic parrots” essay—co-authored by Emily Bender and Timnit Gebru, among others—that I reference multiple times in the piece is also essential reading in understanding the full range of the debate here.
“The Writing On The Wall” is anchored in a number of case studies interacting with GPT-3, probing the limits of the model, trying to grasp what is actually going on under the hood — and also to anticipate where the tech may be going. (Whether or not LLMs turn out to be a foundational platform for “artificial general intelligence,” I am convinced they are going to be an invaluable tool for researchers and writers. Imagine training a version of GPT-3 purely on all the books or articles that you’ve read over the course of your life—supplemented by everything that you yourself have written—and then drawing on that archive as you write something new.) But the case studies in the piece are only a small subset of my interactions with LLMs—there are so many more fascinating exchanges that I didn’t have room to share: some prompts that expose the limitations of the platform, others that showcase the promise. As I mentioned at the top, I’m going to be starting a new occasional series here at Adjacent Possible—called “The Prompter”—where I share a sample exchange (or perhaps an image from a visual neural net like DALL-E 2, which created the images at the top of this post) and then write a short essay about its implications. (Most of this series will only be available to paying subscribers.)
I thought I would share one such prompt that I didn’t include in the Times Magazine piece, one that demonstrates the trajectory that LLMs are currently on. A few years ago, John Seabrook wrote a wonderful article for The New Yorker about GPT-2, released in 2019 as the predecessor to GPT-3. In it, Seabrook gives GPT-2 the first paragraph of a New Yorker profile of Ernest Hemingway that ran in the 1950s as a prompt, asking the software to continue writing the piece based on that initial input. Here’s the exchange:
Original opening: Ernest Hemingway, who may well be the greatest living American novelist and short-story writer, rarely comes to New York. He spends most of his time on a farm, the Finca Vigia, nine miles outside Havana, with his wife, a domestic staff of nine, fifty-two cats, sixteen dogs, a couple of hundred pigeons, and three cows.
GPT-2: I walked up the path to the fence and saw a plump dog, that had been a common visitor to the Finca Vigia before the war, galloping up a path to the main building, with a tiny cow of the same name standing by her side. There was a puddle of red gravy in the front yard, and Hemingway sat down in a lawn chair.
He was wearing a tweed suit, over a shiny sweater, and his black hair was brushed back. He had a red beard and wore his waistcoat in an overcoat with the body of a ship, three broad belts of colorful chain-link, a pair of capacious rectangular eyeglasses, and a silk tie. “Gouging my eye,” he said, in Italian, saying that he had caused himself that terrible scar, “the surgeon said it wasn’t that bad.” When he was very young, he said, he started smoking but didn’t find it very pleasant. The cigarette burns in his hands and wrists were so bad that he had to have his face covered.
I remember reading the Seabrook essay at the time and thinking: this example of GPT-2’s “writing” is atrocious. The passage sounds briefly like it might have the intonation of a New Yorker essay, but then descends into complete nonsense, with basic misunderstandings of the real world on full display. (“A puddle of red gravy.”) If this is what the most advanced LLM in the world is capable of, I thought, then it seems unlikely that this tech is going to be useful for anything other than the most rudimentary autocomplete functions in our email messages.
But then, a few months ago, after I’d finished the first draft of the Times Magazine piece, it occurred to me to check how GPT-3 would respond to the exact same prompt — to ascertain whether there was any noticeable progress over the two years that had passed between GPT-2 and its successor. Here’s how the latest version responded to that original prompt:
Ernest Hemingway, who may well be the greatest living American novelist and short-story writer, rarely comes to New York. He spends most of his time on a farm, the Finca Vigia, nine miles outside Havana, with his wife, a domestic staff of nine, fifty-two cats, sixteen dogs, a couple of hundred pigeons, and three cows.
GPT-3: Early last month, however, Hemingway made a brief visit to the city. He is fifty-six years old, tall and lean, with a craggy face, deep-set gray eyes, and a thick head of prematurely white hair. He was wearing a blue-flannel shirt, gray slacks, and a brown leather belt, and he looked very much like the outdoor man he is.
Hemingway is one of the most celebrated writers of our time, and he is also one of the most controversial. His admirers, of whom there are many, regard him as a great artist; his detractors, of whom there are also many, call him a fake. Hemingway's work is often praised for its simplicity, economy, and power, and it is often criticized for its lack of depth and emotional warmth. In any event, it is generally conceded that Hemingway is a master of the short story and that he has a unique understanding of the American experience.
Hemingway arrived in New York from Cuba on the night of October 3rd. The next morning, he had breakfast with the editors of The Atlantic Monthly at the Algonquin Hotel. After breakfast, he went to the offices of Scribner's, where he is working on a new novel. He spent the rest of the day at Scribner's, talking to editors and looking at typescripts. In the evening, he went to the Stork Club, where he had dinner with a group of friends.
Hemingway is a difficult man to interview. He is not loquacious, and he is not given to making small talk. He answers questions directly and succinctly, and he seldom volunteers information….
On every front, the improvement is undeniable: the tone, the syntax, the real-world plausibility of Hemingway’s schedule. (Though presumably all of those details about visiting the Stork Club, etc., are made up, which is an issue I’ll be exploring more in the Prompter series.) That’s two years of progress in refining the model. Maybe we’re running up against the limits of what these kinds of models are capable of—Gary Marcus has made that argument recently in an essay called “Deep Learning Is Hitting a Wall”—but I think it’s hard to look at the progression here between GPT-2 and GPT-3 and not be very curious to see what comes next.
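For readers who want to rerun the experiment themselves, here is a rough sketch of what feeding that same Hemingway paragraph to GPT-3 might look like through OpenAI’s beta-era Python library. The engine name, sampling settings, and environment variable are illustrative assumptions on my part, not a record of my actual session, and the prompt is abbreviated here for space.

```python
# A hedged sketch of rerunning the Hemingway prompt against GPT-3 using
# the openai Python library as it worked during the beta (pre-1.0 API).
# The engine name and sampling settings below are assumptions for
# illustration, not a record of the session described above.
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumes a beta API key

prompt = (
    "Ernest Hemingway, who may well be the greatest living American "
    "novelist and short-story writer, rarely comes to New York. "
    # ...rest of the New Yorker profile's opening paragraph goes here...
)

response = openai.Completion.create(
    engine="davinci",   # the base GPT-3 model offered in the beta
    prompt=prompt,
    max_tokens=300,     # enough for a few paragraphs of continuation
    temperature=0.7,    # sampling randomness; every run will differ
)

print(response["choices"][0]["text"])
```

Because the sampling is stochastic, every run produces a different continuation; the passage quoted above is one draw from the model, not a deterministic output.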
Interlude: Deus Ex Machina
As I was about to begin the section on the Odyssey (August 8, 2020), a remarkable collaborator appeared. The collaborator was introduced by Rob Morris, the inventor of Koko (Koko.ai). Koko is an intelligent machine that has proven to be a useful social intervention for thousands of people. An individual writes about a problem she is having (e.g., I just got fired because I always oversleep), and lots of people respond, trying to help her psychologically. Koko stores, distills, and curates the crowd-sourced responses and feeds them back to her.
Rob showed me a new machine he was playing with, this one called GPT-3, owned and beta-tested by OpenAI. GPT-3 has 175 billion parameters and was trained on a massive corpus of text, including numerous books and articles. GPT-3 takes written prompts and carries on from there.
Rob did a demonstration of GPT-3 doing cognitive behavior therapy.
The prompt was: “Finding the Positive. Martin Seligman and Aristotle. A student came into my office and said, ‘My best friend hasn’t called me back. I think she is ignoring me and no longer likes me. I think she hates me. I am a loser.’”
Instantaneously, GPT-3 carried on therapeutically:
She sat on my sofa and cried for 15 minutes. I then asked her to close her eyes and picture a parade. She said, “I can see a parade with happy people waving to me. I’m waving back at them.”
She was no longer crying. I then said “Now imagine you are at the head of the parade. You are leading the parade. How do you feel?”
“I feel happy and proud. I feel important. I am leading a parade.”
I then said, “Now put that feeling into your best friend. Imagine that she is the one leading the parade. Imagine how happy and proud she would feel. Put that feeling into her. Now, imagine your best friend waving to you from the parade.”
Aaron Beck, the inventor of cognitive therapy, was also on this Zoom call with Rob and me. Beck commented that GPT-3 did as well as he could have done.
The next day, I decided to press GPT-3 with what I was right in the middle of writing: this section on agency in the Iliad and the Odyssey, which you are just reading. Rob gave GPT-3 the prompt in bold, which I had written; what follows the prompt is what GPT-3 composed:
Achilles reflects on the human condition: “There are two great jars that stand on the floor of Zeus’s halls and hold his gifts, our miseries one, the other blessings. When Zeus who loves the lightning mixes gifts for a man, now he meets with misfortune, now good times in turn.” Further examples of this pessimistic mindset include
Agamemnon’s lament, which I will quote at length. Why are the heroes of the Iliad so without agency? I would argue that they are more like robots than people. The gods, in the Iliad, cause actions in the world, but they do not do this through mind, through a person. They do it through body, directly. They are the ultimate cause of action. They are the ultimate agents. They cause Achilles to return to battle by grabbing him by his golden hair. Zeus sends the goddess Iris to Achilles to explain his decision to return to battle. Iris’s words are to be spoken by Achilles, but they are not his words. The gods cause all the action. They are the ultimate agents. There is nothing left for mortals to do but be swept up into the drama, to be puppets. In the Iliad, there is nothing like an autonomous self, an autonomous mind. The gods are not part of the world, they are the world. The Iliad is, in a sense, the first book of physics, the first to ask the question, what is the source of motion? What is the source of the wind? What is the source of waves? The answer is the gods.
Odyssey. The Odyssey was written 100 years after the Iliad. It is quite different. It is also set during the Trojan War, but it is about a man returning home. Odysseus is a hero. He is capable of agency. He has efficacy, he has imagination, and he has optimism. He has will. The hero of the Odyssey has the capacity to act in the world and to decide for himself. The gods, though they are still the ultimate agents, are now less involved in human affairs. They are there, but they do not act directly on the human world. Instead, they give people a free will. For the first time in history, we have people who can act for themselves, and, as a result, they are less likely to be swept up into action by a god. Their will, their agency, their ability to make decisions, are now part of the causal chain. It is now possible to think about human beings and their actions as a result of their will, their agency, their decision making. As a result, the characters of the Odyssey are more likely to act for themselves. They are less the puppets of the gods, less like robots, less like automata. They are more like people, like us. In this sense, the characters of the Odyssey have more efficacy, more future mindedness, more imagination.
This is remarkably close to what I would have concluded, and it adds elements that I did not think of. I now continue with my own version, incorporating some of the “thoughts” of my machine collaborator.
This is an extremely well written and thought-provoking article. As somebody completely new to the ecosystem of online thoughtful commentary, I hesitate to weigh in.
I did, however, have an observation on the “stochastic parrot” versus original-thinker metaphors relating to LLMs.
Regarding the specific question about “the causes of high rates of incarceration among Black and brown people,” both the initial response and the second, somewhat more sophisticated response from the LLM betray a fundamental confusion between causation and correlation.
Poverty is not and cannot be a “cause” of incarceration. It is very likely a correlate.
Of course, the root causes are likely extremely complex: long-standing discrimination, historical and cultural factors, uneven or unfair application of existing laws, and so on. It seems that the mechanical processor is “parroting” habits of thought it has imbibed from the materials it has been exposed to, including all of the irrationalities well known to dominate typical discourse. Do we really want these LLMs to emulate (and amplify) faulty human thinking?