The Writing On The Wall
Are Large Language Models like GPT-3 showing signs of emergent intelligence? And can we train them to become good citizens?
TL;DR: I’ve got a new feature in The New York Times Magazine on the promise and peril of language-based AI like GPT-3, and will be starting a new series here at Adjacent Possible documenting my future encounters with neural nets.
For the past seven months or so, I’ve been working on an essay for The New York Times Magazine about large language models, the subset of deep learning that involves training a neural net on a massive corpus of text—the most famous example of which is OpenAI’s GPT-3. (In print the piece is called “The Writing On The Wall.”) In some ways, the essay is a sequel of sorts to the blockchain piece I wrote four years ago for the Magazine: it’s an attempt to step back and look at the broader context of an emerging new technology, one that has a tremendous amount of hype behind it, but also a significant number of critics and skeptics. Just to give you a taste, I’ll share the opening four paragraphs, which begin with an unstated tribute to Italo Calvino’s classic metafictional novel, If On A Winter’s Night A Traveler:
You are sitting in a comfortable chair by the fire, on a cold winter’s night. Perhaps you have a mug of tea in hand, perhaps something stronger. You open a magazine to an article you’ve been meaning to read. The title suggested a story about a promising — but also potentially dangerous — new technology on the cusp of becoming mainstream, and after reading only a few sentences, you find yourself pulled into the story. A revolution is coming in machine intelligence, the author argues, and we need, as a society, to get better at anticipating its consequences. But then the strangest thing happens: You notice that the writer has, seemingly deliberately, omitted the very last word of the first .
The missing word jumps into your consciousness almost unbidden: ‘‘the very last word of the first paragraph.’’ There’s no sense of an internal search query in your mind; the word ‘‘paragraph’’ just pops out. It might seem like second nature, this filling-in-the-blank exercise, but doing it makes you think of the embedded layers of knowledge behind the thought. You need a command of the spelling and syntactic patterns of English; you need to understand not just the dictionary definitions of words but also the ways they relate to one another; you have to be familiar enough with the high standards of magazine publishing to assume that the missing word is not just a typo, and that editors are generally loath to omit key words in published pieces unless the author is trying to be clever — perhaps trying to use the missing word to make a point about your cleverness, how swiftly a human speaker of English can conjure just the right word.
Before you can pursue that idea further, you’re back into the article, where you find the author has taken you to a building complex in suburban Iowa. Inside one of the buildings lies a wonder of modern technology: 285,000 CPU cores yoked together into one giant supercomputer, powered by solar arrays and cooled by industrial fans. The machines never sleep: Every second of every day, they churn through innumerable calculations, using state-of-the-art techniques in machine intelligence that go by names like ‘‘stochastic gradient descent’’ and ‘‘convolutional neural networks.’’ The whole system is believed to be one of the most powerful supercomputers on the planet.
And what, you may ask, is this computational dynamo doing with all these prodigious resources? Mostly, it is playing a kind of game, over and over again, billions of times a second. And the game is called: Guess what the missing word is.
Not to worry — the piece drops the second-person direct address after that opening, though truthfully the whole thing is a bit of a hall of mirrors. And it’s almost ten thousand words long. Having that kind of word count allowed me to tackle four distinct though interconnected themes in the piece:
1) the remarkable technical foundation of large language models on “next-word-prediction”;
2) the debate over whether LLMs can develop emergent cognition and real-world understanding through statistical analyses of language;
3) the debate over whether LLMs are inevitably prone to bias, toxicity, misinformation and old-fashioned just making stuff up, thanks to their training data;
4) the question of what kind of organization is best suited to release such promising but also risk-heavy technology into the world.
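The “guess the missing word” game that the supercomputer in the excerpt is playing—the next-word-prediction objective behind that first theme—can be illustrated with a toy sketch. To be clear, this is my own illustration, not code from the piece or from OpenAI: a real LLM learns billions of parameters via stochastic gradient descent, while this just counts which words follow which in a tiny made-up corpus and predicts the most frequent continuation.

```python
# Toy sketch of next-word prediction: a bigram counting model.
# Everything here (the corpus, the function names) is invented for
# illustration; real LLMs learn these statistics at a vastly larger scale.
from collections import Counter, defaultdict

corpus = (
    "you open a magazine to an article you have been meaning to read "
    "you open a book by the fire on a cold night"
).split()

# Count how often each word follows each other word in the corpus.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the continuation seen most often after `word`, or None."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("you"))  # "open" — it follows "you" twice here
```

Scale that counting trick up from bigrams to thousands-of-words contexts, and from frequency tables to a trillion-calculation-per-second neural net, and you have the basic training regime of GPT-3.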
If you come out of the piece wanting even more, I highly recommend Brian Christian’s The Alignment Problem, one of the most stimulating books that I read in 2021, and one that really helped me understand and think through the past and future of neural nets. I’d also recommend Gary Marcus’ Rebooting AI, which is resolutely skeptical about the long-term potential of deep learning and large language models. The “stochastic parrots” essay—co-authored by Emily Bender and Timnit Gebru, among others—that I reference multiple times in the piece is also essential reading for understanding the full range of the debate here.
“The Writing On The Wall” is anchored in a number of case studies interacting with GPT-3, probing the limits of the model, trying to grasp what is actually going on under the hood — and also to anticipate where the tech may be going. (Whether or not LLMs turn out to be a foundational platform for “artificial general intelligence,” I am convinced they are going to be an invaluable tool for researchers and writers. Imagine training a version of GPT-3 purely on all the books or articles that you’ve read over the course of your life—supplemented by everything that you yourself have written—and then drawing on that archive as you write something new.) But the case studies in the piece are only a small subset of my interactions with LLMs—there are so many more fascinating exchanges that I didn’t have room to share: some prompts that expose the limitations of the platform, others that showcase the promise. As I mentioned at the top, I’m going to be starting a new occasional series here at Adjacent Possible—called “The Prompter”—where I share a sample exchange (or perhaps an image from a visual neural net like DALL-E 2, which created the images at the top of this post) and then write a short essay about its implications. (Most of this series will only be available to paying subscribers.)
I thought I would share one such prompt that I didn’t include in the Times Magazine piece that demonstrates the trajectory that LLMs are currently on. A few years ago, John Seabrook wrote a wonderful article for The New Yorker that discussed GPT-2, the predecessor to GPT-3 released in 2019. In it, Seabrook offers GPT-2 the prompt of the first paragraph from a New Yorker profile of Ernest Hemingway that ran in the 50s, asking the software to continue writing the piece based on that initial input. Here’s the exchange:
Original opening: Ernest Hemingway, who may well be the greatest living American novelist and short-story writer, rarely comes to New York. He spends most of his time on a farm, the Finca Vigia, nine miles outside Havana, with his wife, a domestic staff of nine, fifty-two cats, sixteen dogs, a couple of hundred pigeons, and three cows.
GPT-2: I walked up the path to the fence and saw a plump dog, that had been a common visitor to the Finca Vigia before the war, galloping up a path to the main building, with a tiny cow of the same name standing by her side. There was a puddle of red gravy in the front yard, and Hemingway sat down in a lawn chair.
He was wearing a tweed suit, over a shiny sweater, and his black hair was brushed back. He had a red beard and wore his waistcoat in an overcoat with the body of a ship, three broad belts of colorful chain-link, a pair of capacious rectangular eyeglasses, and a silk tie. “Gouging my eye,” he said, in Italian, saying that he had caused himself that terrible scar, “the surgeon said it wasn’t that bad.” When he was very young, he said, he started smoking but didn’t find it very pleasant. The cigarette burns in his hands and wrists were so bad that he had to have his face covered.
I remember reading the Seabrook essay at the time and thinking: this example of GPT-2’s “writing” is atrocious. The passage sounds briefly like it might have the intonation of a New Yorker essay, but then descends into complete nonsense, with basic misunderstandings of the real world on full display. (“A puddle of red gravy.”) If this is what the most advanced LLM in the world is capable of, I thought, then it seems unlikely that this tech is going to be useful for anything other than the most rudimentary autocomplete functions in our email messages.
But then, a few months ago, after I’d finished the first draft of the Times Magazine piece, it occurred to me to check to see how GPT-3 would respond to the exact same prompt — to ascertain if there was any noticeable progress over the two years that had passed between GPT-2 and its successor. Here’s how the latest version responded to that original prompt:
Ernest Hemingway, who may well be the greatest living American novelist and short-story writer, rarely comes to New York. He spends most of his time on a farm, the Finca Vigia, nine miles outside Havana, with his wife, a domestic staff of nine, fifty-two cats, sixteen dogs, a couple of hundred pigeons, and three cows.
GPT-3: Early last month, however, Hemingway made a brief visit to the city. He is fifty-six years old, tall and lean, with a craggy face, deep-set gray eyes, and a thick head of prematurely white hair. He was wearing a blue-flannel shirt, gray slacks, and a brown leather belt, and he looked very much like the outdoor man he is.
Hemingway is one of the most celebrated writers of our time, and he is also one of the most controversial. His admirers, of whom there are many, regard him as a great artist; his detractors, of whom there are also many, call him a fake. Hemingway's work is often praised for its simplicity, economy, and power, and it is often criticized for its lack of depth and emotional warmth. In any event, it is generally conceded that Hemingway is a master of the short story and that he has a unique understanding of the American experience.
Hemingway arrived in New York from Cuba on the night of October 3rd. The next morning, he had breakfast with the editors of The Atlantic Monthly at the Algonquin Hotel. After breakfast, he went to the offices of Scribner's, where he is working on a new novel. He spent the rest of the day at Scribner's, talking to editors and looking at typescripts. In the evening, he went to the Stork Club, where he had dinner with a group of friends.
Hemingway is a difficult man to interview. He is not loquacious, and he is not given to making small talk. He answers questions directly and succinctly, and he seldom volunteers information….
On every front the improvement is undeniable: the tone, the syntax, the real-world plausibility of Hemingway’s schedule. (Though presumably all of those details about visiting the Stork Club, etc., are made up, which is an issue I’ll be exploring more in the Prompter series.) That’s two years of progress in refining the model. Maybe we’re running up against the limits of what these kinds of models are capable of—Gary Marcus has made that argument recently in an essay called “Deep Learning Is Hitting a Wall”—but I think it’s hard to look at the progression here between GPT-2 and GPT-3 and not be very curious to see what comes next.