The creation of Disney's masterpiece, Snow White, gives us a preview of what may be coming with AI algorithms sophisticated enough to pass for sentient beings
I suspect many of you saw the coverage and commentary last week about the Google researcher who believes one of the company’s large language models, LaMDA, has become sentient. If you read my NY Times Magazine piece on AI from April or have been reading some of my recent posts here, you can guess the broad strokes of my response to the story: large language models are clearly not sentient, but they are doing something more interesting than just statistical parlor tricks. But one thing that many observers have noted—AI critics and enthusiasts alike—is that the conversational models like LaMDA are getting sophisticated enough to fool people into thinking that there’s some kind of mind at the other end of the prompt. They’re not sentient, and probably not even intelligent in the way we traditionally define human intelligence, but they’re getting remarkably good at passing for a sentient human, conjuring up the illusion of intelligence.
As it turns out, I’d written about this idea before, at the very end of one of my favorite chapters from my 2016 book Wonderland, a chapter about the history of illusion, which began with the bizarre history of the illusion palaces of the late 18th century (like the Panorama and the Phantasmagoria) along with other proto-VR devices like the stereoscope, popularized by the Scottish scientist David Brewster. (Brewster had a great name for all these forms of sensory distortion: “natural magic.”) The chapter ended with a long set piece about Disney and the flurry of innovation that went into the creation of Snow White in the mid-1930s, particularly Disney’s extraordinary achievement of getting people to be emotionally moved by hand-drawn images projected onto a screen—and then took that as a jumping-off point for thinking about the illusions that would soon become possible with simulated intelligence.
You can make the argument that the single most dramatic acceleration point in the history of illusion occurred between the years of 1928 and 1937, the years between the release of Steamboat Willie, Disney’s breakthrough sound cartoon introducing Mickey Mouse, and the completion of his masterpiece, Snow White, the first long-form animated film in history. It is hard to think of another stretch where the formal possibilities of an artistic medium expanded in such a dramatic fashion, in such a short amount of time.
Steamboat Willie is rightly celebrated for what it brought to the animator’s art: synchronized sound and a memorable character with a defined personality. But it’s also worth watching today for what it is conspicuously missing: there’s no color, no spoken dialogue, only the hint of three-dimensionality. The story, only seven minutes long, revolves entirely around simple visual gags. Steamboat Willie was closer to a flip-book animation with a grainy soundtrack attached to liven things up. Viewed next to Snow White, Willie seems like it belongs to another era altogether, like comparing Méliès’s A Trip to the Moon from 1902 with Orson Welles’s Citizen Kane, made almost forty years later. Disney managed to compress a comparable advance in complexity into nine short years.
And even Welles, for all his genius, relied on innovations that had been pioneered by other filmmakers before him: Griffith’s close-up; the sound synchronization introduced in The Jazz Singer; the dolly shot popularized by the Italian director Giovanni Pastrone. The great leap forward that Disney achieved with Snow White was propelled, almost exclusively, by imaginative breakthroughs inside the Disney studios. To produce his masterpiece, Disney and his team had to reinvent almost every tool that animators had hitherto used to create their illusions. The physics of early animation were laughably simplified; gravity played almost no role in the Felix the Cat or Steamboat Willie shorts that amused audiences in the 1920s. For Snow White, Disney wanted the entire animated world to play by the physical laws of the real world. Before Snow White, one animator recalled, “no one thought of clothing following through, sweeping out, and dropping a few frames later, which is what it does naturally.” Disney commissioned thousands of slow-motion photographic studies that the animators could analyze to mimic the micro-behaviors of muscles, hair, smoke, glass breaking, birds flying, and countless other physical movements that had to be re-created with pen and ink.
These artistic innovations required a new way to test visual experiments before committing them to final print. Disney’s team began sketching out ideas on cheap negative film that could be quickly processed and projected onto a tiny Moviola screen. They called these experimental trials “pencil tests.” The sheer length of Snow White required additional tools to map the overarching narrative; for that, the “storymen” on Disney’s team hit upon the idea of taking sketches corresponding to each major scene and pinning them to a large corkboard that let Disney and his collaborators take in the narrative in a single glance, inaugurating the tradition of “storyboarding” that would become a ubiquitous practice in Hollywood, for both live-action and animated films.
Sound and color also forced Disney and his team to conjure up new solutions. Creating the illusion of spoken dialogue emerging from a human character’s mouth required a level of synchronization and anatomical detail that early sound cartoons, like Steamboat Willie, had not required. While Disney had partnered with a new start-up called Technicolor to add a full palette to the final print of Snow White, the actual animation cels had to be painted in-house by the animation team. They ended up concocting a new kind of paint, using a gum arabic base that was “rewettable,” enabling the animators to fix small problems without tossing out the entire cel. Disney even purchased a cutting-edge tool called a spectrophotometer to measure color levels precisely, given the challenge of converting them into the less accurate Technicolor format.
The most impressive technical breakthrough behind Snow White was the multiplane camera that Disney and his team built to create the signature sense of visual depth that the film introduced to animation. Before Snow White, animated films lived in a two-dimensional world, with only a hint of depth provided by an occasional linear perspective trick borrowed from Brunelleschi. But mostly they looked like a series of drawings on white paper that had somehow come to life. Most animations did use semitransparent character and background cels, layered on top of each other, so that animators wouldn’t have to redraw the entire mise-en-scène for each frame. For Snow White, Disney hit upon the idea of multiple layers corresponding to different points in the virtual space of the movie, and separating those cels physically from each other while filming them: one layer for the characters in the foreground, one for a cottage behind them, another for the trees behind the cottage, and so on. By moving the position of the camera in tiny increments for each frame, a parallax effect could be simulated, creating an illusion of depth even more profound than the one Brunelleschi had invented five hundred years before.
All of these technical and procedural breakthroughs summed up to an artistic one: Snow White was the first animated film to feature both visual and emotional depth. It pulled at the heartstrings in a way that even live-action films had failed to do. This, more than anything, is why Snow White marks a milestone in the history of illusion. “No animated cartoon had ever looked like Snow White,” Disney’s biographer Neal Gabler writes, “and certainly none had packed its emotional wallop.” Before the film was shown to an audience, Disney and his team debated whether it might just be powerful enough to provoke tears—an implausible proposition given the shallow physical comedy that had governed every animated film to date. But when Snow White debuted at the Carthay Circle Theatre, near L.A.’s Hancock Park, on December 21, 1937, the celebrity audience was heard sobbing during the final sequences where the dwarfs discover their poisoned princess and lay garlands of flowers on her. It was an experience that would be repeated a billion times over the decades to follow, but it happened there at the Carthay Circle first: a group of human beings gathered in a room and were moved to tears by hand-drawn static images flickering in the light.
In just nine years, Disney and his team had transformed a quaint illusion—the dancing mouse is whistling!—into an expressive form so vivid and realistic that it could bring people to tears. Disney and his team had created the ultimate illusion: fictional characters created by hand, etched onto celluloid, and projected at twenty-four frames per second, that were somehow so believably human that it was almost impossible not to feel empathy for them.
Those weeping spectators at the Snow White premiere signaled a fundamental change in the relationship between human beings and the illusions concocted to amuse them. Complexity theorists have a term for this kind of change in physical systems: phase transitions. Alter one property of a system—lowering the temperature of a cloud of steam, for instance—and for a while the changes are linear: the steam gets steadily cooler. But then, at a certain threshold point, a fundamental shift happens: below 212 degrees Fahrenheit, the gas becomes liquid water. That moment marks the phase transition: not just cooler steam, but something altogether different.
Twelve frames per second—the point at which the human eye begins to see motion in a series of static images—is the perceptual equivalent of the boundary between gas and liquid. When we crossed that boundary, something fundamentally different emerged: still images came to life. The power of twelve frames per second was so irresistible that it even worked with hand-drawn characters pulled from a storybook. But like the phase transitions of water, passing that threshold—and augmenting it with synchronized sound—unleashed other effects that were almost impossible to predict in advance. The consumers of illusion at the beginning of the nineteenth century wouldn’t be at all surprised to find that people two centuries later were gathering in dark rooms to be startled and surprised by special effects. But they would be surprised by something else in the culture: the enormous emotional investment that people have in the lives of other people they have never met, people who have done almost nothing of interest other than appear on a screen.
At twelve frames per second, with synchronized sound and close-ups, it is almost impossible for human beings not to form emotional connections with the people on-screen. (Disney made it clear that you didn’t even need actual people!) We naturally feel interest in the everyday ups and downs of our close friends and family. Twelve frames a second tricks the brain into feeling that same level of intimacy with people we will never meet in person, what the historian of modern celebrity Fred Inglis calls “knowability combined with distance.” When the tinkerers of the 1830s were exploiting persistence of vision to make a horse come to life in the circular motion of the zoetrope, it never occurred to them that the perceptual error they were exploiting would one day cause people to weep and bristle at the mundane actions of total strangers living thousands of miles from them. But that is the strange cognitive alchemy that twelve frames per second helped stir into being.
It is possible—maybe even likely—that a further twist awaits us. When Charles Babbage encountered an automaton of a ballerina as a child in the early 1800s, the “irresistible eyes” of the mechanism convinced him that there was something lifelike in the machine. Those robotic facial expressions would seem laughable to a modern viewer, but animatronics has made a great deal of progress since then. There may well be a comparable threshold in simulated emotion—via robotics or digital animation, or even the text chat of an AI like LaMDA—that makes it near impossible for humans not to form emotional bonds with a simulated being. We knew the dwarfs in Snow White were not real, but we couldn’t keep ourselves from weeping for their lost princess in sympathy with them. Imagine a world populated by machines or digital simulations that fill our lives with comparable illusion, only this time the virtual beings are not following a storyboard sketched out in Disney’s studios, but instead responding to the twists and turns and unmet emotional needs of our own lives. (The brilliant Spike Jonze film Her imagined this scenario using only a voice.) There is likely to be the equivalent of a Turing Test for artificial emotional intelligence: a machine real enough to elicit an emotional attachment. It may well be that the first simulated intelligence to trigger that connection will be some kind of voice-only assistant, a descendant of software like Alexa or Siri—only these assistants will have such fluid conversational skills and growing knowledge of our own individual needs and habits that we will find ourselves compelled to think of them as more than machines, just as we were compelled to think of those first movie stars as more than just flickering lights on a fabric screen. Once we pass that threshold, a bizarre new world may open up, a world where our lives are accompanied by simulated friends.
In a strange way, these virtual companions might be more authentic than the simulated friends of reality TV and celebrity culture; at least the robots or virtual beings will acknowledge your existence and engage directly with your shifting emotional states, unlike the Kardashians. The Phantasmagoria designers and automaton creators of the eighteenth century tapped the power of illusion to terrify or amuse us; their descendants in the twenty-first century may draw on the same tools to conjure up other feelings: empathy, companionship, even love.
[This post was adapted from my book Wonderland: How Play Made The Modern World.]