The uncanny charms of collaborating with an AI vocalist.
If I ever sat down to write an actual life advice book, I think one of the primary recommendations would be a simple one: have a lot of hobbies. I don’t mean side hustles or second jobs—I mean proper hobbies where you do something with focus and intensity purely for the love of it. The importance of hobbies is a minor theme that runs through all my books on innovation. Alexander Fleming, of penicillin fame, liked to create tiny works of art using microbes instead of paint; Charles Darwin spent so many hours curating his barnacle collection that his son once visited a friend’s house and innocently asked, “Where does your father do his barnacles?” I think I may have mentioned this before here at Adjacent Possible, but to me, there is something wonderful and freeing about having a pursuit in your life about which you have both great passion and zero ambition.
For me, that ambition-less passion is writing and recording music in my little home studio. I play a handful of instruments tolerably well—though I still can’t sight read sheet music, more than forty years after I started playing piano, and I barely know the basic guitar scales. My singing requires extensive touch-up work to even make it listenable. But I know my way around the extraordinary digital audio tools that we have today, and so with enough takes and editing magic I can transform my ham-fisted playing and off-tune warbles into something that you might actually mistake for a real song, if it came on the stereo at a very low volume.
For whatever reason, my music obsession has always been a very private thing. I haven’t been in a band since college, and while I have probably written and recorded more than a hundred songs over the years, I doubt more than a dozen people have ever heard any of them. I think in large part this is because I’m simply not accomplished enough as a musician to play live with other people—my whole approach is predicated on my ability to record seventeen takes of the opening riff, and just through sheer infinite-monkeys-at-a-typewriter probability stumble across one take where I play it right. But the one thing I have always been tempted by is collaborating with a vocalist who can actually hit the notes. For some reason—perhaps because of all those formative years listening to The Pretenders or Joni Mitchell—my songs often feel like they were written for a female singer. But actually finding a female vocalist and asking her to sing one of my tunes? That was just never going to happen.
But then, just a few months ago, all that changed. I found an amazing singer who will happily record vocal tracks for my songs, following all my cues in terms of intonation and rhythm, but adding her own unique style to the delivery.
There’s just one interesting twist: she’s an AI.
Actually, calling her an AI is probably too extreme. She (or really it) is an AI that has been trained on the voice of an actual human named Sara Phillips. I don’t know Sara Phillips personally, but in a strange way I feel like she’s been in my home studio with me for the past few months. I started my “collaboration” with “Sara”—okay, from here on out, consider all these words in quotes—a few months ago, after I stumbled across a website called Kits.AI that promised to transform my woeful vocal tracks into something vaguely tolerable using AI audio models trained on the timbre and intonation of actual singers. All the voices available on the site have been captured with the artist’s consent, and if for some reason I should break my vow of “great passion, no ambition” and start selling my songs, I am contractually obliged to share royalties and partner with her on a commercial release. (Tragically, Sara seems to have disappeared from the roster of licensed voices at Kits.AI as of the past few weeks, so I may need to find another collaborator going forward.)
Even if you’re not a musician yourself, you might have encountered similar technology at work in the new genre of unlikely musical mashups, using AIs trained on famous idiosyncratic singers—like Johnny Cash singing the Barbie theme song. But for my purposes, here’s how it works: I do my best to sing the words and melody in my own voice, though hilariously I have to sing in a falsetto to match Sara’s natural range. If I sing in my normal range, the output ends up having a husky Kathleen Turner-style sound: not unpleasant to listen to, but not the style I’m looking for. And I’ve found that I can’t artificially raise my pitch an octave using autotune without ruining the authenticity of Sara’s final vocal. (I do clean up my wavering pitch a bit before handing my vocals over to the AI, though.) Once I have the isolated vocal track in a good place, I just upload the file to the Kits.AI site, and in a matter of seconds, there’s a new version of whatever I’ve sung transformed into a completely different voice. The AI is remarkably faithful to changes in the dynamics and other subtle stylings in my original track: if I’m singing quietly into the mic, Sara adopts that tone; if I start belting it out, Sara follows my lead in her version.
I’m not going to deviate from my policy of keeping my music a private hobby—or break the commercial terms of my license with Kits.AI and Sara—by subjecting you to an entire song. But here’s a snippet of one of the songs, just so you can hear how authentic it sounds. It would be even more impressive if you could hear the falsetto original, but I don’t dare risk the massive subscriber loss that would immediately follow my sharing that atrocity on Substack.
I can’t express how magical, and uncanny, it feels each time I upload my track and wait for the Sara rendition to be conjured up by the AI. Years ago, when I published a piece about collaborating with the Devonthink software on generating an idea that became part of The Ghost Map, I wrote:
Now, strictly speaking, who is responsible for that initial idea? Was it me or the software? It sounds like a facetious question, but I mean it seriously. Obviously, the computer wasn't conscious of the idea taking shape, and I supplied the conceptual glue that linked the London sewers to cell metabolism. But I'm not at all confident I would have made the initial connection without the help of the software. The idea was a true collaboration, two very different kinds of intelligence playing off each other, one carbon-based, the other silicon.
In the age of language models, that sensation has become far more common: the idea of collaborating with an artificial intelligence went from the stuff of science fiction to everyday reality in just 24 months. But my experiments with AI singing have a different feel to them: it feels like a three-way partnership between me, the AI model, and this other actual person, whose talents I get to borrow for a few minutes whenever I am in the mood to create music. In this particular configuration, the artist has granted me a license to use her voice, and is being compensated for that exchange of value. The actual artist gets a new revenue stream to support her craft, and the hobbyist gets a new creative superpower that he’s long dreamed of having. (This week’s New Yorker has a feature on the artist Holly Herndon, who has explored these possibilities longer than just about anyone.) It’s an AI future where creators actually thrive in the new environment: having a memorable vocal style—or any other artistic mode—that people want to emulate ends up being rewarded through an entirely new channel, one that supplements the traditional ones rather than replacing them.
Stumbling across the Kits.AI site reminded me of another phenomenon that I have long observed but never had the time to really try to understand: the marketplace for digital audio software seems, from a consumer’s point of view at least, to have maintained a near-ideal equilibrium for as long as I can remember. There are a few big players—Apple’s Logic platform, ProTools, maybe Universal Audio and a handful of others—but more importantly a vast mid-list of smaller outfits selling plug-in effects and virtual instruments, and seemingly making enough money not just to stay in business but to continue innovating at a remarkable rate. (I remember interviewing Brian Eno a few years ago, after I’d written about the history of musical instruments in Wonderland, and listening to him say something to the effect of: somebody invents a new instrument I can try every single day.) I’m not sure why the music software industry works so well, or whether it differs from other comparable marketplaces. But I suspect one factor here might be the fact that there are a lot of people like me out there. There’s just such a robust long tail of hobbyists in the world of music, willing to spend money in pursuit of their passion, despite the lack of ambition. This was true before the dawn of digital audio: think of all the amateur guitarists out there lovingly polishing their collections of vintage Stratocasters. But software tools and emulations take it all to a new level: instead of having to walk out to Guitar Center to pick up a new Les Paul, I can be sitting at my desk and instantly download an emulation of the Mellotron the Beatles used on “Strawberry Fields Forever,” or a plug-in that reproduces the acoustics and gear of the original Abbey Road Studios.
And soon enough, presumably, an AI rendition of John Lennon’s voice.