Holly Herndon on vocal deep fakes and launching her digital twin Holly+

Imagine what a tuba performing a human voice would sound like. Or a guitarist playing as a symphony orchestra. These are the questions asked by Holly Herndon and her partner Mat Dryhurst when creating Holly+, the artist’s AI-powered “digital twin”. The website allows anyone to transform an audio file into music sung using Herndon’s voice. Within minutes, users can own their very own track ‘sung’ by her digital copy, as if her voice were played like a synth. Want to hear an algorithmic rendition of Baywatch? Or perhaps the Neon Genesis Evangelion theme song reimagined in Herndon’s dulcet tones? Holly+ has got you covered.

“People forever in music have always kind of performed as other characters. This is just I think, taking things a bit of a step further,” explains Herndon over Zoom. “The major difference now is attaching a personality to something. And also the very real scenario that, in the not too distant future, that stuff will be barely distinguishable.”

The experimental artist is a prominent voice in the worlds of artificial intelligence and music. Her 2019 album Proto saw the debut of her AI baby Spawn, a “singing neural network” whose synthetic voice blends effortlessly into Herndon’s rich choral landscapes. Interdependence, a podcast started by Herndon and Dryhurst during the pandemic, publishes conversations with people at the forefront of music and technology today, and has hosted everyone from internet theorist Joshua Citarella to science fiction author Daniel Suarez and digital designer David Rudnick.

Holly+ is overseen by a decentralised autonomous organisation, otherwise known as a DAO. It’s a group of people chosen by Herndon and Dryhurst that would officially license out Holly+ to approved artists, giving Herndon more control over her deep fake likeness and what it’s used to create. This means that each creation can be traced back to its maker, providing a foolproof way of checking the authenticity of a deep fake. What’s more, the pair have launched a digital auction house for future approved works made with her likeness, where artists can upload their works using Holly+ to be reviewed by the Holly+ DAO and sold as NFTs on the platform Zora.

Ok wow this is finally happening 🤯

Meet Holly+ my digital twin 👭

I built an instrument with @HeardSounds where anyone can make music with my voice 🗣

and I am starting a DAO to govern my digital likeness🪞

I can’t wait to hear what y’all make! 🌹https://t.co/Tb9Xw3KzWv pic.twitter.com/hzs5b5VD8f
— Holly Herndon 🪨 (@hollyherndon) July 14, 2021

Still, the onset of new and invasive technologies is bound to fuel fearmongering, especially when most online discourse is reduced to Black Mirror-esque scenarios that only boost pre-existing concerns surrounding deep fakes. Last month, Anthony Bourdain became a trending topic on Twitter after it was revealed that a new documentary used deep fake technology to mimic the late chef’s voice, despite approval from his estate. But “people have been doing stuff like this for years,” asserts Herndon.

She believes that the ability to slip in and out of digital skins presents artists with a unique opportunity to perform across various identities beyond their human limitations. With vocal deep fakes an inevitable part of the musical landscape looking forward, Herndon wants to empower artists and give them control over their likeness.

Below, we speak to Herndon and Dryhurst about their plans for Holly+, the future of vocal deep fakes, and their podcast Interdependence.

Hello world
— Holly+ (@hollyplus_) August 25, 2021

How did the idea for Holly+ come about?

Holly Herndon: We’ve been working with machine learning for five or six years now. Previously, machine learning was largely focused on automated composing. And around 2015-2016, the technology became available to start working with audio files, like sound as material. That was really exciting for us. We started training models on our voices and our ensembles (we were working on our album Proto at the time). We created this thing called Spawn – our AI baby who also became an ensemble member, who then performed on the album alongside our human ensemble members. That was our first foray into machine learning. And then we became really obsessed with the new capabilities that were made available through using the technology that felt actually like something new and exciting.

This is a natural progression from that origin story. We thought it would be really interesting to play with this idea of a shared communal voice, bringing up a lot of questions around vocal sovereignty. What does it mean to own a voice? A voice in some ways is kind of inherently communal. We learn how to use our voice through interactions with people around us, through language; even vocal styles are kind of learned through mimesis. Then you perform through that communal voice as an individual with some kind of agency and artistry.

Mat Dryhurst: There’s a kind of a weird struggle at the moment. I’m sure you saw the Anthony Bourdain kerfuffle, where the main narrative around these kinds of tools is one of being afraid of people stealing voices and misrepresentation – which is all super valid.

We’ve been making work with our voices and other people’s voices using these techniques for a few years now, so we maybe have a slightly more shiny attitude about the potential of this and where it could go.

Can you explain how the DAO structure works?

Mat Dryhurst: One of the cool things about the DAO structure is that it allows you the ability to get all the good stuff and prevent some of the bad stuff. People are allowed to use the voice for whatever they like; the DAO is there to approve official usage of the voice, which clears up some confusion. So, if you hear Holly+ out there, and it’s saying something strange, you’ll be able to check and see whether that’s something that actually got approved through the Holly+ DAO, or whether it’s just like a bootleg.

We’ve also launched a business model to allow Holly and other people to sell works officially using her likeness. And, you know, hopefully, if profits come in, half of that money will be given to the artists who use the voice, and 40 per cent of the money will be put into the DAO treasury so that we can use that budget and everyone in the DAO can vote on where to put it to make the voice more valuable – to make more tools, or hire people to make it better. And 10 per cent will go to Holly.

What was the main drive to use Holly’s likeness?

Holly Herndon: We’ve been asking these questions for a while and we found that the cleanest way to deal with it would be to play with my own voice, to make a voice model of my own personal voice, and then experiment with what it means to share that voice with other people. That's when we decided to set up this DAO structure, so it could be stewarded by a community of people who are invested in the project.

Mat Dryhurst: It’s one thing, thinking about this stuff academically, theoretically, using other people’s likeness as an example. Why not do it ourselves with the observation being that, ultimately, this stuff should be decided personally. It’s cool, I think, for us to run this experiment ourselves and show that we have skin in the game. This isn’t just late night speculation. We’re actually trying to do this because I think it’s gonna be a big thing.

How do you see vocal deep fakes manifesting in the everyday?

Holly Herndon: I think models are going to be part of most electronic music studios, if not every studio. At the moment, it’s pretty new, but I think with the improvement of the GPU (graphics processing unit), you’ll probably see a GPU in every studio. I don’t say that as a replacement for players, or as a replacement for composers, more as a kind of compositional tool.

“Being able to jump in someone else’s physical form is a really interesting idea. That needn't even be a human form” – Holly Herndon

Mat Dryhurst: I don’t really like the term deep fakes, but everyone knows what it means, more in the realm of like visual representation or visual likenesses. For example, it’s actually quite common for large brands. Let's say you have an exclusive contract with Cristiano Ronaldo, right? One way to approach that is to say, ‘Okay, we need you to be in a room with us five times a year so we can shoot long, gruelling advertisements’. What’s increasingly happening is that people are being scanned and the brand will have the exclusive ability for a period of time to use his likeness, and be able to put his face on somebody else. That’s really efficient, it makes a lot of sense. As long as both parties are involved.

We see the same thing happening in music too, it's just kind of, it's lagging behind in the public consciousness. But the tech is really getting there. In the short term, the first Holly+ tool is kind of like an instrument, it's an effect.

We can see in the short term that more people are experimenting with performing as other people. And of course, whether that be through speech or singing, you can easily think of really horrible examples of what that might look like. We're trying to think of positive examples that allow for the possibility, for example, of somebody who is making work somewhere to release a Holly+ record, and for everybody in that agreement to feel really good about it.

Holly Herndon: Another area that is pretty ripe for exploration is this ability to perform different identities as a musician. Being able to jump in someone else’s physical form is a really interesting idea. That needn't even be a human form. Like, if I can voice jump into the form of a tuba and then sing as a tuba, that's really interesting and weird. I think there will also be a lot of just really weird experimentation where you can play with your identity in new ways.

New @1nterdependence patron episode with @la_meme_young 😎 where we discuss $5 art schools, memes and music pedagogy, and the tensions caused by the recent influx of crypto money in Puerto Rico

Thanks for joining us Max!!! 🤪https://t.co/pKNcQKOYlz pic.twitter.com/Xh7ZjnJbpG
— Holly Herndon 🪨 (@hollyherndon) August 26, 2021

That reminds me of what Legacy Russell speaks about in Glitch Feminism. I was looking at it in terms of hyperpop artists, but she talks about the ways in which we can use online spaces and technology to evade the limits of the body and the way society perceives our IRL selves.

Holly Herndon: I think that's 100 per cent related. It's very spot on because that’s the origins of this project. I mean, the reason why we got so obsessed with making voice models in the first place is because my practice is essentially built on vocal processing in the same way that these teens in hyperpop feel liberated by being able to pitch their voice around. I felt it with the physical limitations of my own born voice. And so I was always looking for this meta instrument that I could build myself so that my voice could do things that it couldn’t physically handle. And that to me, felt like I found my actual voice.

Do you think it could get to the point where the inherent talent of our IRL selves won’t even matter anymore?

Holly Herndon: This is always the fear with new tech that it removes the prior, and I don’t think it has to do that – I think it can add to. If I’m performing as a tuba with my voice, that could be really cool. But imagine what a tuba performing a voice would sound like. Like, maybe that actually sounds better than the opposite. I see it as more of an add-on than a replacement.

Mat Dryhurst: It’s an augmentation.

Holly Herndon: Anything that can get us away from this static interaction with a laptop that a lot of electronic musicians have. Like, if this can help people compose maybe more with their voice. Or, maybe they’re a guitarist, and they want to use that as the kind of input to play a whole symphony orchestra rather than programming MIDI notes with your mouse. You can have a more embodied interface with production that I think can be really interesting and can potentially be more communal.

I also wanted to speak about your podcast Interdependence. That’s been going on for just over a year now and it’s really taken off. What was the initial pull to get into podcasts?

Mat Dryhurst: Nobody’s covering these crazy things that aren’t futuristic that are happening right now. But you wouldn’t know it if you read a newspaper. There’s a big gap to be plugged there. If something crazy is happening that feels really significant and important, we want to be having a conversation with those people and try to pull it apart.

It’s something we wanted to do for a really long time. When we were working on Proto in 2016 to 2017, I remember that we would have these like gatherings at our studio. And everyone at the time was like, ‘you got to do podcasts’. It would be kind of fun, because we’re very self-aware that our interests are quite niche. And, oftentimes, part of our job is that we make artwork and hopefully the art stands for itself. But, oftentimes, in order for that artwork to be fully understood, we have to do a lot of heavy lifting to just bring people up to speed on some stuff. You can’t really divorce art from its context ever and it’s kind of a decision whether someone else makes that context or whether you were part of creating that context.

But there’s a lot of people out there who don’t have access to that information. And if they do, you know, it’s one thing to like, find it online. It's another thing to hear. Maybe people translate things a little bit or ask questions that, you know, relate to people's lives, as opposed to esoteric technical stuff.

With Holly+, how do you see the project evolving?

Mat Dryhurst: We’ve just released a speaking model. So you’ll be able to type anything and have Holly+ talk it back. We’re also releasing a singing model later which is really fun. And also funny, because it's a raw singing model. So it’s Holly's raw singing voice that you can then add processing onto afterwards.

Give Holly+ a try here and read more details on how it works here