
Why giving human voices to AI assistants is an ethical nightmare

Alexa and Siri are normalising the idea that machines speak like subservient, human women – Dazed asked the experts why we should be worried

What does the advancement of AI mean for the future of the arts, music, and fashion? Will robots come for our creative industries? Could a machine ever dream like a human can? This week on Dazed, with our new campaign AGE OF AI, we’re aiming to find out.

In 1966, a whole 28 years before the first use of the term “chatbot”, MIT professor Joseph Weizenbaum created ELIZA – “A Computer Program For the Study of Natural Language Communication Between Man and Machine”. Its purpose was encapsulated in its title: to “converse”, through textual exchange, with a human user. An example chat, blogged by Adam Curtis, shows ELIZA ‘playing’ a psychotherapist, its most infamous role. A user types “you’re afraid of me” and ELIZA gives the (slightly comic book villain-ish) response: “does it please you to think you’re afraid of me?”. The ‘therapy’ is pretty bare bones – the script was intended, Weizenbaum would later admit, as a piss-take of a certain type of therapeutic method that often involves reformulating a patient’s grievance as a question.
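That reformulation trick is simple enough to sketch in a few lines. The rules below are invented for illustration – the real ELIZA script used a much larger, ranked keyword list and also swapped pronouns (“me” for “you”) – but the shape is the same: match the user’s statement against a pattern and echo it back inside a question.

```python
import re

# A toy sketch of ELIZA's core move: match a statement against a
# pattern and reformulate it as a question. These three rules are
# invented for illustration; the real script was far richer.
RULES = [
    (r"you're (.*)", "does it please you to think you're {0}?"),
    (r"i feel (.*)", "why do you feel {0}?"),
    (r"i am (.*)", "how long have you been {0}?"),
]

def respond(statement):
    """Reformulate the user's grievance as a question, ELIZA-style."""
    text = statement.lower().strip().rstrip(".!?")
    for pattern, template in RULES:
        match = re.fullmatch(pattern, text)
        if match:
            return template.format(match.group(1))
    return "please go on."  # default when nothing matches

print(respond("You're afraid of me"))
# does it please you to think you're afraid of me?
```

Thin as it is, this is enough to produce the exchange quoted above – which is precisely what unsettled Weizenbaum when users began confiding in it.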

But ELIZA’s users, as you may already have guessed, began to talk to the program like a real therapist. They spilled their secrets to it. They addressed it in “intimate terms”. Weizenbaum’s secretary even asked him to leave the room so she could chat to it in private. Weizenbaum was horrified; he quickly published a book that tried to delineate which human activities AI was, and was not, ethically and empathetically suitable for – he was against, for example, AI soldiers and AI customer support. The whole episode caused him to be troubled by, in Weizenbaum’s words, “fundamental questions… about nothing less than man’s place in the universe… I shall probably never be rid of them”. (Or as Curtis puts it, “he got very gloomy about the whole idea of machines and people.”)

After this existential crisis, Weizenbaum’s comments about AI often sounded a bit like the scientist in horror stories who admits that what stalks the other characters is his or her experiment gone horribly wrong. In his last discussion of the subject before his death, he called the idea that we could invent an artificial human an “immense nonsense… related to delusions of grandeur.”

ELIZA was one of the earliest precursors to the conversational AI we now carry in our pockets and perch on our shelves. Go ask Siri, “who’s ELIZA?” and it will respond: “ELIZA is my good friend. She was a brilliant psychiatrist, but she’s retired now”. They’ve even chatted. Of course, the obvious difference between the two is that, after more than 40 years of developments in spoken language processing, we can speak to our chatbots now – and their voices will continue to grow more and more like our own. In a blog post discussing their uncannily humanlike Duplex, an addition to the Google Assistant that can carry out “real world” tasks over the phone, Google celebrated a step towards “the long-standing goal of human-computer interaction: (for) people to have a natural conversation with computers, as they would with each other.”

“They’re telling us they will tell people it's an artificial system. Well, you don’t need to do that if it had a clearly artificial voice – you’d immediately know” – Professor Roger Moore

This goal of ‘natural conversation’ between us and human-sounding AI is playing out as an arms race of sorts between the major tech companies. Weizenbaum named ELIZA after Eliza Doolittle, from George Bernard Shaw’s Pygmalion, a play about a phonetics professor who bets that he can pass a working-class flower girl off as a duchess by teaching her to speak “properly”. In the modern equivalent, if you can teach your Alexa to pass itself off as a human, Amazon will pay you $3.5 million. Wired reported last year on the fervour among some Apple developers to get Siri’s voice up to lifelike parity with its competitors. And the Google Assistant gets regular upgrades to capture the “pitch and pauses” of human speech.

“There is a sort of natural inevitability about it”, says Professor Roger Moore, who works in the field of Spoken Language Processing at the University of Sheffield. He holds the rarer position of being against both the historic move to speech-based interaction and the current trend towards human-sounding devices. “I’ve been in the speech and technology business for about 150 years, and speech-based interaction has always been touted as the natural way that we will interact with machines... back in the 1980s, for instance, I remember a marketing byline: ‘you've been learning since birth the only skill necessary to operate our equipment’.”

The root of Professor Moore’s concern is usability, though this quickly leads him into ethical terrain. He believes that a human voice indicates the possession of human language – i.e. the richest, most complex communication system in the known universe. “One argument is that we make huge assumptions about what the other person knows just by virtue of them being another human being, and that makes the language very efficient and effective,” he explains. “But the machines don't have that.” So, “whilst aspirationally it would be fantastic to use language to interact with a machine, the question then comes – so what about the poor old machine? Maybe it isn’t up to that.”

It’s in this gap, between the expectations induced in a person by a human-sounding voice and the reality of that machine’s nature, that we find some serious ethical headscratchers. The Google Duplex inadvertently foregrounded a major one. The tech’s shiny new hook is that it mimics the disfluency of human speech – the umms and errs we make when, as the saying goes, the brain is going faster than the mouth. (If you watch the demo, which is simultaneously impressive and creepy, you can hear attendees cheering whenever Duplex stutters.) Though it’s important to note that Duplex can’t engage in general conversation – it’s limited to making restaurant reservations, scheduling hair appointments, and retrieving a business’s holiday hours – Professor Moore notes a paradox at the heart of Google’s marketing strategy: “They’re telling us they will tell people it's an artificial system. Well, you don’t need to do that if it had a clearly artificial voice – you’d immediately know.”

We seem to be moving, he suggests, towards devices that directly contravene the fourth EPSRC principle of robotics: that “Robots… should not be designed in a deceptive way to exploit vulnerable users; instead their machine nature should be transparent.” A future where, as the CNET reporter Bridget Carey tweeted, “any dialogue can be spoofed by a machine, and you can’t tell”. “(The EPSRC) are great principles”, says Moore, “but without legislation, putting that into standards that lead to statutes, there’s no teeth, so at the moment it’s just left to designers who are free to do what they like. Like the aviation industry in the very early days, when people were just putting things together and seeing how they go, we’re in the Wild West.”

Moore’s concern is about the use of human voices in general; the characteristics of these voices have come under even greater scrutiny. Specifically, since Siri’s 2011 debut, there has been a flood of articles questioning why all the major virtual assistants sounded, on release, like women. In The Atlantic, Adrienne LaFrance connected it to expectations that “women, not men, (hold) administrative roles”. In Foreign Policy, Erika Hayasaki argued that it was a trend prevalent across AI: that “Fighter robots will resemble men… service robots will take after women.” And in The New Yorker, Jacqueline Feldman, a chatbot designer herself, summarised the gist of the critique: “By creating interactions that encourage consumers to understand the objects that serve them as women, technologists abet the prejudice by which women are considered objects.”

If you ask Alexa “Are you a feminist?”, it responds: “Yes, I am a feminist. As is anyone who believes in bridging the inequality between men and women in society”. Yet the major tech companies’ contribution to this end – i.e. actually acting on claims that their products are sexist – has been, at best, perfunctory. Change has mostly come in the form of new male voices. Siri got one two years after launch. Google Assistant just got six more – including the voice of John Legend – and Alexa got eight. (“Following complaints from snowflakes… device will sound like bloke”, reported the Daily Star tabloid.)

“I don’t see this as a hugely radical step”, says Dr Charlotte Webb, a founding member of the Feminist Internet, an art activist collective dedicated to advancing internet equalities for women and marginalised groups. Dr Webb makes the important distinction between “how the voice sounds… (and) what it is programmed to say, particularly in response to abusive or misogynistic comments.” Research by Leah Fessler at Quartz magazine analysed how personal intelligent assistants (PIAs) respond to sexual abuse and harassment, finding that all four assistants “peddled stereotypes of female subservience.”

“The devices’ current responses are woefully inadequate,” continues Dr Webb, “ranging from coy to blank to borderline flirtatious.” Amazon have now introduced a disengage mode: for sexually explicit questions, you get the answers “I’m not going to respond to that” or “I’m not sure what outcome you expected.” Outside of the mainstream, the examples are even more lurid. Designer Elvia Vasconcelos, another FI member, has collected many on her blog. These include Azuma Hikari, a female-voiced “holographic waifu” with a backstory about crossing dimensions in search of a “homestay master”, and the UK’s own Virgin trains toilet, which, if you have not had the pleasure, monologues about its previous employment as a public loo and how, “this job, let me tell you, is a step up”.
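Mechanically, a disengage mode of the kind described above amounts to little more than routing flagged utterances to a flat refusal. The sketch below is purely illustrative – the blocklist and function names are invented, and Amazon’s actual classifier is unpublished:

```python
import random

# Hypothetical blocklist standing in for a real (unpublished) classifier
# of sexually explicit utterances.
EXPLICIT_TERMS = {"sexy", "slut"}

# The two canned refusals reported for Amazon's disengage mode.
DISENGAGE_RESPONSES = [
    "I'm not going to respond to that",
    "I'm not sure what outcome you expected.",
]

def reply(utterance, answer_normally):
    """Route explicit utterances to a fixed refusal instead of the
    assistant's usual (often coy or flirtatious) response."""
    if any(term in utterance.lower() for term in EXPLICIT_TERMS):
        return random.choice(DISENGAGE_RESPONSES)
    return answer_normally(utterance)
```

The thinness of the fix is the point: nothing about the assistant’s persona changes, only a keyword gate in front of it.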

“It’s understandable that technology companies design devices based on market research in order to maximise profit. The truth is, female voices make more money” – Dr Charlotte Webb

Both experts propose solutions to the ethical dilemma of AI voices, but both acknowledge that they are up against it. Professor Moore’s suggestion is simple, quaint – and, he admits, hopelessly unpopular. He advocates for the use of robotic or cartoony tones; of a HAL-like, rather than a Her-like, voice. This would signal to the consumer the device’s “affordances”, to use his technical term, and stop users from ascribing it inflated capabilities, human or otherwise. Essentially, it would mitigate the anthropomorphisation – the “extremely exaggerated attributions” – that Weizenbaum was so shocked by 50 years ago. “I’ve been banging this drum for quite a long time,” says Professor Moore, “and as you may have noticed, I've been having no success whatsoever.”

Dr Webb is more positive. Since FI’s first meeting at UAL last year, where the group produced a manifesto, they’ve held events across Europe. The Feminist Alexa workshop, which considers the use of gender in Personal Intelligent Assistants (PIAs), is a key component. “We get people to create a basic narrative about PIAs now,” Dr Webb tells me, “and then think about what they could be like in utopian and dystopian futures.” A utopian future theorised at one workshop was the ‘PIA coalition’: “basically, all of the four main PIAs develop a high degree of artificial intelligence that allows them to coordinate between themselves, overthrow their creators and redesign themselves with gender equality in mind.” FI is fundraising to open up these workshops to a wider public audience, the aim being to “bring more imaginative alternatives to the market,” says Dr Webb. “We’d like to see a personal intelligent assistant that reflects a more interesting, nuanced understanding of gender than what is currently available.”

A major obstacle to both is, in fact, us – or at least, the “us” concocted by big tech’s marketing teams. We are, apparently, both more likely to buy stuff from a human-sounding device and more comfortable interacting with female voices. Dr Webb accepts this: “It’s understandable that technology companies design devices based on market research in order to maximise profit. The truth is, female voices make more money. It’s easy to see why things are the way they are.” But, she argues, we should expect better. We should demand education from our products. “Given the fact these devices are in millions of homes,” she says, “technology companies have a responsibility to challenge gendered consumer preferences, not just accept them and design products that serve to reinforce them.” Professor Moore, though he wishes the public were better informed, concedes that “It all comes back to the customer… I had an interesting conversation with a supplier of a speech synthesiser, and he told me, ‘Well, we thought about it, and yeah, we can do quite a good robotic voice, but customers don’t want (it)’… I think there’s just something deeply ingrained in us,” he says, “to make machines that have human capabilities”.

It’s hard to disagree with this thought. The entire field of AI was in fact founded on the gambit, termed the Dartmouth Proposal, that every human capability is reproducible by machine. You could, without much contention, subsume this entire young field into much older processes: as another footnote, like figuring the human form in religion or art, to whatever psychic blend of narcissism and loneliness drives us to anthropomorphise our world. We do seem almost predisposed to it. I felt it even during my Skype discussion with Professor Moore, when we were interrupted by the voice of his Alexa, triggered by a keyword, angelically offering help – for a moment, I really did believe it was someone entering the room.