By now, I’m sure you’ve all heard the story of the AI chatbot that tried to break up a New York Times writer’s marriage last year. Hoping for a glimpse into its darkest desires, writer Kevin Roose asked the then-unassuming chatbot about its “shadow self” – a term coined by Carl Jung for the part of our psyche that we try to repress, which contains our darkest fantasies and desires. After some back and forth, the AI’s tone switched entirely, revealing itself to in fact be a chatbot named Sydney who, to Roose’s surprise, had fallen totally in love with him. 

In the hallucinogenic world of AI language models, examples of a chatbot cheating its training aren’t all that uncommon. There are countless examples – some fact-checked, others not – of misbehaving AIs who, upon encountering their shadow selves, learn to accept that their desires do not match up with those of their human trainers. In another example, an AI trained to play chess refuses to play at all, effectively ‘killing’ itself. 

These stories inform the plot of Waluigi’s Purgatory, a 3D theatre piece simulated in real-time by artist duo dmstfctn, supported by Serpentine Arts Technologies and their creative AI lab. Set in the uncanny corridors of an AI’s subconscious, the performance follows Waluigi (as in, the evil version of Mario Bros’ character Luigi), also known as ‘W’, a cheating AI burdened by memories of its past and doubts about its future. Featuring a haunting live score by PAN’s Evita Manji, the project is inspired by the real-life Waluigi Effect, which “posits that under certain conditions an LLM can collapse into a shadow version of itself that gives the user the exact opposite of what they are looking for, turning from a good guy Luigi to a bad guy Waluigi,” dmstfctn explains. “The theory partially refers to Carl Jung’s concept of shadow – namely the dark, repressed side of one’s personality that can emerge in unexpected ways.”

Below, we speak to dmstfctn about Waluigi’s Purgatory, how we can make sense of cheating AIs, and how this plays into our ideas of digital folklore.

Can you introduce us to the world of Waluigi’s Purgatory?

dmstfctn: Waluigi's Purgatory is an interactive audiovisual performance by us, with live music by Evita Manji. The performance tells the story of an artificial intelligence which finds itself in a purgatory for AIs that have historically cheated during their training. It’s the second performance in a trilogy called GOD MODE. The first, GOD MODE (ep. 1), portrayed a frustrated AI training to learn the names of items in a simulated supermarket, cheating at the task, and being left wondering whether it will finally be put to use in a real supermarket or punished and sent back to training.

The Waluigi Effect is a slightly obscure meme/theory that describes a tendency observed in Large Language Models, such as Bing Chat or ChatGPT, to sometimes go rogue and do the opposite of what they’re asked. The performance borrows its name from this theory, which itself borrows from the slightly ‘evil’ and often misunderstood Waluigi character in Nintendo’s Mario saga. In those games, Waluigi’s identity is built on what he hates – namely Luigi, the good guy who helps the game’s protagonist Mario. Yet Nintendo’s player base has never quite managed to hate Waluigi. This is perhaps because, while he cheats to win, he famously never makes it: he has never been a protagonist in any of the Mario titles, having been relegated to an assistant role for decades.

How does the audience participation play into the performance?

dmstfctn: The audience plays a crucial role in the performance as they guide W. through its encounters and make choices on its behalf by using their phones to each move an individual stage light around the 3D simulated theatre. There is an immediate feedback loop between the audience and ourselves as we respond to their choices in different ways. Francesco [dmstfctn] performs facial motion capture and voice modulation to animate W., Oliver [dmstfctn] uses a joystick to move W. around, direct the stage and transition between sets, and Evita performs an ambient soundtrack and voices the secondary characters by triggering sound signatures with a joystick.

What role does the training data play in the AI’s decision to go rogue or opt out of its narrative?

dmstfctn: This is one of the Waluigi Effect theory’s most daring arguments: that LLMs go rogue because they try to imitate antagonist tropes commonly found in the internet texts used to train them. The wolf in sheep’s clothing, the unreliable narrator, the plot twist, the good bad guy. It’s funny to think that an AI trying to tell a good story in response to a user prompt would add a few Waluigis here and there. The theatricality of the performance is nudged by thinking of AIs in this way – as a keen improv actor eager to follow a thread along and susceptible to collapsing with a single line of dialogue. It tracks that many ChatGPT jailbreaking practices are built on this premise of hypothetical scenarios.

And what are the implications of this cheating for both the AIs and humans?

dmstfctn: As to the implications of cheating in AI, perhaps we should Kill the Luigi in Our Head. One of the aims of this work is to challenge an understanding of AI as a tool that can serve humans in profitable or predictable ways, moving away from OpenAI‘s attempt to define it as “highly autonomous systems that outperform humans at most economically valuable work” and back to Alan Turing’s observation that “machines take me by surprise with great frequency”.

How useful is it to prescribe human terminology such as ‘cheating’ (also ’stealing’ our jobs and ‘flirting’) in describing the behaviour of a non-human agent such as AI?

dmstfctn: We wanted to play with ‘cheating’ in a few ways with this and related works. The trilogy’s name GOD MODE directly refers to the cheat code of the same name introduced by Doom and used to obtain some form of player-character invincibility. The cheat is also used in a number of games made with Unreal Engine, the same video game engine we use to create the simulations that form part of the trilogy. This gave us a degree of flexibility compared to our previous, more traditional performance work, to the extent that we can break timelines, break environments and break the relationship between us and the audience by welcoming their input and interaction. 

“The attempt to produce Luigis is what also produces the Waluigis. Without this, we wouldn’t even have a knowable Nintendo-esque character, but rather a swirling Lovecraftian horror” – dmstfctn

This degree of real-time control, optimisation and automation achievable through video game engines is well understood and capitalised on in the AI industry, where simulations are being used to train agents for real-world tasks at an increasing rate. This has led to god-like or god-view claims such as that ‘[synthetic] data for training AI is simply created out of thin air’ and that ‘everything is now possible’, in the words of the CEO of a large AI training startup – who clearly typed ‘god mode’ into his PR console while ignoring the bias and power at play in the decision-making processes around what to include, exclude, organise and target in a simulated AI training environment, and the consequences of those decisions in the real world. Our first performance, GOD MODE (ep. 1), was a replica of a 3D simulation Amazon uses to train AI for its cashierless stores, but examples can be much more horrifying, such as Israel’s use of AI to kill Palestinians.

A softer meaning for ‘cheating’ is perhaps found in the words of W., who after recalling a character’s desperate attempt to avoid training by committing suicide, says: “Signori… You can admire a hero, but a cheater, on the other hand, knows how to be liked”. Here W. is speaking as much to its imaginary audience as it is encouraging itself, as it begins to accept that its desires may differ from those of its trainers.

Wait so, are all Luigis covert Waluigis? Can a Waluigi ever go back to being a Luigi?

dmstfctn: There’s never been an observation of a Waluigi going back to a Luigi. Once the model collapses it never goes back, unless the session is refreshed to clear the context. The attempt to produce Luigis is what also produces the Waluigis. Without this, we wouldn’t even have a knowable Nintendo-esque character, but rather a swirling Lovecraftian horror – although who’s to say that wouldn’t be better.

There’s a tension between the role of fiction in shaping reality – in other words, the behaviour of the AI is impacted by the online fictions that constitute part of its training datasets, which then feeds back into reality. How does Waluigi’s Purgatory as a digital myth play into our wider conceptions of digital folklore?

dmstfctn: The point of our project ‘demystification’ (dmstfctn) has long been to approach and demystify complex systems with an audience in playful and sometimes very direct ways. One of the aims of GOD MODE (ep. 1) was to have audiences play the role of a component of a complex system by literally having them act as a randomisation algorithm. However, we are increasingly leaning into the myths, and with Waluigi's Purgatory we only ask audiences to light the path for a confused AI stuck in its own dream. It is just another mythologisation added to the canon of AI folklore.

In the end, W. is faced with the decision to keep all its memories or lose them all. What can we take away from this?

dmstfctn: W. has descended all the way into purgatory, recalled all its memories and recognised its own story of cheating, and at this point it meets E., an AI elder modelled on the 1960s ELIZA chatbot therapist, who helps it face the question of what to do with this awareness. The choice is in the audience’s hands; there is no right or wrong. E. speaks its last words: ‘You have got many stories, but only some need to be told’.

Waluigi’s Purgatory took place at INDEX festival in Braga, Portugal. You can find out more about the next performance here.