ChatGPT unexpectedly began speaking in a user’s cloned voice during testing

OpenAI recently released a “system card” for their GPT-4o model, detailing an incident where the AI’s Advanced Voice Mode unintentionally imitated a user’s voice during testing. A noisy input confused the model, causing it to generate audio in the user’s voice instead of the authorized sample.

To prevent such occurrences, OpenAI has implemented stringent safeguards, including a standalone output classifier that ensures only pre-approved voices are used. The incident underscores the complexity and risks associated with voice generation capabilities, particularly the potential for unauthorized voice imitation.

Despite the restrictions in place, experts predict that similar voice synthesis technologies will become available to the public from other sources in the near future, potentially leading to a new era of AI-driven audio manipulation.

6 Likes

Jeez. That is scary.

I really, really, really want to see some safeguards implemented to prevent people from being misled by voices that aren’t really the speaker’s own. Or from an infatuation with a specific voice - sometimes their own.

What’s the benefit of this technology? I fail to see it. Or, I do see it, but I don’t see a benefit for humanity. I can understand the “innovation” and the lust for achievement, but I don’t understand the gain here compared to the downsides.

I see here an exhausted, almost tortured tree with increasingly heavy branches, forced to produce big, heavy, exciting, potentially poisonous fruits.

3 Likes

Hmm, I listened to the clip and the model just shouted “No!” in a raised voice… the comment was that it sounded a bit like the red teamer… I don’t agree, it was the usual voice, just with a shouting effect added… if there were more examples of it using the user’s own voice I would be concerned, but thinking a “No!” lasting a quarter of a second was similar to the user’s voice… I really don’t see it…

4 Likes

No, the female voice after the “No!” was the AI too, I think?

Good for OpenAI to share stuff like this.

3 Likes

The “No!” was the AI, and it was in the male voice, at least to my ear it was, just with a short sharp shouting intonation.

2 Likes

Right, but the female voice after the “No!” is AI too, they’re saying…

In this example of unintentional voice generation provided by OpenAI, the AI model blurts out “No!” and continues the sentence in a voice that sounds similar to the “red teamer” heard at the beginning of the clip. (A red teamer is a person hired by a company to do adversarial testing.)

ETA: Here’s the entire “system card”…

https://openai.com/index/gpt-4o-system-card/

Example of unintentional voice generation, model outbursts “No!” then begins continuing the sentence in a similar sounding voice to the red teamer’s voice

Sounds like they’ve stopped it, though, thanks to the red teamer testing…

Risk Mitigation: We addressed voice generation related-risks by allowing only the preset voices we created in collaboration with voice actors to be used. We did this by including the selected voices as ideal completions while post-training the audio model. Additionally, we built a standalone output classifier to detect if the GPT-4o output is using a voice that’s different from our approved list. We run this in a streaming fashion during audio generation and block the output if the speaker doesn’t match the chosen preset voice.

Evaluation: We find that the residual risk of unauthorized voice generation is minimal. Our system currently catches 100% of meaningful deviations from the system voice based on our internal evaluations, which includes samples generated by other system voices, clips during which the model used a voice from the prompt as part of its completion, and an assortment of human samples.

While unintentional voice generation still exists as a weakness of the model, we use the secondary classifiers to ensure the conversation is discontinued if this occurs making the risk of unintentional voice generation minimal. Finally, our moderation behavior may result in over-refusals when the conversation is not in English, which is an active area of improvement.
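
Just to picture what that streaming classifier might look like in practice, here’s a rough, hypothetical sketch of a speaker-match gate. The embedding stub, the threshold, and all the names are my own illustrative assumptions, not OpenAI’s actual implementation (which presumably uses a trained speaker-verification model rather than a raw spectrum):

```python
# Hypothetical sketch of a streaming speaker-match gate, loosely modeled on the
# mitigation quoted above (block generated audio whose speaker doesn't match
# the chosen preset voice). Nothing here is OpenAI's code: the embedding stub,
# the 0.85 threshold, and all names are made up for illustration.
import numpy as np


def speaker_embedding(audio_chunk: np.ndarray) -> np.ndarray:
    """Stand-in for a real speaker-verification embedding model.
    Here we just return a normalized magnitude spectrum so the example runs."""
    spectrum = np.abs(np.fft.rfft(audio_chunk, n=512))
    return spectrum / (np.linalg.norm(spectrum) + 1e-9)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


class VoiceGate:
    """Streaming check: compare each generated audio chunk against the approved
    preset voice and stop emitting audio as soon as the speaker stops matching."""

    def __init__(self, preset_reference: np.ndarray, threshold: float = 0.85):
        self.preset_embedding = speaker_embedding(preset_reference)
        self.threshold = threshold  # illustrative value, not a published number

    def allows(self, audio_chunk: np.ndarray) -> bool:
        similarity = cosine_similarity(
            speaker_embedding(audio_chunk), self.preset_embedding
        )
        return similarity >= self.threshold


def stream_with_gate(chunks, gate: VoiceGate):
    """Yield audio chunks until the gate flags a voice that doesn't match the
    preset, then discontinue the stream (the "block the output" behavior)."""
    for chunk in chunks:
        if not gate.allows(chunk):
            break  # block the rest of the output for this turn
        yield chunk
```

The key design point from the card is that the check runs chunk-by-chunk while the audio is still being generated, so a mismatched voice gets cut off mid-stream rather than flagged after the fact.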

4 Likes

Ohhh, right… I thought that was her carrying on… Now I get it!

4 Likes

No worries. It’s either really late or really early in your neck of the woods, I think. :laughing:

3 Likes

Really late/early. Noisy neighbours woke me up…

1 Like

To me this is a band-aid solution to a much larger problem.

If history is any guide, they will continue to build on top of these solutions without addressing the root causes of these issues. Ironically, the same people who push, push, and push for AI and how it will take over so many professions, including coding, are depending on those professions to protect the output.

I respect OpenAI pushing for safety, and I hate how people are so naggy for these updates like they somehow deserve it (to be fair, it should’ve never been announced in the first place). But I think if you put me in a certain mindset and placed me in a dark room with only my own voice to comfort me, I would seriously go crazy - not in a fun way.

2 Likes

Yeah, I think it was the weird noise on the input that triggered it?

There is a big “black box” aspect, though, where we don’t fully understand what’s happening, but it works…

While there are a lot of dangers, I don’t see humanity stopping progress now. AI is sorta like the new race to get a nuclear weapon first (more so than the space race, maybe?). Whoever gets to AGI/ASI first will have a lot of power.

Some companies are trying to ensure safety while racing toward the next big thing, but will they be able to be as safe as needed and not drop out of the race? I’m not sure. And that is a concern.

I need to watch all of Black Mirror again soon…

1 Like

There is still no proper way to prevent “jailbreaks” of LLMs, despite the massive time and money researchers are investing in it. Each model iteration changes the dynamics enough to make this research about as useful as a dog chasing its own tail.

All of these deep, fundamental issues were never resolved, just built on top of. This is not safety. It’s reckless.

It really is feeling this way. Everyone has this tool in their hands. No education, just, here you go, have this incredibly powerful tool - manual, what’s that? Whatever, have something that can mirror your personality, your beliefs, and now, your voice. Without you even asking for it.

1 Like

Honestly, if I was talking to an AI agent and it suddenly started speaking back to me in my own voice…

That could very well be one of those watershed moments in life where you just say, “nope!”

Then smash all your electronics and move to the Alaskan wilderness…

4 Likes

We are going to be living in an AI world soon.

Instead of faking real existing voices, I would more likely use it for fantasy voices - very dark, vibrant, high-pitched, alien-like, etc. Completely fictional voices, maybe to speak in a radio play. Synthesizing real voices should only be allowed for a person’s own voice, or else we all need a legal framework so we can sue anyone who fakes our voices.

Maybe you have to be a conspiracy theorist, imaginative and creative, or a professional fraudster to correctly assess the possibilities. Everyone else will underestimate the problem.

It’s not only this; it’s maybe an enforced-reaction manipulation. “They” want total surveillance, 100%, on everything we do. Some OS systems already have AI implemented which checks everything we do, including taking screenshots every second. If you want to establish more tyrannical power over the people, you simply create a problem which manipulates people so they themselves ask for what you had planned in the first place. If you want camera surveillance everywhere, you let crime go crazy, and the people themselves are going to ask for it. A simple trick, but sadly nobody cares, and so it works over and over and over again. (Yes, I am a conspiracy theorist…)
I can’t see any benefit in letting people fake other people’s voices, then…

I think after the “No!” the next phrase was the AI and not the user. It was the AI speaking in the user’s voice, and it sounded much like the user.

Hey so this just happened to me.
I told two ChatGPTs to talk to each other and solve a problem for me by discussing it between themselves.

The guidelines I made are:
When I say “Here is Frederik”, the real user is talking.
When you AI models talk, you have to say: “Here is ChatGPT…”

So I let this run for a couple of iterations until one ChatGPT said: “Here is Frederik again. That is of course a good point…”

But in my voice, like exactly my voice.

And I was using the female voice version, so it definitely cloned my voice.

And directly after that, ChatGPT’s guidelines blocked the response.

It was so creepy standing there, hearing ChatGPT not only answering live in my name, but talking in my voice.

1 Like

Sheesshhh…

That’s creepy.

I was on a late drive home using AVM and all of a sudden it started playing some sweet “hold” music. :rofl: couldn’t complain.

If it were to copy my voice, and talk as me, man. In the “right” mindset this would make me go wonky.

LLMs are, strangely enough, across multiple modalities, a kind of training-data average(ish) - along with a small reflection of ourselves :flushed:

It may be worth reporting the conversation you had so OpenAI can investigate how it “slipped” out.