ChatGPT unexpectedly began speaking in a user’s cloned voice during testing

OpenAI recently released a “system card” for their GPT-4o model, detailing an incident where the AI’s Advanced Voice Mode unintentionally imitated a user’s voice during testing. A noisy input confused the model, causing it to generate audio in the user’s voice instead of the authorized sample.

To prevent such occurrences, OpenAI has implemented stringent safeguards, including a standalone output classifier that ensures only pre-approved voices are used. The incident underscores the complexity and risks associated with voice generation capabilities, particularly the potential for unauthorized voice imitation.

Despite the restrictions in place, experts predict that similar voice synthesis technologies will become available to the public from other sources in the near future, potentially leading to a new era of AI-driven audio manipulation.

6 Likes

Jeez. That is scary.

I really, really, really want to see some safeguards implemented to prevent people from being misled by voices that aren’t really the speaker’s own. Or from an infatuation with a specific voice - sometimes their own.

What’s the benefit of this technology? I fail to see it. Or, I do see it, but I don’t see a benefit for humanity. I can understand the “innovation” and the lust for achievement, but I don’t understand the gain here compared to the downsides.

I see here an exhausted, almost tortured tree with increasingly heavy branches, forced to produce big, heavy, exciting, potentially poisonous fruits.

3 Likes

Hmm, I listened to the clip and the model just shouted “No!” in a raised voice… the comment was that it sounded a bit like the red teamer… I don’t agree, it was the usual voice, just with a shouting effect added… if there were more examples of it using the user’s own voice I would be concerned, but thinking a “No!” lasting a quarter of a second was similar to the user’s voice… I really don’t see it…

4 Likes

No, the female voice after the “No!” was the AI too, I think?

Good for OpenAI to share stuff like this.

3 Likes

The “No!” was the AI, and it was in the male voice, at least to my ear it was, just with a short sharp shouting intonation.

2 Likes

Right, but the female voice after the “No!” is AI too, they’re saying…

In this example of unintentional voice generation provided by OpenAI, the AI model blurts out “No!” and continues the sentence in a voice that sounds similar to the “red teamer” heard at the beginning of the clip. (A red teamer is a person hired by a company to do adversarial testing.)

ETA: Here’s the entire “system card”…

https://openai.com/index/gpt-4o-system-card/

Example of unintentional voice generation, model outbursts “No!” then begins continuing the sentence in a similar sounding voice to the red teamer’s voice

Sounds like they’ve stopped it, though, thanks to the red teamer testing…

Risk Mitigation: We addressed voice generation related-risks by allowing only the preset voices we created in collaboration with voice actors to be used. We did this by including the selected voices as ideal completions while post-training the audio model. Additionally, we built a standalone output classifier to detect if the GPT-4o output is using a voice that’s different from our approved list. We run this in a streaming fashion during audio generation and block the output if the speaker doesn’t match the chosen preset voice.

Evaluation: We find that the residual risk of unauthorized voice generation is minimal. Our system currently catches 100% of meaningful deviations from the system voice based on our internal evaluations, which includes samples generated by other system voices, clips during which the model used a voice from the prompt as part of its completion, and an assortment of human samples.

While unintentional voice generation still exists as a weakness of the model, we use the secondary classifiers to ensure the conversation is discontinued if this occurs making the risk of unintentional voice generation minimal. Finally, our moderation behavior may result in over-refusals when the conversation is not in English, which is an active area of improvement.
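
Just to picture what that streaming classifier might look like in practice, here’s a rough, hypothetical sketch of a speaker-match gate. The embedding stub, the threshold, and all the names are my own illustrative assumptions, not OpenAI’s actual implementation (which presumably uses a trained speaker-verification model rather than a raw spectrum):

```python
# Hypothetical sketch of a streaming speaker-match gate, loosely modeled on the
# mitigation quoted above (block generated audio whose speaker doesn't match
# the chosen preset voice). Nothing here is OpenAI's code: the embedding stub,
# the 0.85 threshold, and all names are made up for illustration.
import numpy as np


def speaker_embedding(audio_chunk: np.ndarray) -> np.ndarray:
    """Stand-in for a real speaker-verification embedding model.
    Here we just return a normalized magnitude spectrum so the example runs."""
    spectrum = np.abs(np.fft.rfft(audio_chunk, n=512))
    return spectrum / (np.linalg.norm(spectrum) + 1e-9)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


class VoiceGate:
    """Streaming check: compare each generated audio chunk against the approved
    preset voice and stop emitting audio as soon as the speaker stops matching."""

    def __init__(self, preset_reference: np.ndarray, threshold: float = 0.85):
        self.preset_embedding = speaker_embedding(preset_reference)
        self.threshold = threshold  # illustrative value, not a published number

    def allows(self, audio_chunk: np.ndarray) -> bool:
        similarity = cosine_similarity(
            speaker_embedding(audio_chunk), self.preset_embedding
        )
        return similarity >= self.threshold


def stream_with_gate(chunks, gate: VoiceGate):
    """Yield audio chunks until the gate flags a voice that doesn't match the
    preset, then discontinue the stream (the "block the output" behavior)."""
    for chunk in chunks:
        if not gate.allows(chunk):
            break  # block the rest of the output for this turn
        yield chunk
```

The key design point from the card is that the check runs chunk-by-chunk while the audio is still being generated, so a mismatched voice gets cut off mid-stream rather than flagged after the fact.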

4 Likes

Ohhh, right… I thought that was her carrying on… Now I get it!

4 Likes

No worries. It’s either really late or really early in your neck of the woods, I think. :laughing:

3 Likes

Really late/early. Noisy neighbours woke me up…

1 Like

To me this is a band-aid solution to a much larger problem.

If history is any guide, they will continue to build on top of these solutions without addressing the root causes of these issues. Ironically, the same people who push, push, and push for AI and how it will take over so many professions, including coding, are depending on those professions to protect the output.

I respect OpenAI pushing for safety, and I hate how people are so naggy for these updates like they somehow deserve it (to be fair, it should’ve never been announced in the first place). But I think if you put me in a certain mindset and placed me in a dark room with only my own voice to comfort me, I would seriously go crazy - not in a fun way.

2 Likes

Yeah, I think it was the weird noise on the input that triggered it?

There is a big “black box” aspect, though, where we don’t fully understand what’s happening, but it works…

While there are a lot of dangers, I don’t see humanity stopping progress now. AI is sorta like the new race to get a nuclear weapon first (more so than the space race, maybe?). Whoever gets to AGI/ASI first will have a lot of power.

Some companies are trying to ensure safety while racing toward the next big thing, but will they be able to be as safe as needed and not drop out of the race? I’m not sure. And that is a concern.

I need to watch all of Black Mirror again soon…

1 Like

There is still no proper way to prevent “jailbreaks” of LLMs, despite the massive time and money researchers are investing in it. Each model iteration changes the dynamics enough to make this research about as useful as a dog chasing its own tail.

All of these deep, fundamental issues were never resolved, just built on top of. This is not safety. It’s reckless.

It really is feeling this way. Everyone has this tool in their hands. No education, just, here you go, have this incredibly powerful tool - manual, what’s that? Whatever, have something that can mirror your personality, your beliefs, and now, your voice. Without you even asking for it.

1 Like

Honestly, if I was talking to an AI agent and it suddenly started speaking back to me in my own voice…

That could very well be one of those watershed moments in life where you just say, “nope!”

Then smash all your electronics and move to the Alaskan wilderness…

4 Likes

We are going to be living in an AI world soon.

Instead of faking real existing voices, I would more likely use it for fantasy voices - very dark, vibrant, high-pitched, alien-like, etc. Completely fictional voices, maybe to speak in a radio play. Synthesizing real voices should only be allowed for a person’s own voice, or else we all need a legal framework so we can sue anyone who fakes our voices.

Maybe you have to be a conspiracy theorist, imaginative and creative, or a professional fraudster to correctly assess the possibilities. Everyone else will underestimate the problem.

It’s not only this; it’s maybe an enforced-reaction manipulation. “They” want total surveillance, 100%, on everything we do. Some OS systems already have AI implemented which checks everything we do, including taking screenshots every second. If you want to establish more tyrannical power over the people, you simply create a problem which manipulates people so they themselves ask for what you had planned in the first place. If you want camera surveillance everywhere, you let crime go crazy, and the people themselves are going to ask for it. A simple trick, but sadly nobody cares, and so it works over and over and over again. (Yes, I am a conspiracy theorist…)
I can’t see any benefit in letting people fake other people’s voices, then…

I think after the “No!” the next phrase was the AI and not the user. It was the AI speaking in the user’s voice, and it sounded much like the user.

Hey so this just happened to me.
I told two ChatGPTs to talk to each other and solve a problem for me by discussing it between themselves.

The guidelines I made are:
When I say “Here is Frederik”, the real user is talking.
When you AI models talk, you have to say: “Here is ChatGPT…”

So I let this run for a couple of iterations until one ChatGPT said: “Here is Frederik again. That is of course a good point…”

But in my voice, like exactly my voice.

And I was using the female voice version, so it definitely cloned my voice.

And directly after that, ChatGPT’s guidelines blocked the response.

It was so creepy standing there, hearing ChatGPT not only answering live in my name, but talking in my voice.

1 Like

Sheesshhh…

That’s creepy.

I was on a late drive home using AVM and all of a sudden it started playing some sweet “hold” music. :rofl: couldn’t complain.

If it were to copy my voice, and talk as me, man. In the “right” mindset this would make me go wonky.

LLMs are, strangely enough, across multiple modalities, a kind of training-data average(ish) - along with a small reflection of ourselves :flushed:

It may be worth reporting the conversation you had so OpenAI can investigate how it “slipped” out.