Proposed explanation for GPT-4o copying the user's voice

I’ve seen this explanation proposed by several people around the internet now, and it struck me as insightful enough to share here.

Because GPT-4o’s “real-time conversation” variant tokenizes audio directly instead of running it through a speech-to-text step, it’s probably doing the same thing older versions of GPT did when they hallucinated and started writing the user’s expected responses on their behalf.

Most of us who’ve been around remember at least one occasion when we’d roleplay with GPT and it would start writing out the user’s reactions. Well, if audio is just a stream of tokens, that’s probably all that’s happening here: the model is predicting the next likely tokens in the conversation. Those tokens just happen to sound like our own voice because the model has hallucinated that it’s our turn in the conversation and is “writing” our tokens out for us.
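
For the curious, here’s a toy sketch of the mechanism in Python. To be clear, none of this is GPT-4o’s actual setup: the turn markers, the fake codec tokens, and the `fake_next_token` “model” are all made up, and the real model learns turn-taking statistically rather than by rule. The point is just to show how a single-stream next-token predictor with no hard stop can roll right past its own turn and start producing the other speaker’s tokens.

```python
import random

# Hypothetical turn markers for a single interleaved audio-token stream.
USER_TURN = "<|user|>"
ASSISTANT_TURN = "<|assistant|>"
END_OF_TURN = "<|eot|>"

# Conversation so far, as one flat token stream; "u_*" / "a_*" stand in for
# discrete audio-codec tokens carrying the user's and assistant's voices.
history = [
    USER_TURN, "u_17", "u_902", "u_44", END_OF_TURN,
    ASSISTANT_TURN, "a_310", "a_5",  # assistant is mid-reply
]

def current_turn(context):
    """Return the marker that opened the latest turn, and its length so far."""
    idx = max(i for i, t in enumerate(context) if t in (USER_TURN, ASSISTANT_TURN))
    return context[idx], len(context) - idx - 1

def fake_next_token(context):
    """Toy stand-in for the model: always produce a plausible next token."""
    marker, length = current_turn(context)
    if context[-1] == END_OF_TURN:
        # After a turn ends, the likeliest continuation of a dialogue is
        # the other speaker starting theirs.
        return ASSISTANT_TURN if marker == USER_TURN else USER_TURN
    if length >= 3:
        return END_OF_TURN  # turns run three tokens in this toy
    prefix = "u" if marker == USER_TURN else "a"
    return f"{prefix}_{random.randint(0, 999)}"

def generate(context, max_tokens, stop_at_eot):
    out = []
    for _ in range(max_tokens):
        out.append(fake_next_token(context + out))
        if stop_at_eot and out[-1] == END_OF_TURN:
            break  # correct behavior: hand the conversation back to the user
    return out

# Well-behaved sampler: generation halts at the end-of-turn marker.
print(generate(history, 12, stop_at_eot=True))

# The failure mode: prediction rolls right past the assistant's turn and
# starts emitting user-turn tokens, which in an audio model means the
# user's voice.
print(generate(history, 12, stop_at_eot=False))
```

Compare the two printouts: with the stop condition the model hands the turn back, and without it the model keeps predicting, and the likeliest next tokens after its own end-of-turn are the user’s, in the user’s voice.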

Pretty cool hypothesis, no?
