gpt-realtime-2 voice is extremely slow (speaking slowly)

Hello,

We upgraded to the gpt-realtime-2 model, but the voice has an extremely slow pace of speech (probably half of what it should be). The speed param is set to 1.1, so that is not the origin of the issue.

How can I solve this issue?

Note: I use gpt-realtime-2 with Twilio, and I set format: { type: "audio/pcmu" } in "session.update". That could be the origin of the issue, but I doubt it, since the voice speed is not always the same (sometimes nearly OK, sometimes very slow).
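For reference, the setup described above might look roughly like this sketch. The helper name buildSessionUpdate is hypothetical, and the nesting under session.audio (including the type: "realtime" field and the location of speed) is an assumption on my side, so please check it against the current Realtime API reference before relying on it.

```typescript
// Hypothetical shape of the "session.update" event; field nesting is an assumption.
interface AudioFormat {
  type: string;
}

interface SessionUpdateEvent {
  type: "session.update";
  session: {
    type: "realtime";
    audio: {
      input: { format: AudioFormat };
      output: { format: AudioFormat; speed: number };
    };
  };
}

// Hypothetical helper: builds the event for a Twilio call.
// Twilio Media Streams carry 8 kHz G.711 mu-law audio, hence "audio/pcmu".
function buildSessionUpdate(speed: number): SessionUpdateEvent {
  const pcmu: AudioFormat = { type: "audio/pcmu" };
  return {
    type: "session.update",
    session: {
      type: "realtime",
      audio: {
        input: { format: pcmu },
        output: { format: pcmu, speed },
      },
    },
  };
}

// Usage: serialize and send over the already-open Realtime WebSocket.
const sessionUpdate = buildSessionUpdate(1.1);
const payload = JSON.stringify(sessionUpdate);
```

Logging the payload right before sending it is a cheap way to confirm that the speed value actually reaches the API.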

Thanks.

100% agree. In our experience, it sounds very slow, almost as if it needs time to think before saying each word. We’ve tried improving smoothness through prompting and saw marginal gains in some calls, but not consistently.

So far, compared to gpt-realtime-1.5, it feels like a clear downgrade in conversational flow and pacing (at least in Spanish and Catalan; for English it’s less evident).

Agree with this too. The problem seems to be worse in non-English languages: we get feedback that it is slower, and the French accent is really bad compared to gpt-realtime-1.5. Some improvements were possible through prompting to make it sound more like French from France, but it still sounds a bit like a German speaking English at points throughout the call.

The model as a whole seems a big step forward in terms of following steering and tool calls, but in terms of audio satisfaction, it is worse in many dimensions. At least it doesn’t occasionally change from female to male voice mid-conversation like gpt-realtime-1.5. Two steps forward, one and a half steps back. The lack of tone/personality/speed consistency, especially when it regresses between releases, is making us question the viability of this model vs. STT → LLM → TTS.

Actually, my issue was that the prompt itself said “speak slowly” :slight_smile: Problem solved for me.
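In case it helps someone else, a quick check like the one below would have caught it. This is only a sketch: the function name findPacingDirectives and the phrase list are hypothetical, cover English only, and should be extended for whatever your prompt actually contains.

```typescript
// Hypothetical list of phrases that ask the model to slow down (English only).
const PACING_PATTERNS: RegExp[] = [
  /\bspeak(?:ing)?\s+(?:very\s+)?slowly\b/i,
  /\bslow(?:er)?\s+pace\b/i,
  /\btake\s+your\s+time\b/i,
];

// Returns every pacing phrase found in the prompt, in pattern order.
function findPacingDirectives(prompt: string): string[] {
  const hits: string[] = [];
  for (const pattern of PACING_PATTERNS) {
    const match = prompt.match(pattern);
    if (match) {
      hits.push(match[0]);
    }
  }
  return hits;
}

// Usage: warn before the instructions are sent in "session.update".
const instructions = "You are a friendly phone agent. Speak slowly and clearly.";
const pacingHits = findPacingDirectives(instructions);
if (pacingHits.length > 0) {
  console.warn("Prompt contains pacing instructions:", pacingHits);
}
```

A prompt instruction like this can outweigh the speed param, so it is worth ruling out before blaming the model or the audio format.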

Hilarious :laughing: Well at least it works for you