GPT-Realtime 1.5 — Major regression in voice expressiveness and accent quality

I’m building an AI voice tool for training hotel reception staff. gpt-realtime roleplays virtual hotel guests with distinct accents and personalities so that users can practice real scenarios with them. With gpt-realtime v1, I had reached a point where the voices conveyed strong emotions and convincing accents. It felt genuinely realistic.

Tested gpt-realtime-1.5 and it’s a clear downgrade:

  • Accents are almost entirely gone. In v1, the model could deliver speech with convincing regional accents that made conversations feel authentic. In 1.5, this is essentially stripped out.
  • The voice sounds noticeably more robotic. Whatever was making v1 feel natural and human-like has been lost. The output in 1.5 feels flat and synthetic in comparison.
  • Emotion is barely there. v1 had genuine warmth, inflection, and emotional range in its delivery. 1.5 sounds like it’s reading a script with no feeling behind it.

I’ve seen the benchmarks OpenAI published (+5% audio reasoning, +10% alphanumeric transcription, +7% instruction following), but none of that matters for our use case if the voice itself sounds worse. It seems that lately OpenAI has been chasing benchmarks above all else, even though they do not, on their own, drive adoption or real-world use.

To be fair, the instruction-following and tool-calling improvements are great for enterprise agent workflows, but they should not come at the cost of the qualities that made realtime voice compelling in the first place.

For now, we are staying on gpt-realtime (v1) and will continue to monitor updates.
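
For anyone who wants to run the same side-by-side comparison, here is a minimal sketch of keeping the model choice in one place so that switching between v1 and 1.5 is a one-line change. It assumes the Realtime WebSocket endpoint takes the model name as a `model` query parameter. The helper name `realtime_url` and the `REALTIME_MODEL` environment variable are made up for illustration, not part of any SDK.

```python
import os
from urllib.parse import urlencode

# Assumed Realtime WebSocket endpoint; check it against the current API docs.
REALTIME_BASE = "wss://api.openai.com/v1/realtime"

# Pin to v1 by default until the 1.5 voice issues are resolved.
DEFAULT_MODEL = "gpt-realtime"


def realtime_url(model=None):
    """Build the connection URL for an explicitly pinned model.

    Precedence: explicit argument, then the (hypothetical) REALTIME_MODEL
    environment variable, then the pinned default.
    """
    chosen = model or os.environ.get("REALTIME_MODEL", DEFAULT_MODEL)
    return f"{REALTIME_BASE}?{urlencode({'model': chosen})}"
```

Calling `realtime_url("gpt-realtime-1.5")` with the same session instructions makes it easy to A/B the two models on identical prompts.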

Is anyone else seeing the same thing? I’d like to know whether this is being tracked internally and whether there are plans to restore v1’s expressiveness in 1.5.

8 Likes

Hello, yes, I have noticed the same thing. In my scenario, the 1.5 model changes the pitch and speed of the agent’s voice for seemingly no reason. This doesn’t happen on the standard realtime model, and I have not been able to address these issues through prompting. I agree the model is a clear downgrade at present, so I am sticking with gpt-realtime until the 1.5 issues are resolved. There doesn’t appear to have been any acknowledgement from OpenAI so far, though.

1 Like

I also agree. The 1.5 voice lacks expressiveness compared to the previous model given the same instructions. The voice responses sound like they’re coming from a disaffected AI bored with its job.

2 Likes

@OpenAI_Support Can you do something about this?

2 Likes