GPT-4o Chat Completion with audio response

Hello fellows, I have a system that basically gets the cha completion response, send it back to the api to get the tts response. As you can imagine, the latency here is mind blowing.

Would be possible with the new model, get an audio as response from Chat Completion instead text?

So, when we gonna have it?


For that OpenAI has a separate model:, you can feed the response generated from Chat Completion to TTS to convert it in an audio.

Not sure if there’s a timeline/roadmap for Chat Completion audio responses.

This solution is the one that I’m using but it’s not productive because the resposiveness goes to the ground.

The user awaits more than 10 seconds to get one response

Hi, already using tts-1 model for this also.

Before even having direct audio response, it would be great to have voice with emotions as we saw in the demo -and perhaps also the new female voice used.

But as we can see in doc, there are no new models, only tts-1 & tts-hd with the usual voices.

So sad, because the presentation about the 4o looks like an amazing new things with good audio response and better responsiveness, but…


I agree. Moreover seems it won’t come for “normal” users soon :

Developers can also now access GPT-4o in the API as a text and vision model. GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo. We plan to launch support for GPT-4o’s new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.

1 Like

you could solve this problem by breaking down the problem into two little parts…