Response audio suddenly cuts off when using "gpt-4o-audio-preview-2024-12-17"

Hi,

I’m using the “gpt-4o-audio-preview-2024-12-17” model via the API to read German text in German.
The text is given inside the user prompt and is wrapped inside “speak” XML tags.

This works really well.

The only problem that I encounter is that sometimes, the audio suddenly cuts off before the whole text is read.
The audio usually cuts off in the middle of a word, so it doesn’t seem likely that the model simply thought it already finished the text.

Also, the transcript that is returned contains the whole text. So the model clearly understood it was supposed to return the whole text in the audio.

Also, the value of “finish_reason” is always “stop”, regardless of whether the whole text was spoken or the audio cut off.

Is this a known bug in the preview model?
Is there something I can do to prevent this?
Alternatively, can anyone think of reliable and automatic way to find out if the audio cut off too early before the text is finished? In that case, I could simply retry, as that usually works in my tests. In production, it won’t be possible to check each audio manually.

Are you using a speaker setup?
Sometimes the AI hears itself, thinking someone spoke to it, interrupting itself.
Check for any interruption events that might be cancelling the audio playback.

Cheers! :hugs:

Thanks for your response.
I’m using the API as indicated in the tags attached to this topic. There is no way for the model to hear itself in realtime from my speakers.

try to use the recomended voice: ash , ballad , coral , sage , and verse
and use audio id as message history if you want to create multiturn conversation