The output audio does not fully match the output text; it ends early

Does anyone have the same issue?

In the Realtime API, the output audio does not fully match the output text; it ends early and only speaks part of the output text.

Yes, I’ve noticed this as well. Audio output getting cut off is pretty common. And yes, the transcript quality is also horrendous compared to Deepgram or Google. Hopefully they come out with a Whisper 2 or something soon.

This happened to me too… It made me wonder if the audio I sent over is somehow broken, because the transcription of my input is wrong so often… OpenAI’s audio response playback also has a clicking noise…