About two days ago, I noticed that the gpt-4o-mini-tts voice, which I use to generate natural-sounding speech via POST requests, has slowed down. Until recently, the voice sounded natural, without long silent pauses. Now the speech is noticeably slower, and the pauses after each word make the articulation sound unnatural. Unfortunately, adjusting the TTS speed parameter does not help: it only digitally speeds up or slows down the audio, which makes it sound even less authentic. I have tried several voices, but none of them fixes the slowdown.
Could you please fix the current version of gpt-4o-mini-tts so that the voice sounds natural again?
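For anyone trying to reproduce this, here is a minimal sketch of the kind of request described above. It assumes the public `/v1/audio/speech` endpoint with the `model`, `input`, `voice`, `speed`, `instructions`, and `response_format` fields; the helper name `build_speech_request`, the default voice, and the instruction text are illustrative assumptions, not part of the original report. The function only builds the request, so the payload can be inspected or shared without sending it.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the public Audio API speech endpoint.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text: str, api_key: str, voice: str = "alloy",
                         speed: float = 1.0,
                         instructions: Optional[str] = None) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST request for gpt-4o-mini-tts."""
    payload = {
        "model": "gpt-4o-mini-tts",
        "input": text,
        "voice": voice,
        # As reported above, changing speed only rescales the audio;
        # it did not remove the pauses between words.
        "speed": speed,
        "response_format": "mp3",
    }
    if instructions is not None:
        # Optional free-text guidance on delivery (tone, pacing, etc.).
        payload["instructions"] = instructions
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request(
        "Hello there, how are you today?",
        api_key="YOUR_API_KEY",
        instructions="Speak at a natural conversational pace.",
    )
    print(req.get_method(), req.full_url)
    # To actually send it (requires a valid key):
    # with urllib.request.urlopen(req) as resp, open("out.mp3", "wb") as f:
    #     f.write(resp.read())
```

Sharing the exact payload (text, voice, speed, and any instructions) alongside an audio sample makes it easier for others to check whether the same input now produces the slower, pause-heavy output.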
The new model snapshot clearly improves word error rates. I shared evaluation results in the topic linked below, and my own testing confirms the improvement. It would be perfect if it were just as steerable.