gpt-4o-mini-tts produces unusable results

For audio longer than 1.5 to 2 minutes, gpt-4o-mini-tts is very unstable: random pauses lasting from a few seconds to more than a minute, random volume and tone changes, and the last few sentences repeated in random order.
I see this all the time, with very different instructions and inputs. Are there any known workarounds?


I’ve noticed that too.
Breaking the text into smaller parts of up to 1 minute seems to make it more stable, if that’s a viable option for you.
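
In case it helps, here is a minimal sketch of the splitting step. It groups whole sentences into chunks that should each take about a minute to speak. The `split_for_tts` helper and the 150 words-per-minute speaking rate are my own assumptions for illustration, not anything from the API, so tune them for your voice and pacing. Each chunk would then go out as its own speech request and the audio files get joined afterwards.

```python
import re

# Assumption: roughly 150 spoken words per minute. Adjust for your voice.
WORDS_PER_MINUTE = 150


def split_for_tts(text, max_seconds=60, words_per_minute=WORDS_PER_MINUTE):
    """Group sentences into chunks estimated to take at most max_seconds to speak.

    Hypothetical helper: the duration is only estimated from the word count.
    A single sentence longer than the budget is kept whole as its own chunk.
    """
    max_words = max(1, int(words_per_minute * max_seconds / 60))
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character count keeps each request ending at a natural pause, which should make the joins between audio segments less noticeable.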


Same here. Total requested audio was 4:31, but from 1:21 to 2:26 and from 3:02 to 3:36 there was only silence. There were also huge volume changes and style shifts. In short: unusable crap.
These limitations should be clearly stated.


We rolled out a fix today for a class of issues that were causing excessive stretches of silence and repetitions in the output. Could you give it another try and let us know if it’s still happening?


It's still pretty bad. With gpt-4o-mini-tts the voice is coarse, whereas the older TTS model had a good, smooth voice. Here's an example of what I mean:
4o-mini-tts (bad) - 4o-mini-tts
older-tts (good) - old-tts-model

I used the exact same voice (onyx) and gave no instructions. The only change was the model ID.