For audio longer than 1.5-2 minutes, gpt-4o-mini-tts is really unstable: random pauses lasting from a few seconds to more than a minute, random volume and tone changes, and the last few sentences repeated in random order.
I'm seeing this all the time, with very different instructions and inputs. Are there any known workarounds?
I’ve noticed that too.
Breaking the text into smaller parts of up to 1 minute seems to make it more stable, if that’s a viable option for you.
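In case it helps, here is a rough sketch of how that splitting could look. It groups whole sentences into chunks whose estimated spoken length stays under about a minute. The function name `split_for_tts` and the 150 words-per-minute speaking rate are my own assumptions, not anything from the API, so tune the rate for your voice and content.

```python
import re

# Assumed average speaking rate; not an official figure, adjust as needed.
WORDS_PER_MINUTE = 150


def split_for_tts(text, max_seconds=60.0, words_per_minute=WORDS_PER_MINUTE):
    """Group whole sentences into chunks with an estimated duration <= max_seconds.

    A single sentence longer than the budget is kept whole as its own chunk.
    """
    max_words = max(1, int(max_seconds / 60.0 * words_per_minute))
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk would then go out as its own speech request, with the resulting audio files joined afterwards. Splitting on sentence boundaries should keep the joins at natural pauses, though tone may still differ slightly between chunks.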
Same here. The total requested audio was 4:31, but from 1:21 to 2:26 and from 3:02 to 3:36 there was only silence. There were also huge volume changes and style shifts. In short: unusable crap.
These limitations should be clearly stated.
We rolled out a fix today for a class of issues that were causing excessive stretches of silence and repetitions in the output. Could you give it another try and let us know if it’s still happening?
It’s still pretty bad. With gpt-4o-mini-tts the voice is coarse, whereas the older TTS model had a good, smooth voice. Here’s an example of what I mean:
4o-mini-tts (bad) - 4o-mini-tts
older-tts (good) - old-tts-model
I used the exact same voice, onyx, and gave no instructions. The only change was the model ID.