For audio longer than 1.5-2 minutes, gpt-4o-mini-tts is really unstable: random pauses lasting from a few seconds to more than a minute, random volume and tone changes, and the last few sentences repeated in random order.
I'm observing this all the time with very different instructions and inputs. Any known workarounds?
I’ve noticed that too.
Breaking the text into smaller parts of up to 1 minute seems to make it more stable, if that’s a viable option for you.
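To illustrate the chunking workaround, here is a minimal sketch that splits a script at sentence boundaries so each piece stays under roughly one minute of speech. The function name `split_into_chunks`, the naive sentence splitter, and the 150 words-per-minute speaking rate are all assumptions made for the example, not anything the API defines; the real rate depends on the voice and instructions.

```python
import re

# Assumption: roughly 150 spoken words per minute. Tune for your voice.
WORDS_PER_MINUTE = 150


def split_into_chunks(text, max_seconds=60, words_per_minute=WORDS_PER_MINUTE):
    """Split text into chunks whose estimated speech time is under max_seconds.

    Splits only at sentence boundaries. A single sentence longer than the
    budget is kept whole rather than cut mid-sentence.
    """
    max_words = max(1, int(max_seconds * words_per_minute / 60))
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk would then go out as its own speech request, with the resulting audio files joined afterwards. Splitting at sentence boundaries at least keeps the joins away from the middle of a phrase.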
Same here. Total requested audio was 4:31, but from 1:21-2:26 and 3:02-3:36 there was only silence. Also huge volume level changes and style shifts. In short: unusable crap.
These limitations should be clearly stated.
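Silent gaps like the ones reported above can be caught automatically before the audio is used, so a bad generation can be retried. The sketch below assumes the audio has already been decoded to signed 16-bit mono PCM samples (for example from a WAV response). The function name `find_silent_stretches` and the threshold values are illustrative guesses, not recommended settings.

```python
def find_silent_stretches(samples, sample_rate, threshold=500, min_seconds=2.0):
    """Return (start_sec, end_sec) pairs of near-silent stretches.

    samples: iterable of signed 16-bit mono PCM values (assumed pre-decoded).
    A stretch counts as silent when every sample has an absolute value below
    `threshold` for at least `min_seconds`.
    """
    stretches = []
    start = None  # index where the current quiet run began, if any
    total = 0
    for i, value in enumerate(samples):
        total = i + 1
        if abs(value) < threshold:
            if start is None:
                start = i
        else:
            # A loud sample ends the quiet run; keep it if it was long enough.
            if start is not None and (i - start) / sample_rate >= min_seconds:
                stretches.append((start / sample_rate, i / sample_rate))
            start = None
    # Handle a quiet run that lasts until the end of the audio.
    if start is not None and (total - start) / sample_rate >= min_seconds:
        stretches.append((start / sample_rate, total / sample_rate))
    return stretches
```

If the function returns any stretch for a chunk, that chunk could be regenerated instead of shipped. It only detects silence; volume jumps and voice drift would need a different check.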
We rolled out a fix today for a class of issues that were causing excessive stretches of silence and repetitions in the output. Could you give it another try and let us know if it’s still happening?
It’s still pretty bad. With 4o-mini-tts, the voice is coarse, whereas the older TTS model had a good, smooth voice. Here’s an example of what I mean:
4o-mini-tts (bad) - 4o-mini-tts
older-tts (good) - old-tts-model
I used the exact same voice, onyx, and gave no instructions. The only change was the model ID.
This issue is still happening in October.
I have tried different voices, response formats (mp3, wav, flac, etc.), and stream formats (sse vs. audio).
I have also reduced the audio length to 30-45 seconds.
The issue still persists. Any help is appreciated here.
Still unusable in December.
I used ‘alloy’.
I had to split my script into two chunks. The first sounded completely male. The other sounded like alloy should at first, but slowly turned darker and darker, at which point artifacts began to appear too.