Getting TTS hallucinations with long inputs in 4o-mini

When I use 4o-mini TTS, I’m consistently getting errors/hallucinations in the output.

I’m using relatively large text inputs to create audio output (for a story). The result is pretty clean for maybe the first 70%, but then I get weird long gaps in the audio, or in some cases short gaps that get filled in with hallucinated audio.

Has anybody else seen this issue with TTS? Any tips on eliminating these artifacts? My outputs are unusable because of them.

Just break it down with a script if that’s your issue, and produce shorter audio files instead of going for longer ones. Also, in my own experience using 4o TTS, I see no hallucinations even on really large texts that are just shy of the token limit.
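
For what it's worth, here is a rough sketch of what such a script could look like on the text side. It only does the splitting, at sentence boundaries, so no chunk cuts off mid-sentence. The helper name `split_text` and the 1000-character default are my own assumptions, not anything from the TTS docs, so tune the limit to whatever length stays clean for you.

```python
import re


def split_text(text: str, max_chars: int = 1000) -> list[str]:
    """Group sentences into chunks of up to max_chars characters.

    max_chars is an arbitrary, assumed limit. A single sentence longer
    than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            # Still fits: keep growing the current chunk.
            current = candidate
        else:
            # Would overflow: close the current chunk and start a new one.
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    story = "Once upon a time there was a fox. It lived in a den. The end."
    for i, chunk in enumerate(split_text(story, max_chars=40)):
        print(i, chunk)
```

Each chunk would then go to the TTS endpoint as its own request, and the resulting audio files get joined afterwards. Splitting on sentence ends rather than at a fixed character offset should keep the pauses between files sounding natural.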