TTS models returning blank audio and repetitions

I’ve been using gpt-4o-mini-tts to build a tool that reads documents to me. However, sometimes the output is much longer than expected: it contains the input text several times, separated by long stretches of silence. For example, submitting a text like “Once upon a time there was a cat.” can generate a long audio like “Once upon a time there was a cat[BLANK for 20 seconds]Once upon a time there was a cat[BLANK for 17 seconds]Once upon a time there was a cat”.

Is anyone else experiencing the same problem? If so, is there a way around this?

I doubt this is useful since it’s pretty standard, but here’s the code making the request:

    const mp3 = await openai.audio.speech.create({
      model: "gpt-4o-mini-tts",
      voice,
      input: text,
      instructions,
    });

Thank you all in advance.


Yeah, it just happened to me too. After a little less than 1k input characters the audio stopped. I thought something had gone wrong and the output was truncated, but then it continued after a very long period of silence.
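
Since the break showed up a little under 1k characters in, one possible workaround is to send shorter inputs. Here is a minimal sketch, assuming that splitting at sentence boundaries is acceptable for the use case. `splitIntoChunks` and the 500-character default are hypothetical choices for illustration, not part of the OpenAI SDK, and nothing in this thread confirms that shorter inputs avoid the problem.

```javascript
// Hypothetical helper (not part of the OpenAI SDK): split text into chunks
// of at most maxChars characters, breaking at sentence boundaries if possible.
function splitIntoChunks(text, maxChars = 500) {
  // Rough heuristic: a sentence ends at ., ! or ? followed by whitespace.
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    // Hard-split any single sentence that is longer than the limit.
    const pieces = [];
    for (let i = 0; i < sentence.length; i += maxChars) {
      pieces.push(sentence.slice(i, i + maxChars));
    }
    for (const piece of pieces) {
      const candidate = current ? current + " " + piece : piece;
      if (candidate.length <= maxChars) {
        // Still within the limit: keep growing the current chunk.
        current = candidate;
      } else {
        // Limit exceeded: close the current chunk and start a new one.
        chunks.push(current);
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk could then go through its own `openai.audio.speech.create` call, with the resulting clips played or joined in order. Whether joined clips sound seamless depends on the output format, so that part would need testing.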

I also noticed the audio tokens charged were a bit high. I don’t know if it was charging for the silence, and I didn’t have time to replicate what happened, as it was in a compiled app that didn’t save logs. It charged about 15k audio tokens for about 2k tokens of text.
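
To catch these cases before playing or paying for more retries, one option is to compare the clip length with what the text should roughly need. This is only a sketch under stated assumptions: the request uses `response_format: "pcm"`, taken here to be raw 24 kHz, 16-bit, mono audio, and speech runs at about 15 characters per second, which is a guess rather than a measured figure. `pcmDurationSeconds` and `looksTooLong` are hypothetical helpers.

```javascript
// Assumption: "pcm" output is raw 24 kHz, 16-bit (2 bytes per sample), mono.
const SAMPLE_RATE = 24000;
const BYTES_PER_SAMPLE = 2;

// Duration of a raw PCM buffer, in seconds, from its size in bytes.
function pcmDurationSeconds(byteLength) {
  return byteLength / (SAMPLE_RATE * BYTES_PER_SAMPLE);
}

// Assumption: about 15 characters per second of speech (a rough guess).
// Flags audio that is more than `tolerance` times the expected duration.
function looksTooLong(text, byteLength, charsPerSecond = 15, tolerance = 3) {
  const expectedSeconds = text.length / charsPerSecond;
  return pcmDurationSeconds(byteLength) > expectedSeconds * tolerance;
}
```

A caller could pass the byte length of the returned audio buffer and retry or log the request when `looksTooLong` returns true. Saving the flagged input and the reported usage alongside it would also help confirm whether the silence is what inflates the audio token count.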
