/audio/speech: truncated audio for some single-word strings


EXPECTED BEHAVIOUR

Input:
A one-word string like “Research” or “Academia”.

Output:
An audio narration of the word.

ACTUAL BEHAVIOUR

Output:
The audio narration sometimes omits part of the word. E.g. “earch” instead of “Research” and “demia” instead of “Academia”.

(I would share example MP3s but the forum won’t let me, perhaps because I just registered.)

STEPS TO REPRODUCE

  1. Use the example code from the OpenAI docs.
  2. Create a narration for a word like “Research”.

N.B. This bug does not happen on every request. I found that if I request the word “Research” or “Academia” 10 times, I see the bug at least once.
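Since the bug is intermittent, it helps to request the same word several times and keep every file for comparison. Here is a minimal sketch of that loop, assuming the v1 `openai` Python package (`client.audio.speech.create`) and the `tts-1` model. The helper name `request_samples`, the output directory, and the file naming are my own choices for illustration, not anything from the docs.

```python
from pathlib import Path


def request_samples(client, word, voice="nova", attempts=10, out_dir="samples"):
    """Request the same word several times and save each MP3 for comparison.

    `client` is assumed to be an openai.OpenAI() instance (v1 Python SDK).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(1, attempts + 1):
        response = client.audio.speech.create(
            model="tts-1",          # assumed model name
            voice=voice,
            input=word,
            response_format="mp3",
        )
        # Hypothetical naming scheme: <voice>-<word>-<attempt>.mp3
        path = out / f"{voice}-{word.lower()}-{i}.mp3"
        path.write_bytes(response.content)
        paths.append(path)
    return paths


# Example usage (needs the `openai` package and an OPENAI_API_KEY):
#   from openai import OpenAI
#   request_samples(OpenAI(), "Research")
```

Listening to the saved files in order (or opening them in an audio editor) should show which attempts lost the start of the word.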

A quick fix: put a period and space as the first characters of your request.

Then investigate: is the audio really clipped at the beginning, or is the playback application not waiting until the audio device is ready? Open your audio file in an audio editor and insert silence at the beginning to see whether you got what was requested and it is just not being rendered properly.
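If you would rather script the silence check than do it by hand in an editor, here is a rough sketch using only Python's standard `wave` module. It assumes the clip has already been converted to an uncompressed PCM WAV file (for example, exported from your audio editor); the function name and the half-second default are just illustrative.

```python
import wave


def prepend_silence(src_path, dst_path, seconds=0.5):
    """Copy a PCM WAV file, inserting `seconds` of silence before the audio.

    Returns the number of silent frames that were added.
    """
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    bytes_per_frame = params.sampwidth * params.nchannels
    silent_frames = int(params.framerate * seconds)
    # 8-bit WAV samples are unsigned (silence is 0x80); wider samples are
    # signed (silence is 0x00).
    silence_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = silence_byte * (silent_frames * bytes_per_frame)

    with wave.open(str(dst_path), "wb") as dst:
        dst.setparams(params)  # the frame count in the header is fixed up on close
        dst.writeframes(silence + frames)
    return silent_frames
```

If the padded file plays the whole word, the player was dropping the start. If the word is still cut off after the silence, the clipping is in the generated audio itself.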

Improvement when requesting flac?

Thanks for the quick reply.

I’ll try the period and space trick and report back.

In the meantime:

  1. I tried with MP3 and Opus format, but I’ve not tried FLAC.
  2. It’s not the audio player—if I open the MP3s in Audacity I can see the full waveform and it’s missing the start of the word.

Here are the sample MP3s (you’ll have to remove spaces from the link):

https:// files.type3.audio /samples/2023-11/open-ai-short-word-bugs/mp3/nova-1.mp3

https:// files.type3.audio /samples/2023-11/open-ai-short-word-bugs/mp3/nova-3.mp3

I just ran 20 iterations looping through all the voices, trying the period. Confirmed.

The Nova and Shimmer voices are most affected by this anomaly. The others are affected too; one sample was pure silence and another sounded like “rurrr”.

And just as quickly, I came up with a fix that has not shown a single foible:


```python
prompt = """
[pause]
Academia
"""
```
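Both workarounds from this thread (the leading period and space, and the “[pause]” line) can be wrapped in small helpers so every request gets the same padding. This is only a sketch: the function names are made up, and the idea that the throwaway prefix absorbs the clipping is my reading of the tests above, not documented behaviour.

```python
def with_pause_prefix(text):
    """Build the same layout as the prompt above: a "[pause]" line, then the text."""
    return f"\n[pause]\n{text.strip()}\n"


def with_period_prefix(text):
    """Put a period and a space in front of the text (the earlier quick fix)."""
    return ". " + text.strip()


# Pass the result as the `input` of the speech request, e.g.:
#   input=with_pause_prefix("Academia")
```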

Ah, good to see others are running into this issue too… I’ve managed to see it trip up on the string “Alien.” using the Onyx voice.

@_j Thank you for the help on this.

My tests, like yours, suggest that this workaround is quite reliable (I’ve not seen it fail, yet). That said, I would not be surprised if it fails in some cases: in other contexts I’ve noticed that the “[pause]” string usually adds a pause—but not always.

I hope OpenAI will release a proper fix for this bug.
