gpt-4o-mini-tts too inconsistent, can we get a seed/ID back?

Every request to gpt-4o-mini-tts with the same prompt and instructions returns slightly different audio output. This makes it impossible to use at scale.

If it were possible to create the voice you want and then get an ID/seed for that voice so it can be reused, that would solve the issue, and I would happily migrate from tts-1.
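For reference, a minimal sketch of the repro using the OpenAI Python SDK's speech endpoint (the voice choice, instructions text, and hash comparison are illustrative assumptions): two calls with identical input and instructions come back with different audio bytes.

```python
import hashlib
from openai import OpenAI

client = OpenAI()

def synthesize(text: str) -> bytes:
    # Same model, voice, input, and instructions on every call
    resp = client.audio.speech.create(
        model="gpt-4o-mini-tts",
        voice="alloy",                                   # illustrative voice choice
        input=text,
        instructions="Speak in a calm, neutral tone.",   # hypothetical instructions
    )
    return resp.read()

a = synthesize("Welcome back! Your order has shipped.")
b = synthesize("Welcome back! Your order has shipped.")

# Hash the raw audio to compare runs; the digests differ from call to call
print(hashlib.sha256(a).hexdigest())
print(hashlib.sha256(b).hexdigest())
```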

Hi lilk4boom2 :waving_hand:

Yes, I fully agree with you — the responses from gpt-4o-mini-tts (and even gpt-4o-mini for text) are currently too inconsistent to rely on for high-stakes production use.

Many users in the community have already pointed out the same issue. Really hoping this gets addressed soon.

A seed does not ensure the same voice or style; it could only ensure that the same random sampling is done. The model can still return a different distribution of logits from one run to the next, so a seed would not matter.
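A toy illustration of that point in plain Python, with hypothetical logits: a fixed seed reproduces the random draw only when the underlying distribution is identical, so if the logits shift between runs, the same seed can still pick different tokens.

```python
import math
import random

def sample_tokens(logits, seed, k=5):
    # Seeded RNG: the draw itself is reproducible for a given distribution
    rng = random.Random(seed)
    weights = [math.exp(l) for l in logits]   # unnormalized softmax weights
    return rng.choices(range(len(logits)), weights=weights, k=k)

logits_run_a = [2.0, 1.0, 0.5]   # hypothetical logits from one run
logits_run_b = [2.0, 1.1, 0.4]   # slightly different logits from another run

print(sample_tokens(logits_run_a, seed=42))  # identical to the next line
print(sample_tokens(logits_run_a, seed=42))
print(sample_tokens(logits_run_b, seed=42))  # same seed, shifted logits: can differ
```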

Instead, what you would need to request is that the speech endpoint expose temperature and top_p parameters.

However, for these audio models, where you do have the full temperature range on Chat Completions, you will find that at low or zero temperature the generated audio quickly goes wrong. Because the audio codec output is highly patterned, generation can fall into loops that produce nothing audible. The Realtime endpoint therefore limits the temperature range, and temperature was not offered at all as a new feature when the gpt-4o models were added to the speech endpoint.
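For comparison, here is a sketch of where temperature is already accepted for audio output, assuming the gpt-4o-audio-preview model on Chat Completions (the voice, prompt, and temperature value are illustrative):

```python
import base64
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    temperature=0.8,   # full range accepted here; very low values tend to loop or go silent
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)

# Audio comes back base64-encoded on the assistant message
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("hello.wav", "wb") as f:
    f.write(wav_bytes)
```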
