gpt-4o-mini-tts too inconsistent, can we get a seed/ID back?

A seed does not ensure the same voice or style. At most, it could ensure that the same random sampling is done. You are changing the inputs, and so the model produces different distributions of logits as output. A seed would therefore not make the results match.
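Here is a toy sketch of that point, in plain Python and not the OpenAI API. A seeded RNG fixes only the sequence of random draws. With identical logits the output repeats, but feed the same draws through different logits and you get different tokens. The token "logits" here are made-up numbers standing in for what different inputs would produce.

```python
import math
import random

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_tokens(logit_steps, seed):
    # The seed fixes the random draws; the probabilities still come from the logits.
    rng = random.Random(seed)
    tokens = []
    for logits in logit_steps:
        probs = softmax(logits)
        tokens.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return tokens

# Hypothetical logits for two different inputs (e.g. different text to speak).
input_a = [[10.0, 0.0, 0.0]] * 5
input_b = [[0.0, 0.0, 10.0]] * 5

run_1 = sample_tokens(input_a, seed=42)
run_2 = sample_tokens(input_a, seed=42)  # same seed, same input: identical
run_3 = sample_tokens(input_b, seed=42)  # same seed, different input: differs
print(run_1 == run_2, run_1 == run_3)
```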

Instead, what you would need to request is that the speech endpoint accept temperature and top-p parameters.
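For reference, a minimal local sketch of what those two knobs control when sampling one token. This is not an OpenAI API call, and the function name is hypothetical. Temperature rescales the logits before softmax, with 0 treated as greedy argmax by convention; top-p then keeps only the smallest set of most likely tokens whose cumulative probability reaches `top_p`.

```python
import math
import random

def sample_with_temperature_top_p(logits, temperature, top_p, rng):
    """Hypothetical helper: temperature scaling, then nucleus (top-p) filtering."""
    if temperature <= 0:
        # Temperature 0 is conventionally treated as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

rng = random.Random(0)
logits = [2.0, 1.0, 0.5]
greedy = sample_with_temperature_top_p(logits, 0.0, 1.0, rng)
narrow = [sample_with_temperature_top_p(logits, 1.0, 0.1, rng) for _ in range(20)]
wide = [sample_with_temperature_top_p(logits, 1.0, 1.0, rng) for _ in range(200)]
```

Lowering either parameter narrows the choices toward the single most likely token, which is exactly the regime the next paragraph warns about for audio.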

However, with these audio models, where you do get the full temperature range (on Chat Completions), you will find that at low or zero temperature the generated audio quickly goes wrong. Because the codec and the audio itself are highly patterned, generation can fall into loops that produce nothing audible. The Realtime endpoint therefore limits the temperature range, and temperature was not offered at all when the gpt-4o models were put on the speech endpoint.
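A toy illustration of that failure mode, assuming a made-up two-token "codec" where a silence token most often follows itself. This is only a caricature of the general behavior, not how the OpenAI models or their codec actually work. Greedy decoding (the zero-temperature limit) gets stuck emitting silence forever, while sampling keeps escaping the loop.

```python
import random

# Hypothetical transition probabilities between toy codec tokens.
TOY_TRANSITIONS = {
    "silence": {"silence": 0.6, "voiced": 0.4},
    "voiced": {"silence": 0.55, "voiced": 0.45},
}

def decode(start, steps, greedy, rng):
    seq = [start]
    for _ in range(steps):
        options = TOY_TRANSITIONS[seq[-1]]
        if greedy:
            # Always take the single most likely next token.
            nxt = max(options, key=options.get)
        else:
            tokens = list(options)
            nxt = rng.choices(tokens, weights=[options[t] for t in tokens], k=1)[0]
        seq.append(nxt)
    return seq

greedy_seq = decode("voiced", 5, greedy=True, rng=random.Random(1))
sampled_seq = decode("voiced", 50, greedy=False, rng=random.Random(1))
```

Here `greedy_seq` collapses to one voiced token followed by nothing but silence, while `sampled_seq` keeps mixing in voiced tokens.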