GPT-4o TTS API lacking details

I read https://platform.openai.com/docs/guides/text-to-speech
It says it supports multiple languages, but the example does not show how to use languages other than English…

Has anyone been able to make it work?

It supports speaking the listed languages but has no language parameter yet. So you can input text in any of those languages and it should generate the corresponding audio. Note, however, that for languages other than English it may sound as if the text is being read by a foreigner.
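A minimal sketch of what that means in practice, assuming the official `openai` Python package (v1.x) and the `tts-1` model with the `alloy` voice from the guide. `build_speech_request` and `synthesize` are hypothetical helper names, not part of the SDK. The point is only that there is no language argument: you just pass text written in the target language as `input`.

```python
from pathlib import Path


def build_speech_request(text: str, voice: str = "alloy", model: str = "tts-1") -> dict:
    """Build keyword arguments for client.audio.speech.create().

    There is deliberately no "language" key: the model infers the
    language from the input text itself.
    """
    return {"model": model, "voice": voice, "input": text}


def synthesize(text: str, out_path: str) -> None:
    """Send the request and save the audio (MP3 is the default format).

    Assumes `pip install openai` and an OPENAI_API_KEY environment variable.
    """
    from openai import OpenAI  # imported lazily so the builder works without the SDK

    client = OpenAI()
    response = client.audio.speech.create(**build_speech_request(text))
    Path(out_path).write_bytes(response.content)


# Spanish output: no extra setting, just Spanish text as the input.
request = build_speech_request("Hola, ¿cómo estás?")
print(request)
```

Calling `synthesize("Hola, ¿cómo estás?", "hola.mp3")` would make the actual network request; the accent caveat above still applies.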


BTW, so far no docs say that the TTS model is GPT-4o. GPT-4o supports three modalities for both input and output; assuming they're not achieving that by simply piping text output into TTS, the TTS endpoint is not GPT-4o, given its obvious lack of those modalities.


It detects the language from the text itself. This is a problem because there are languages that use the same words but pronounce them differently, like Malaysian and Indonesian. Besides that, it can't detect the language reliably if the text is short. You can try improving its accuracy with a prompt like `{language}:` followed by your text wrapped in <> brackets; the model will then only speak what is inside the brackets. I think you'll have better luck using ElevenLabs.
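The prompt trick above could be wrapped in a small helper like the following. `with_language_hint` is a hypothetical name, the exact format is only what this post describes (not a documented API feature), and whether the model reliably honours it is unverified. The result would simply be passed as the `input` text of the speech request.

```python
def with_language_hint(text: str, language: str) -> str:
    """Prefix `text` with a language label and wrap it in <> brackets.

    Per the forum suggestion (unverified), the model should read only the
    bracketed part while using the label to pick the language.
    """
    return f"{language}: <{text}>"


# "Terima kasih banyak" is spelled identically in Malay and Indonesian,
# so the label is the only clue about which pronunciation is wanted.
hinted = with_language_hint("Terima kasih banyak", "Indonesian")
print(hinted)  # Indonesian: <Terima kasih banyak>
```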

This capability is not yet publicly released.

Thanks for the clarification. Any ETA on this one?

When they’re convinced it’s safe to be released.

That’s not necessarily “safe” as in unlikely-to-create-Skynet-safe as much as won’t-inexplicably-start-shouting-racial-epithets-safe (I’m not suggesting this is what it’s currently doing or why it’s not released, this is a completely fabricated example of something they probably don’t want the model doing.)

They’re currently red-teaming the model. No one outside of OpenAI (and probably select partners) is privy to the current status of that process.

All I know is we’ve been told the audio-to-audio capability is the top priority and they’re working hard to bring it to all of us as quickly as possible.
