How do I get a wider range of emotion out of TTS, like what was shown in the demo today?

I have used OpenAI’s TTS API. The output is good, but not like what was demoed today. Is the API behind the demoed ChatGPT update coming to the public?

Is the audio output shown in today's demos coming from gpt-4o directly, or is a separate model doing the text-to-speech?

It is directly gpt-4o, but that capability of the model is still being red-teamed.

As of today, the gpt-4o model is a text + vision model only.
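
For reference, the existing TTS API mentioned in the question is a separate call from the chat model. Here is a minimal sketch of that request using only the Python standard library. It assumes the `/v1/audio/speech` endpoint with the `tts-1` model and the `alloy` voice; the helper names `build_speech_request` and `synthesize` are made up for illustration and are not part of any SDK.

```python
import json
import urllib.request

# Assumed endpoint for the existing text-to-speech API.
API_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, model="tts-1", voice="alloy"):
    """Build (but do not send) a POST request for the speech endpoint.

    Hypothetical helper: model and voice defaults are assumptions.
    """
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, api_key, out_path="speech.mp3"):
    """Send the request and write the returned audio bytes to out_path."""
    req = build_speech_request(text, api_key)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
    return out_path
```

Calling `synthesize("Hello there!", api_key)` with a valid key should write an audio file. The only inputs in this sketch are the model, the voice, and the text itself.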