Join Olivier Godement, Jeff Harris, Iaroslav Tverdoklhib, and Yi Shen as they unveil and demonstrate three novel audio models within the API—two speech-to-text models and one text-to-speech model—alongside an audio integration with the Agents SDK. This integration empowers developers to construct more intelligent and customizable voice agents.
While comments are disabled during the live video, feel free to engage in discussions here.
So now we can provide vocal context to the TTS model.
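For anyone who wants to try it, here is a minimal sketch (assuming a current openai Python package; the voice, sample text, and output filename are just placeholders) of passing that vocal context through the new instructions parameter:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "instructions" carries the vocal context: tone, pacing, accent, emotion, etc.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Thanks for calling! Let me pull up your order.",
    instructions="Speak warmly and at a relaxed pace, like a friendly support agent.",
) as response:
    response.stream_to_file("greeting.mp3")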
I’m somewhat surprised this wasn’t bundled into an SDK. Maybe an idea for someone to start? It would be nice to abstract this away and have a model create the parameters on the fly. Maybe this wouldn’t work? Would it cause too many inconsistencies in the voice?
I’m not sure what the star icon means at the bottom left corner (didn’t watch), but those TTS models are so unbelievably impressive. Great work!
Curious about this approach where the Python SDK gets first dibs on everything. Last week it was Traces, this week it’s audio in Responses, which I was surprised to just find out is NOT available unless you’re using that SDK.
No offense, but I don’t want to build an app that is reliant on a kit that may or may not be updated. As a developer I’d just like access to the APIs so I can go on building what I know works for me.
Finally, we now know the purpose behind openai.fm. It’s not a music generation model, but rather a tool designed to test a TTS model capable of processing specific instructions regarding style. This has been frequently requested in the past.
I have been particularly eager for improved TTS and STT from OpenAI, because the costs are usually more reasonable than the competition’s and keep getting lower over time, which makes them accessible to more people (considering price x privacy policies x scalability).
This opens up so many possibilities; now with web search and GPT-4.5 we can experiment with all sorts of interactive experiences.
This has been an incredible year of deliveries from OpenAI, and we are still in March. This is going to be a very interesting year.
OpenAI has a different platform for developer engagement than the one infested by muskrats. Right here.
The “promotion” angle instead of the “communication” angle, and a unidirectional flow instead of engagement (which this forum facilitates), is disappointing, as I will never open an “X” account, which is now required even to see posts ordered by latest (nor do I need a gadget).
I thought there was an unusual quality in the Santa voice that nothing else possessed. I wonder if it came from the kind of gpt-4o audio tuning presented today; Santa is also robustly available via the ChatGPT TTS “speak aloud” button, in contrast to the bland quality of the others.
The employed APIs are not exclusive to just that code. The offered code is just an accelerator, although for now it is also a replacement for documentation (tracing, etc). Think of it as an open-source API “app” you can’t fork. Copy its methods, reproduce its event handlers, lift its ideas, and apply them anywhere, on code you maintain.
I got the impression from the presentation today that these new audio functions were available with “just a few lines of code” in the Agents SDK, which I assumed to be Responses API only.
But then I found out in the docs that audio is not available in the Responses API. Am I incorrect about this?
Also, I get that you could use the SDK as a sort of documentation, but I also got the impression from a conversation with one of the OAI devs on X that Traces was ONLY available through the SDK.
I heard there is one and I want to know what it is about.
I’m having problems trying to give instructions like this example with Python:
from pathlib import Path
from openai import OpenAI

client = OpenAI()
speech_file_path = Path(__file__).parent / "speech.mp3"

response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Today is not a wonderful day to build something people love!",
    instructions="Speak in a cheerful and positive tone.",
)
response.stream_to_file(speech_file_path)
I receive this error: TypeError: Speech.create() got an unexpected keyword argument 'instructions'
We are in the process of testing GPT-4o-mini-tts using various instructions like those demonstrated at openai.fm. While the model performs well with relatively small text input, there are serious issues with larger text input, as shown here: GPT-4o-mini-tts Issues: Volume Fluctuations, Silence, Repetition, Distortion. There ARE use cases for larger text input, and I’m hoping that someone from OpenAI will acknowledge these issues.
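Until that is addressed, a purely illustrative workaround sketch (assuming paragraph-sized chunks stay short enough to avoid the artifacts above; the splitting rule, voice, and file naming are arbitrary) is to synthesize long text in pieces:

from openai import OpenAI

client = OpenAI()

def synthesize_long_text(text: str, out_prefix: str = "part") -> None:
    # Split on blank lines so each request carries only a paragraph or two;
    # the exact length at which the model starts to misbehave is an assumption.
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
    for i, chunk in enumerate(chunks):
        with client.audio.speech.with_streaming_response.create(
            model="gpt-4o-mini-tts",
            voice="coral",
            input=chunk,
            instructions="Read in a calm, even narration style.",
        ) as response:
            response.stream_to_file(f"{out_prefix}_{i:03d}.mp3")

The resulting files still need to be concatenated afterwards (for example with ffmpeg), and instruction adherence may drift between chunks, so this is a stopgap rather than a fix.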