Note that the livestream demo used a saved prompt ID, which is why the system instructions box shown was empty. (I just got a default system message pushed with a hard refresh.)
The gpt-realtime model page is a bit wrong if image input for vision is an allowed modality. (fixed)
To attach an image in the playground, you have to press the little keyboard icon to get a “send a text message” box, which then has a paperclip icon that launches an image-only file browser.
I also get an error that the “marin” voice is not available for my organization. (fixed)
The playground is a nonstop loop of the AI hearing and interrupting itself, regardless of any VAD threshold, semantic “eagerness”, or noise-reduction setting, even with speakers three feet from the mic turned way down low. I’d like an explanation of how this was done without everybody wearing headphones in a quiet room. The easy API advice, as given before: without hardware echo cancellation, auto-mute the mic while audio is being produced (a rough sketch follows below).
(The livestream also showed the same symptom GPT-5 already produces: offering unstoppable follow-up promises that the API configuration cannot fulfill, “would you like me to make a new image” in that case, like other solicitations to generate unobtainable deliverables.)
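For what it’s worth, here is a rough sketch of that auto-mute approach for a browser WebRTC setup, in TypeScript. None of the names (autoMuteDuringPlayback, micStream, remoteAudio, resumeDelayMs) are official; the idea is just to disable the local mic track while the model’s audio element is playing and re-enable it a moment after playback stops.

// Sketch: hard-mute the local mic while the assistant's audio is playing.
// micStream  = the getUserMedia() stream you send to the Realtime session
// remoteAudio = the <audio> element attached to the remote (model) track
function autoMuteDuringPlayback(
  micStream: MediaStream,
  remoteAudio: HTMLAudioElement,
  resumeDelayMs = 300,
): void {
  const micTrack = micStream.getAudioTracks()[0];
  if (!micTrack) return;
  let resumeTimer: number | undefined;

  remoteAudio.addEventListener("playing", () => {
    window.clearTimeout(resumeTimer);
    micTrack.enabled = false; // no mic frames are sent while the model talks
  });

  const resume = () => {
    resumeTimer = window.setTimeout(() => {
      micTrack.enabled = true; // give the room a moment to go quiet first
    }, resumeDelayMs);
  };
  remoteAudio.addEventListener("pause", resume);
  remoteAudio.addEventListener("ended", resume);
}

It trades away voice barge-in (you can’t interrupt the model by speaking), but it stops the feedback loop without headphones.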
Thanks, great to see this, and I look forward to trying it in my app today! I couldn’t see any information about when the existing 4o-realtime and 4o-mini-realtime models will be shut down; is there any more information on that?
Great news! Also, it seems the 2 new voices are available for gpt-4o-mini-tts too:
We’re releasing two new voices in the API, Marin and Cedar, with the most significant improvements to natural-sounding speech. We’re also updating our existing eight voices to benefit from these improvements.
I have been waiting a long time for OpenAI to add this feature so that I can practice speaking a foreign language according to a lesson plan and default prompt. Currently, my interface does not allow me to add files, and the paperclip icon is missing. Please help.
OpenAI doesn’t list vision input on this model as gated behind “ID verification”, although the two-week-old help page is generally misinformation: it uses indirect language like “subject to ID verification” instead of “you will be denied, don’t prepay us”, and it doesn’t mention the o3 blocks, other model blocks such as o1 behind undisclosed “trust” or higher-tier requirements, or other streaming blocks.
Thus, all I can suggest is: hard-refresh the page (shift+refresh, cmd+refresh, or whatever OS/browser key combo does that for you). Beyond that, log out, close the browser, and restart. Try a newly created project and select it in the upper-left drop-down. Then the most extreme step, which will lose all playground history: delete the browser’s local DOM storage for the site, plus cache and cookies.
I don’t know what’s hidden behind the down-arrow symbol in the message box in your screenshot. You most likely already tried this, but that’s the place where I would look first.
A verified organization is not required to send images to GPT-5 realtime; I have confirmed this already. In general, the features are available in the UI, but trying to use them in the playground will trigger some sort of notification that verification is required.
Hey everyone, excited to try the new gpt-realtime model. I’m struggling to implement it at the moment and would appreciate quick help, as the documentation still seems to reference the past 4o or preview models. TL;DR: I’m getting an odd “model_not_found: The model “gpt-realtime” does not exist or you do not have access to it.” response, but now that it’s GA, I’m not sure why. Also, it’s not in my project’s limits list of models, so I’m unable to “allow” it.
Any solutions or suggestions? Or is there something I’m doing wrong? Thanks!
I’d make a new project, with a new key inside it, just for this model and endpoint, and not set any limits on it; just use the defaults. See if that yields success; that fix has remedied similar issues with new model deployments.
Interesting, thanks, I will try that. I’m not trying to use an allow list and just had the default settings; I was only checking the allow list after someone else suggested verifying whether my project had access to the model yet, and I didn’t see it in the list.
But I’ll try an entirely new project and report back if there are any problems. Thanks!
I am trying to upgrade my app to the new @openai/agents SDK version 0.1.0 and also to the new model. But there seems to be a new SDP handshake endpoint, v1/realtime/calls, that I cannot find any documentation for. My handshake fails unless I intercept the call and add a header: headers.set("OpenAI-Beta", "realtime=v1"). I could never have found this myself, but Codex CLI with GPT-5-high figured it out and said it is because projects are being rolled out gradually, mine isn’t enabled yet, and I therefore need this header. If that is true (which seems to be the case, because the only way I can connect is by adding the header), this is a very substandard developer experience. Or am I missing something here? Can someone from the OpenAI team shed some light?
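In case it helps anyone hitting the same wall, this is roughly the interception I ended up with (TypeScript, patching global fetch before the SDK connects). Treat it as a stopgap found by trial and error, not an officially documented approach:

// Workaround sketch: add the beta header on requests to /v1/realtime/calls.
const originalFetch = globalThis.fetch;

globalThis.fetch = async (input, init) => {
  const url =
    typeof input === "string" ? input :
    input instanceof URL ? input.toString() :
    input.url;

  if (url.includes("/v1/realtime/calls")) {
    const headers = new Headers(
      init?.headers ?? (input instanceof Request ? input.headers : undefined),
    );
    headers.set("OpenAI-Beta", "realtime=v1"); // header the SDK currently omits
    init = { ...init, headers };
  }
  return originalFetch(input, init);
};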
Is there any documentation for the SIP protocol endpoints?
For example, the /accept/ endpoint from the Python code example that you provided in the URL:
It seems to have (only?) three keys at the root of the JSON in the POST body, which are:
{
  "type": "realtime",
  "instructions": "You are a support agent.",
  "model": "gpt-4o-realtime-preview-2024-12-17"
}
I’ve looked everywhere and couldn’t find any reference or docs for how this JSON object should look. Are there other parameters, like temperature, voice, turn detection, and basically every parameter that is in the playground’s UI?
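For reference, the minimal accept call that does work for me looks roughly like this (TypeScript). The path with the call ID is just my reading of the example at that URL, so double-check it against the docs; the body is exactly the three keys above:

// Minimal accept sketch: the path is assumed from the webhook's call_id.
async function acceptCall(callId: string, apiKey: string): Promise<void> {
  const res = await fetch(
    `https://api.openai.com/v1/realtime/calls/${callId}/accept`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        type: "realtime",
        instructions: "You are a support agent.",
        model: "gpt-4o-realtime-preview-2024-12-17",
      }),
    },
  );
  if (!res.ok) {
    throw new Error(`accept failed: ${res.status} ${await res.text()}`);
  }
}

Anything I add beyond those three keys makes the call initiation fail (details further down).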
I think the call_accept object from the Python example is the standard request body for starting a session.
This webhook lets you accept or reject the call. When accepting the call, you’ll provide the configuration (instructions, voice, etc) for the Realtime API session.
This sentence is what led me to believe this; then when you compare the request body parameters with what’s listed in the example:
We should’ve made that bet.
I’ve already tried passing keys from the session object (is there any proper documentation for it? I only found code examples in OpenAI’s docs). I tried:
.temperature
.voice
.modalities
.prompt
Each of them (by itself) made the call initiation fail; when I removed it, the call worked again.
BTW: is there any way to debug this and see logs?