The Agents SDK is now available in TypeScript and supports handoffs, guardrails, tracing, MCP, and other core agent primitives, just like the Python version. It includes new support for human-in-the-loop approvals, allowing you to pause tool execution, serialize and store the agent state, approve or reject specific calls, and resume the agent run.
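As a rough illustration of the approval flow described above, here's a minimal TypeScript sketch. The specific names (`tool`, `needsApproval`, `result.interruptions`, `state.approve`) reflect my reading of the SDK docs and should be treated as approximate rather than definitive:

```typescript
import { Agent, run, tool } from '@openai/agents';
import { z } from 'zod';

// A tool flagged as requiring human approval before it executes.
const refundTool = tool({
  name: 'issue_refund',
  description: 'Issue a refund to a customer',
  parameters: z.object({ orderId: z.string(), amount: z.number() }),
  needsApproval: true, // assumption: pauses the run until approved/rejected
  execute: async ({ orderId, amount }) => `Refunded ${amount} for order ${orderId}`,
});

const agent = new Agent({
  name: 'Support agent',
  instructions: 'Help customers with their orders.',
  tools: [refundTool],
});

let result = await run(agent, 'Please refund order #1234 for $20.');

// The run pauses when approval is needed; the state can be serialized,
// stored, and later restored to approve or reject each pending call.
for (const interruption of result.interruptions) {
  result.state.approve(interruption); // or result.state.reject(interruption)
}

// Resume the run from the stored state.
result = await run(agent, result.state);
console.log(result.finalOutput);
```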
You can also build voice agents that run in the client or on your server with the new RealtimeAgent feature, powered by the Realtime API. Define them like text agents, with tool calls, handoffs, and guardrails, and get automatic audio and interruption handling.
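For example, a client-side voice agent might be set up like this (the `RealtimeAgent`/`RealtimeSession` import path and the connect options are assumptions based on the SDK docs, and the ephemeral client token is a placeholder):

```typescript
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

// Defined like a text agent: instructions, plus optional tools,
// handoffs, and guardrails.
const voiceAgent = new RealtimeAgent({
  name: 'Voice assistant',
  instructions: 'Answer questions briefly and conversationally.',
});

// In the browser, the session handles microphone capture, audio playback,
// and interruption handling automatically.
const session = new RealtimeSession(voiceAgent);
await session.connect({ apiKey: '<ephemeral-client-token>' });
```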
Next, the Traces dashboard now supports Realtime API sessions, letting you visualize voice agent runs, including audio input/output, tool invocations, and interruptions, whether created via the API or the Agents SDK.
Finally, we’re improving the instruction following reliability, tool calling consistency, and interruption behavior of our speech-to-speech model, and introducing a new speed parameter in the API that lets you control how fast the voice speaks during each session. The updated model is now available as gpt-4o-realtime-preview-2025-06-03 in the Realtime API and gpt-4o-audio-preview-2025-06-03 in the Chat Completions API.
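As a sketch of the new parameter, a raw WebSocket client could set the speaking speed in a `session.update` event like this (the beta header and the exact accepted range for `speed` are assumptions; check the API reference):

```typescript
import WebSocket from 'ws';

const url =
  'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03';
const ws = new WebSocket(url, {
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'OpenAI-Beta': 'realtime=v1',
  },
});

ws.on('open', () => {
  // Control how fast the voice speaks for this session; 1.0 is the default,
  // and values above/below speed it up or slow it down (range is an assumption).
  ws.send(
    JSON.stringify({
      type: 'session.update',
      session: { voice: 'alloy', speed: 1.2 },
    }),
  );
});
```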
Hope these updates help you build even more useful voice agents! Please keep the feedback coming — we’re continuing to make more improvements to the Agents SDK and Realtime API.
It seems to me that function-calling latency is much higher with this new model than before, which would potentially make it worse for AI agents.
Not only is the function-calling latency higher, the realtime audio playback also contains more jitter and static. Overall it seems like worse performance than gpt-4o-realtime-preview-2024-12-17.
I’m interested to learn a little more about “Traces” after reading through your docs. First, is it free? I know of course that I’ll need to pay for input/output tokens to the models, but is there any additional charge for Traces specifically?
Second: Is the Agents SDK the only way to access Traces? Or are they also available through the Responses and Completions APIs?
Love this new update for the Realtime API, but I’m having a few issues:
I tried to enable tracing, but it only shows the audio input and not the AI response output. The video demo I saw on X showed both input and output, but I can’t see the output.
There also appears to be a LOT more jitter in the audio stream sometimes; I’m not sure if that is due to WebRTC or something else. This is the main reason we want to be able to view the output trace.
Is anyone else experiencing the missing output traces or the audio jitter?
This is the most exciting update so far! Looking forward to trying voice agents, especially real-time conversation and human-in-the-loop control. Amazing work!
Can we figure out deployment of agents? I have doc-processing agents (a modern-day version of Airflow/Spark jobs), but I execute them as GitHub Actions, so my logs/tracing and compute live in two places. For example, if I have nightly workflows, I can’t program that using the Agents SDK; I have to do it in cron-style job-scheduler software.
+1 on the audio jitter and static. The realism/emotion also seems worse; is this just because you now have to be more specific in the instructions?
Very cool! Any update on whether you’re going to build realtime models with reasoning capabilities? We’re really loving Gemini 2.5 Live (using a thinking budget), but its tool calling is not good. We’d love to see a similar product from OpenAI, since non-reasoning models don’t fit our use case very well.
Thank you for the upgrades on all fronts here. One next ask would be making more voice profiles available, perhaps expanding to the fuller set available in the TTS model and beyond? That, or offering more voice-library portability/cloning in the audio-to-audio models. Thanks!
Yes, great, but in my opinion the Realtime API has been left rather behind, probably because it is not a very important product for OpenAI ($$). However, be aware that its users are getting disappointed as time goes by. What about 4o-mini, which has a less bitter price? Will it be updated too? Thanks.
The only update I am interested in at the moment is support for multiple streams, each with its own context. Meaning, I would like to create a voice bot that can join a realtime audio conversation with multiple people, differentiate who is who along with the context provided for each, and ultimately respond only when needed.
I ran a few tests; instruction following has definitely improved for my use case.
When is this coming to mini? We don’t really need 4o-level intelligence and cost.
We are compelled to stay on this pinned version for now.