The Agents SDK is now available in TypeScript and supports handoffs, guardrails, tracing, MCP, and other core agent primitives, just like the Python version. It includes new support for human-in-the-loop approvals, allowing you to pause tool execution, serialize and store the agent state, approve or reject specific calls, and resume the agent run.
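As a rough illustration of the approval flow described above, here's a minimal TypeScript sketch. The specific names (`tool`, `needsApproval`, `result.interruptions`, `state.approve`) reflect my reading of the SDK docs and should be treated as approximate rather than definitive:

```typescript
import { Agent, run, tool } from '@openai/agents';
import { z } from 'zod';

// A tool flagged as requiring human approval before it executes.
const refundTool = tool({
  name: 'issue_refund',
  description: 'Issue a refund to a customer',
  parameters: z.object({ orderId: z.string(), amount: z.number() }),
  needsApproval: true, // assumption: pauses the run until approved/rejected
  execute: async ({ orderId, amount }) => `Refunded ${amount} for order ${orderId}`,
});

const agent = new Agent({
  name: 'Support agent',
  instructions: 'Help customers with their orders.',
  tools: [refundTool],
});

let result = await run(agent, 'Please refund order #1234 for $20.');

// The run pauses when approval is needed; the state can be serialized,
// stored, and later restored to approve or reject each pending call.
for (const interruption of result.interruptions) {
  result.state.approve(interruption); // or result.state.reject(interruption)
}

// Resume the run from the stored state.
result = await run(agent, result.state);
console.log(result.finalOutput);
```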
You can also build voice agents that run in the client or on your server with the new RealtimeAgent feature, powered by the Realtime API. Define them like text agents, with tool calls, handoffs, and guardrails, and get automatic audio and interruption handling.
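For example, a client-side voice agent might be set up like this (the `RealtimeAgent`/`RealtimeSession` import path and the connect options are assumptions based on the SDK docs, and the ephemeral client token is a placeholder):

```typescript
import { RealtimeAgent, RealtimeSession } from '@openai/agents/realtime';

// Defined like a text agent: instructions, plus optional tools,
// handoffs, and guardrails.
const voiceAgent = new RealtimeAgent({
  name: 'Voice assistant',
  instructions: 'Answer questions briefly and conversationally.',
});

// In the browser, the session handles microphone capture, audio playback,
// and interruption handling automatically.
const session = new RealtimeSession(voiceAgent);
await session.connect({ apiKey: '<ephemeral-client-token>' });
```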
Next, the Traces dashboard now supports Realtime API sessions, letting you visualize voice agent runs, including audio input/output, tool invocations, and interruptions, whether created via the API or the Agents SDK.
Finally, we’re improving the instruction following reliability, tool calling consistency, and interruption behavior of our speech-to-speech model, and introducing a new speed parameter in the API that lets you control how fast the voice speaks during each session. The updated model is now available as gpt-4o-realtime-preview-2025-06-03 in the Realtime API and gpt-4o-audio-preview-2025-06-03 in the Chat Completions API.
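As a sketch of the new parameter, a raw WebSocket client could set the speaking speed in a `session.update` event like this (the beta header and the exact accepted range for `speed` are assumptions; check the API reference):

```typescript
import WebSocket from 'ws';

const url =
  'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03';
const ws = new WebSocket(url, {
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'OpenAI-Beta': 'realtime=v1',
  },
});

ws.on('open', () => {
  // Control how fast the voice speaks for this session; 1.0 is the default,
  // and values above/below speed it up or slow it down (range is an assumption).
  ws.send(
    JSON.stringify({
      type: 'session.update',
      session: { voice: 'alloy', speed: 1.2 },
    }),
  );
});
```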
Hope these updates help you build even more useful voice agents! Please keep the feedback coming — we’re continuing to make more improvements to the Agents SDK and Realtime API.
It seems to me that function-calling latency is much higher with this new model than before, which would potentially make it worse for AI agents.
Not only is the function-calling latency higher, the realtime audio playback also contains more jitter and static. Overall it seems like worse performance than gpt-4o-realtime-preview-2024-12-17.
I’m interested to learn a little more about “Traces” after reading through your docs. First, is it free? I know of course that I’ll need to pay for input/output tokens to the models, but is there any additional charge for Traces specifically?
Second: Is the Agents SDK the only way to access Traces? Or are they also available through the Responses and Completions APIs?
Love this new update for the Realtime API, but I’m having a few issues:
I tried to enable tracing, but it only shows the audio input and not the AI response output. The video demo I saw on X showed both input and output, but I can’t see the output.
There also appears to be a LOT more jitter in the audio stream sometimes; I’m not sure if that is due to WebRTC or something else. This is the main reason we want to be able to view the output trace.
Is anyone else experiencing the missing output traces or the audio jitter?
This is the most exciting update so far! Looking forward to trying voice agents, especially real-time conversation and human-in-the-loop control. Amazing work!
Can we figure out deployment of agents? I have doc-processing agents (a modern-day version of Airflow/Spark jobs), but I execute them as GitHub Actions, so my logs/tracing and compute live in two places. For example, if I have nightly workflows, I can’t program that using the Agents SDK; I have to do it in cron-style job-scheduler software.
+1 on the audio jitter and static. The realism/emotion also seems worse; is this just because you now have to be more specific in the instructions?
Very cool! Any update on whether you’re going to build realtime models with reasoning capabilities? We’re really loving Gemini 2.5 Live (using a thinking budget), but its tool calling is not good. We’d love to see a similar product from OpenAI, since non-reasoning models don’t fit our use case very well.
Thank you for the upgrades on all fronts here. One next ask would be making more voice profiles available, perhaps expanding to the fuller set available in the TTS model and beyond? That, or offering more voice-library portability/cloning in the audio-to-audio models. Thanks!
Yes, great, but in my opinion the Realtime API has been left rather behind, probably because it is not a very important product for OpenAI ($$). However, be aware that its users are getting disappointed as time goes by. What about 4o-mini, which has a less bitter price? Will it be updated too? Thanks.
The only update I am interested in at the moment is support for multiple streams, each with its own context. Meaning, I would like to create a voice bot that can join a realtime audio conversation with multiple people, differentiate who is who along with the context provided for each, and ultimately respond only when needed.
I ran a few tests; instruction following has definitely improved for my use case.
When is this coming to mini? We don’t really need 4o-level intelligence and cost.
We are compelled to stay on this pinned version for now.