I’m also experiencing more jitter and latency than with gpt-4o-realtime-preview-2024-10-01, and the conversational fluency doesn’t seem any better either.
Hi guys! It looks like there is still no MCPServerSse available in this SDK, only in the Python SDK. Am I right? Is there any plan to implement it?
The jittering and breaks in speech are still occurring. Is there any update on this?
How does the cost of Gemini’s Live API compare to OpenAI’s Realtime API?
We’re currently integrating Twilio with OpenAI’s RealtimeAgent (using the latest gpt-4o-realtime-preview-2025-06-03 model), and we’ve been experiencing two major issues:
- Latency in function calls: Tool call responses are taking 3–5 seconds, even for lightweight local functions (see the timing sketch after this list). This delay hurts the responsiveness of our voice agent and breaks the conversational flow.
- Audio jitter and quality drops: During voice calls, we’re noticing occasional jitters, static, and unnatural pauses. This seems to vary across sessions but happens frequently enough to affect usability.
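For context, here is roughly how we’re timing the tool itself to rule out our own code. This is a simplified sketch: the tool() helper and zod parameters follow the Agents SDK docs, but the function name and lookup logic are placeholders for our real code.

```typescript
import { tool } from '@openai/agents';
import { z } from 'zod';

// Placeholder for one of our lightweight local functions. The timer measures
// only the execute body, so any extra seconds must come from the model/API
// round trip rather than the tool itself.
const lookupAccount = tool({
  name: 'lookup_account',
  description: 'Look up a caller account by phone number.',
  parameters: z.object({ phone: z.string() }),
  execute: async ({ phone }) => {
    const start = performance.now();
    const result = { phone, status: 'active' }; // stand-in for a fast local lookup
    console.log(`lookup_account body took ${(performance.now() - start).toFixed(1)} ms`);
    return result;
  },
});
```

The execute body consistently finishes in single-digit milliseconds, yet the agent still takes 3–5 seconds to speak the result.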
Hi! I’m excited to watch the progress here. It’s not yet suitable for our use case; feedback below:
- We’d need a (text) context window of at least 100k tokens before 4o-realtime would fit our use case (domain-expert conversational voice chatbot).
- In my sandbox testing, the model glitches out frequently: it halts its reply with an error message saying the response tripped a content flag. This happened on thoroughly boring, mundane topics, nothing racy.
- The cost might also be a barrier for us, but more testing is needed. I’m confused about how the bill adds up. For example, over the course of an audio conversation where I tested 4o-realtime as a language pronunciation tutor, the logs say ~400k tokens were sent and ~16k tokens were received, yet the API usage & cost panel says that conversation racked up a $6 bill, where I’d expect more like $2 given those token counts.
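My best guess after digging is that the gap comes from audio tokens: they are billed at a much higher per-million rate than text tokens, and because the full conversation history is re-sent each turn, even a modest audio share dominates the bill. Here is a back-of-the-envelope sketch; the per-million rates are assumptions taken from the preview pricing I’ve seen (they have changed between model releases), and the text/audio split is hypothetical:

```typescript
// Rough Realtime billing estimate. All rates are USD per 1M tokens and are
// assumptions, not authoritative; check the current pricing page.
const RATES = { textIn: 5, textOut: 20, audioIn: 40, audioOut: 80 };

// Hypothetical split of the ~400k-in / ~16k-out conversation from my logs.
const usage = { textIn: 300_000, audioIn: 100_000, textOut: 8_000, audioOut: 8_000 };

const cost =
  (usage.textIn / 1e6) * RATES.textIn +     // $1.50
  (usage.audioIn / 1e6) * RATES.audioIn +   // $4.00
  (usage.textOut / 1e6) * RATES.textOut +   // $0.16
  (usage.audioOut / 1e6) * RATES.audioOut;  // $0.64

console.log(`estimated cost: $${cost.toFixed(2)}`); // ≈ $6.30
```

With that split the estimate lands near the $6 I was actually billed, versus about $2.32 if all 416k tokens were billed at text rates, which would explain the discrepancy.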
Will support for the RealtimeAgent be coming to the Python SDK?
Function calling seems to have become less reliable in this update, even though more reliable function calling was one of its stated goals.
Hey Ben!
Thanks for your question; we understand the concern. There is currently no rule that one SDK will always lead, but the two SDKs are developed in parallel and try to stay consistent on the general concepts and most feature launches. There are places where they do diverge, though. These roughly fall into three categories:
- Native feeling/experience. We want each SDK to feel like a first-class citizen for its developers, not like a port from another language. Because of that, the SDKs may behave slightly differently, for example Python’s use of decorators for tool definitions versus event emitters for lifecycle events in TypeScript. Similarly, Python supports LiteLLM while TypeScript supports Vercel’s AI SDK for third-party models, since those are the common libraries in their respective communities. This category also covers features one community requests that may not justify immediately porting to the other SDK.
- New features that require breaking changes. While building the TypeScript SDK we addressed some things we also want to change in the Python SDK, but that would require breaking changes we did not want to make yet. Most notably, the serializability of state, which allowed us to introduce human-in-the-loop (HITL) flows, and a more abstract model input/output protocol that makes adding support for third-party models easier.
- Realtime features. For now, most changes around Realtime support will likely land exclusively in the TypeScript SDK until the interface reaches a stable point. Starting with a blank slate in the TypeScript SDK let us rethink how we wanted this to work and make the same SDK usable both client-side and server-side, whereas Python would be limited to server-side WebSockets only (see the sketch after this list).
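To make the client/server distinction concrete, here is what the server-side WebSocket path looks like. This is a minimal sketch: the endpoint, beta header, and response.create event follow the Realtime API docs, while the model name is just the preview mentioned above.

```typescript
// Minimal server-side (Node) connection to the Realtime API over raw
// WebSockets, the mode Python is limited to today.
import WebSocket from 'ws';

const ws = new WebSocket(
  'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03',
  {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      'OpenAI-Beta': 'realtime=v1',
    },
  },
);

ws.on('open', () => {
  // Ask the model to start a response; all interaction is JSON events over the socket.
  ws.send(JSON.stringify({ type: 'response.create' }));
});

ws.on('message', (data) => {
  const event = JSON.parse(data.toString());
  console.log('server event:', event.type);
});
```

In the browser, the same TypeScript SDK can instead negotiate a WebRTC connection, which is what makes the client-side story possible.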
However, we try to keep the two in sync for other features. For example, today’s Prompts launch is supported in both the TypeScript and Python SDKs. Overall, we strive to create the best experience for each community, but there is no hard rule that Python will always get features first.
Eventually, yes, but we don’t have a timeline yet. We first want to see what we learn from the TypeScript implementation that we might want to carry over when porting the feature to Python.
Hey @wellingtonbsz! SSE transport was deprecated in the MCP spec itself after the Python SDK had implemented it. Since we shipped MCP support in the TypeScript SDK after that deprecation, we decided not to ship SSE support.
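If you’re on the TypeScript SDK, the replacement is the Streamable HTTP transport. Here’s a minimal sketch; I’m assuming the MCPServerStreamableHttp class and option names from the SDK reference, and the server URL is a placeholder:

```typescript
import { Agent, MCPServerStreamableHttp } from '@openai/agents';

// Point the agent at an MCP server over Streamable HTTP instead of SSE.
const mcpServer = new MCPServerStreamableHttp({
  url: 'https://example.com/mcp', // placeholder MCP endpoint
  name: 'example-mcp',
});

const agent = new Agent({
  name: 'Assistant',
  instructions: 'Use the MCP tools when they are relevant.',
  mcpServers: [mcpServer],
});

// Connect before running the agent so its tools are discovered.
await mcpServer.connect();
```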
The pricing needs to drop dramatically for this to be useful in any commercial application, especially since the Gemini Live API is 5–7x cheaper.