🚀 gpt-realtime-1.5 is live in the Realtime API

Hey everyone,

Big step forward for voice today: gpt-realtime-1.5 just dropped in the Realtime API.

Quick highlights from the team:

  • +5% on Big Bench Audio reasoning
  • +10.23% alphanumeric transcription accuracy
  • +7% instruction following
  • More reliable tool calling and multilingual handling overall

Pricing is unchanged from the original gpt-realtime (all prices per 1M tokens):

  • Text: $4 input | $0.40 cached input | $16 output
  • Audio: $32 input | $0.40 cached input | $64 output
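
For anyone budgeting against these rates, here is a rough sketch of a per-session cost estimate. The prices are the ones listed above; the token counts in `example_usage` are made up for illustration, and the sketch assumes that "input" means uncached input tokens and that cached tokens are billed separately at the cached rate.

```python
# Prices in USD per 1M tokens, copied from the list above.
PRICES = {
    "text": {"input": 4.00, "cached": 0.40, "output": 16.00},
    "audio": {"input": 32.00, "cached": 0.40, "output": 64.00},
}


def estimate_cost(usage):
    """Return the estimated cost in USD for a usage dict.

    usage maps modality -> {"input" | "cached" | "output": token_count}.
    Assumption: "input" counts only uncached input tokens.
    """
    total = 0.0
    for modality, counts in usage.items():
        for kind, tokens in counts.items():
            total += tokens / 1_000_000 * PRICES[modality][kind]
    return total


# Hypothetical token counts for one voice session (not measured data).
example_usage = {
    "text": {"input": 2_000, "cached": 8_000, "output": 500},
    "audio": {"input": 6_000, "cached": 0, "output": 9_000},
}

cost = estimate_cost(example_usage)
print(f"Estimated session cost: ${cost:.4f}")
```

With these made-up numbers almost all of the cost comes from audio tokens, so audio output is the first place to look when trimming spend.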

Early adopters are already feeling the upgrade:

  • Genspark reports connection rates nearly doubled (up to 66%) and phone call errors cut in half.
  • Sendbird highlights exceptional improvements in handling interruptions.

Check out the latest docs here: Realtime API | OpenAI API

Curious about your experience:

  • Are you noticing reduced latency in your setups?
  • Any standout improvements or quirks in tool calling and multilingual tasks?
  • How does it stack up side-by-side with previous realtime models?

Drop your insights, benchmarks, or any questions right here!

Excited to hear your thoughts!

Also, don’t miss the cool demo from Charlie:

This is a big improvement. Really liking it.

We’re very happy with the improved alphanumeric accuracy. Tool calling feels a lot faster too. We’ve had to adjust our realtime agents because 1.5 is more ‘descriptive’ about the actions it takes, so we now explicitly specify when it should or shouldn’t announce that it is taking an action.

Downside: One thing that really stands out to us is how the intonation for Dutch and Flemish has regressed from the previous gpt-realtime, even if we spend significant time prompting for it. Many of our clients still prefer the previous model for that reason, even with the reduced performance on alphanumerics.

I totally agree with @Dennis_Stellar:

  • Tool calling is noticeably faster and smoother.
  • Alphanumeric transcription accuracy is clearly better, and the model seems to accept and take into account user corrections more easily, which reduces frustration.

To share more detail on intonation from a French customer's point of view:
One of our use cases for the Realtime API is handling customers who have been waiting too long in our call center queue, so that the call can eventually be reprioritized.
I launched an A/B test to compare gpt-realtime-1.5 with gpt-realtime, using exactly the same prompts and tools, with calls routed randomly to one of the following variants:

  • Variant A (gpt-realtime): 716 sessions
    642 calls successfully redirected by the AI after completing its task.
    74 customers hung up while speaking to the AI Agent.
  • Variant B (gpt-realtime-1.5): 507 sessions
    429 calls successfully redirected by the AI after completing its task.
    78 customers hung up while speaking to the AI Agent.

That is about 5 percentage points more customers hanging up with gpt-realtime-1.5 (15.4% vs. 10.3% with gpt-realtime), with a confidence of 99.2% (p=0.008).
Listening to some call recordings, I can hear that the intonation is less natural, and customers seem less confident speaking to this new model (at least in French).
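
For anyone who wants to run the same kind of check on their own traffic, here is a minimal sketch of a two-sided, two-proportion z-test applied to the hang-up counts above. The post does not say which test was actually used, so the choice of a pooled z-test is an assumption; the function name is also just illustrative.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference in proportions (pooled standard error).

    Returns (rate_a, rate_b, z, p_value).
    """
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    # Pooled rate under the null hypothesis that both variants share one rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hang-ups per variant, taken from the A/B test above.
rate_a, rate_b, z, p_value = two_proportion_z_test(74, 716, 78, 507)

print(f"Hang-up rate, gpt-realtime:     {rate_a:.1%}")
print(f"Hang-up rate, gpt-realtime-1.5: {rate_b:.1%}")
print(f"Difference: {(rate_b - rate_a) * 100:.1f} percentage points")
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Under that assumption the result lands close to the reported p=0.008. A chi-squared or Fisher exact test on the same 2x2 table would be a reasonable alternative and may give a slightly different p-value.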

It is nice and seems more accurate to some extent. But if you ask gpt-realtime to laugh, the model will produce laughter, whereas 1.5 will say “laugh”. If your prompt says “laugh hysterically”, it will say “laugh hysterically” instead of actually laughing hysterically. gpt-realtime makes the agent actually laugh.

No love for a new mini version? Non-mini realtime is too expensive to run production apps.