Hey everyone,
Big step forward for voice today: gpt-realtime-1.5 just dropped in the Realtime API.
Quick highlights from the team:
- +5% on Big Bench Audio reasoning
- +10.23% alphanumeric transcription accuracy
- +7% instruction following
- More reliable tool calling and multilingual handling overall
Pricing is unchanged from the original gpt-realtime (all prices per 1M tokens):
- Text: $4 input | $0.40 cached input | $16 output
- Audio: $32 input | $0.40 cached input | $64 output
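If you want to sanity-check what a session might cost at these rates, here is a rough sketch in Python. The prices are copied from the list above; the function name, the argument names, and the assumption that cached tokens are counted separately from regular input tokens are mine, not anything from the official docs, so adjust to match what your usage reports actually show.

```python
# Prices in USD per 1M tokens, copied from the pricing list in this post.
PRICES = {
    "text": {"input": 4.00, "cached": 0.40, "output": 16.00},
    "audio": {"input": 32.00, "cached": 0.40, "output": 64.00},
}


def estimate_cost(modality, input_tokens=0, cached_tokens=0, output_tokens=0):
    """Estimate the USD cost for one modality ("text" or "audio").

    Hypothetical helper. Assumes input_tokens counts only non-cached
    input, and cached_tokens counts input billed at the cached rate.
    """
    p = PRICES[modality]
    total = (
        input_tokens * p["input"]
        + cached_tokens * p["cached"]
        + output_tokens * p["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 1M audio tokens in and 1M audio tokens out, nothing cached.
    print(estimate_cost("audio", input_tokens=1_000_000, output_tokens=1_000_000))
```

For a mixed session, call it once per modality and add the results. At these rates audio output is the most expensive line item, so response length in audio tokens will usually dominate the total.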
Early adopters are already feeling the upgrade:
- Genspark reports connection rates nearly doubled (up to 66%) and phone call errors cut in half.
- Sendbird highlights exceptional improvements in handling interruptions.
Check out the latest docs here: Realtime API (OpenAI API docs)
Curious about your experience:
- Are you noticing reduced latency in your setups?
- Any standout improvements or quirks in tool calling and multilingual tasks?
- How does it stack up side-by-side with previous realtime models?
Drop your insights, benchmarks, or any questions right here!
Excited to hear your thoughts!