Introducing the Realtime API

You can count the tokens from each response.done event sent by the server.
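
A minimal sketch of tallying that usage, assuming each `response.done` event carries a `response.usage` object with `input_tokens` and `output_tokens` fields (check the current API reference for the exact shape). The `tally_usage` helper and the sample events are made up for illustration.

```python
import json

def tally_usage(raw_events):
    """Sum token counts across response.done events given as JSON strings."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for raw in raw_events:
        event = json.loads(raw)
        if event.get("type") != "response.done":
            continue  # ignore every other server event
        # Assumed shape: event["response"]["usage"][...]
        usage = (event.get("response") or {}).get("usage") or {}
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals

# Hand-written sample events, not real server output.
sample_events = [
    json.dumps({"type": "response.done",
                "response": {"usage": {"input_tokens": 120, "output_tokens": 45}}}),
    json.dumps({"type": "response.audio.delta", "delta": "..."}),
    json.dumps({"type": "response.done",
                "response": {"usage": {"input_tokens": 200, "output_tokens": 60}}}),
]
print(tally_usage(sample_events))
```

In a real client you would feed this the messages received on the WebSocket instead of a hard-coded list.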

In my limited testing of the Realtime API, it can:

  • Whisper, though the output is sometimes cut off and flagged.
  • Sing, though the output is always cut off immediately and flagged as a violation.
  • Speak with various accents (Jamaican, Russian, etc.).
  • Produce ambient sounds during speech (this happens spontaneously).
  • Speak in a higher- or lower-pitched voice.

It cannot:

  • Detect how the user speaks (loudly/quietly, whispering/not whispering).

For the above to work, your system prompt needs to guide the model explicitly.
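
One way to set that prompt is a `session.update` client event with an `instructions` field. The sketch below only builds the JSON payload; the instruction text and the `build_session_update` helper are illustrative assumptions, and sending the payload over the WebSocket is left out.

```python
import json

def build_session_update(instructions):
    """Build a session.update client event that sets the session instructions."""
    return json.dumps({
        "type": "session.update",
        "session": {"instructions": instructions},
    })

# Example wording only; tune it for your own use case.
prompt = (
    "Speak with a Jamaican accent. "
    "Whisper when the user asks you to keep your voice down."
)
payload = build_session_update(prompt)
print(payload)
```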

Hey, can someone share their experience on whether it’s better to integrate the Realtime API directly or through LiveKit, Twilio, or Agora?

I’d like to know which one works best, and how to differentiate one service from another. Maybe someone from OpenAI can shed some light here?

I did it with LiveKit. The setup, tools, and framework are all easy to work with.

The Realtime API is fantastic, but it’s extremely expensive, which makes deploying commercial applications unfeasible.

Do you see any reductions in pricing happening anytime soon?

Thanks

Prompt caching is now available for the Realtime API, which should make commercial deployments more feasible.
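
To see how much caching helps in practice, one rough approach is to compare cached input tokens against total input tokens. This sketch assumes the usage object exposes `input_token_details.cached_tokens`; the `cached_share` helper and the numbers are hypothetical.

```python
def cached_share(usage):
    """Return the fraction of input tokens that were served from the cache."""
    total = usage.get("input_tokens", 0)
    # Assumed field name; verify it against the current API reference.
    cached = usage.get("input_token_details", {}).get("cached_tokens", 0)
    return cached / total if total else 0.0

# Made-up usage numbers for illustration.
sample_usage = {"input_tokens": 400, "input_token_details": {"cached_tokens": 300}}
print(f"{cached_share(sample_usage):.0%} of input tokens were cached")
```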

Hoping to see the service costs drop as well, but this is pretty new, so only time will tell!
