Introducing the Realtime API

Each response.done event sent by the server includes token usage, so you can count the tokens per response.
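A minimal sketch of how that counting could look, assuming the event arrives as a JSON string and that usage sits under response.usage with input_tokens, output_tokens and total_tokens fields (check the field names against the events you actually receive; add_usage is a hypothetical helper, not part of any SDK):

```python
import json

def add_usage(totals, raw_event):
    """Add the token counts from one server event to a running total.

    Assumes usage is reported under event["response"]["usage"];
    events other than response.done are ignored.
    """
    event = json.loads(raw_event)
    if event.get("type") != "response.done":
        return totals
    usage = (event.get("response") or {}).get("usage") or {}
    for key in ("input_tokens", "output_tokens", "total_tokens"):
        totals[key] = totals.get(key, 0) + usage.get(key, 0)
    return totals

# Made-up sample event for illustration only.
sample_event = json.dumps({
    "type": "response.done",
    "response": {
        "usage": {"input_tokens": 120, "output_tokens": 45, "total_tokens": 165}
    },
})

totals = add_usage({}, sample_event)
print(totals)
```

Call add_usage on every message you read from the socket and keep the same totals dict for the whole session.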


In my limited testing of the Realtime API, it can:

  • Whisper, though the output is sometimes cut off and flagged.
  • Sing, though the output is always cut off immediately and flagged as a violation.
  • Speak with various accents (Jamaican, Russian, etc.)
  • Produce ambient sounds during speech (this happens spontaneously).
  • Speak in higher/lower pitched voice.

It cannot:

  • Detect how the user speaks (loudly/quietly, whispering/not whispering).

For any of the above to work, your system prompt needs to explicitly guide the model toward that behavior.
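As a rough sketch of setting such a prompt, assuming you send a session.update client event with the prompt in session.instructions (the prompt wording and the build_session_update helper are made up for illustration, and I'm not claiming this exact wording produces the results above):

```python
import json

def build_session_update(instructions):
    """Build a session.update payload carrying the system prompt.

    Assumes the prompt goes in session.instructions; returns a JSON
    string ready to send over the already-open WebSocket.
    """
    return json.dumps({
        "type": "session.update",
        "session": {"instructions": instructions},
    })

# Hypothetical prompt text, not a tested recipe.
prompt = (
    "You are a voice actor. When asked, whisper, change accent, "
    "or raise and lower the pitch of your voice."
)
payload = build_session_update(prompt)
print(payload)
```

Send the payload once after connecting, before the first response is requested.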
