Each `response.done` event sent by the server reports token usage for that response, so you can count the tokens from there.
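A minimal sketch of tallying usage across a session. It assumes the raw server events have already been captured as JSON strings (the WebSocket handling is left out) and that each `response.done` event carries its counts under `response.usage` with `input_tokens`, `output_tokens` and `total_tokens`. That matches the Realtime API reference as I understand it, but check the current docs. The helper name `sum_response_usage` and the sample events are hypothetical.

```python
import json


def sum_response_usage(raw_events):
    """Sum token counts over every response.done event in raw_events.

    Assumption: usage lives at event["response"]["usage"]; verify the
    field names against the current Realtime API reference.
    """
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for raw in raw_events:
        event = json.loads(raw)
        if event.get("type") != "response.done":
            continue  # other server events do not carry per-response usage
        usage = (event.get("response") or {}).get("usage") or {}
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


if __name__ == "__main__":
    # Hypothetical captured server events, trimmed to the relevant fields.
    events = [
        json.dumps({"type": "response.created", "response": {}}),
        json.dumps({"type": "response.done",
                    "response": {"usage": {"input_tokens": 120,
                                           "output_tokens": 45,
                                           "total_tokens": 165}}}),
        json.dumps({"type": "response.done",
                    "response": {"usage": {"input_tokens": 200,
                                           "output_tokens": 80,
                                           "total_tokens": 280}}}),
    ]
    print(sum_response_usage(events))
```

The usage object may also split counts into text and audio tokens in nested detail fields. If you bill audio and text differently, those are worth summing separately.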
In my limited testing of the Realtime API, it can:
- Whisper: sometimes gets cut off and flagged.
- Sing: always cut off immediately and flagged as a violation.
- Speak with various accents (Jamaican, Russian, etc.)
- Produce ambient sounds during speech (this happens spontaneously).
- Speak in a higher- or lower-pitched voice.
It cannot:
- Detect how the user speaks (loudly or quietly, whispering or not).
For any of the above to work, your system prompt has to explicitly guide the model toward it.
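For reference, the system prompt in the Realtime API is set through the session's `instructions`. Below is a minimal sketch of building a `session.update` client event that carries it. The prompt wording is purely illustrative and untested. The helper name `build_session_update` is hypothetical. The event shape follows the beta-era reference, and newer API versions may require extra session fields, so check the current docs before relying on it.

```python
import json


def build_session_update(instructions):
    """Build a session.update client event that sets the system prompt.

    Assumption: beta-style payload; newer API versions may need
    additional session fields.
    """
    return {
        "type": "session.update",
        "session": {"instructions": instructions},
    }


# Hypothetical instructions; the wording is illustrative, not tested guidance.
INSTRUCTIONS = (
    "You are a voice assistant. When the user asks you to whisper, "
    "answer in a soft, breathy whisper. When asked for an accent, "
    "keep that accent for the rest of the conversation."
)

if __name__ == "__main__":
    # This JSON string would be sent over the open Realtime WebSocket.
    print(json.dumps(build_session_update(INSTRUCTIONS)))
```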