Get current word spoken from the realtime voice API

Hi, I’m trying to build a realtime talking avatar whose lip-sync follows the voice produced by the new Realtime Voice API as closely as possible. I could not find any event that indicates which word is currently being spoken. Even better, I would love to get realtime viseme or phoneme events.

That is not a service offered.

You can run the Whisper API on the audio instead; it can return word-level timestamps along with the transcription.
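Here is a rough sketch of that workaround in Python, assuming you have already saved the generated speech to a file and have the official `openai` package installed. It is not realtime: the audio has to be transcribed before playback, and the timestamps are then matched against the playback clock. The helper names `transcribe_words` and `word_at` are made up for illustration, and the audio path is whatever file you saved.

```python
from bisect import bisect_right
from typing import Optional


def transcribe_words(audio_path: str) -> list[dict]:
    """Return [{'word', 'start', 'end'}, ...] for a saved audio file."""
    # Imported lazily so the lookup helper below works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="verbose_json",      # required for timestamps
            timestamp_granularities=["word"],    # ask for per-word timing
        )
    # result.words may be None if no words were detected.
    return [
        {"word": w.word, "start": w.start, "end": w.end}
        for w in (result.words or [])
    ]


def word_at(words: list[dict], t: float) -> Optional[str]:
    """Return the word spoken at playback time t (seconds), or None in a gap."""
    starts = [w["start"] for w in words]
    i = bisect_right(starts, t) - 1  # last word that started at or before t
    if i >= 0 and t < words[i]["end"]:
        return words[i]["word"]
    return None
```

You would call `transcribe_words` once per audio clip, then call `word_at(words, playback_seconds)` on each animation frame to decide what the avatar is saying. Note this gives words only; mapping words to visemes or phonemes would need a separate step that the API does not provide.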
