Hi, I'm trying to build a realtime talking avatar whose lip-sync follows the voice produced by the new RealTime Voice API as closely as possible. I could not find any event that indicates which word is currently being spoken. Even better, I would love to get realtime viseme or phoneme events.
That is not a service offered.
You can run the audio through the Whisper API instead; it can return word-level timestamps along with the transcription.
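Here is a rough sketch of how that could look in Python, assuming the official `openai` package and an audio file you have already saved from the voice output. The helper names `transcribe_with_word_timestamps` and `word_at` are made up for this example, and the approach works on recorded audio after the fact, so treat it as a starting point rather than a live event stream.

```python
from bisect import bisect_right
from typing import Optional, TypedDict


class WordTiming(TypedDict):
    word: str
    start: float  # seconds from the start of the audio
    end: float


def transcribe_with_word_timestamps(path: str) -> list[WordTiming]:
    """Send an audio file to Whisper and return per-word timings.

    Assumes the `openai` package is installed and OPENAI_API_KEY is set.
    """
    from openai import OpenAI  # imported lazily so word_at() works offline

    client = OpenAI()
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="verbose_json",    # needed for timestamps
            timestamp_granularities=["word"],  # ask for word-level timing
        )
    return [
        {"word": w.word, "start": w.start, "end": w.end}
        for w in (result.words or [])
    ]


def word_at(words: list[WordTiming], t: float) -> Optional[str]:
    """Return the word being spoken at playback time t, or None in a gap."""
    starts = [w["start"] for w in words]
    i = bisect_right(starts, t) - 1  # last word that started at or before t
    if i >= 0 and t < words[i]["end"]:
        return words[i]["word"]
    return None
```

On each animation frame you could call `word_at(words, playback_position_seconds)` and drive the mouth shape from the result. Whisper returns words, not visemes or phonemes, so any mapping from word to mouth shape would be up to you.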