Hey folks,
I’m building a real-time voice chat using GPT-4o with audio responses. Everything works great on the AI side — I get real-time responses back just fine.
But I’m struggling to capture the user’s spoken transcript. I’ve been using webkitSpeechRecognition in Chrome to get the user input before sending it to OpenAI, but:
- It stops randomly (especially on silence)
- It only works in Chrome
- And I don’t see a transcript of the user’s speech echoed back from OpenAI
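For context, the browser-side capture described above can be sketched roughly as follows. This is a minimal TypeScript sketch under assumptions, not the exact code in use: `keepListening` and the small interfaces are hypothetical names that model only the parts of the Web Speech API touched here, and restarting from `onend` is a common workaround for the recognizer stopping on silence, not a guaranteed fix.

```typescript
// Minimal structural types for the Web Speech API members used below.
interface RecognitionAlternative { transcript: string }
interface RecognitionResult { isFinal: boolean; 0: RecognitionAlternative }
interface RecognitionEvent {
  resultIndex: number;
  results: ArrayLike<RecognitionResult>;
}
interface Recognition {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  onresult: ((e: RecognitionEvent) => void) | null;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

// Hypothetical helper: starts recognition, forwards final transcripts to
// `onFinal`, and restarts whenever the recognizer ends while still wanted.
// Returns a function that stops listening for good.
function keepListening(
  create: () => Recognition,
  onFinal: (text: string) => void,
): () => void {
  let active = true;
  const rec = create();
  rec.continuous = true;     // keep going across pauses where supported
  rec.interimResults = true; // partial results arrive; only finals are forwarded
  rec.lang = "en-US";        // assumption: English input

  rec.onresult = (e) => {
    for (let i = e.resultIndex; i < e.results.length; i++) {
      const result = e.results[i];
      if (result && result.isFinal) {
        onFinal(result[0].transcript.trim());
      }
    }
  };

  // The recognizer fires `onend` when it stops (for example after silence);
  // restart unless the caller asked to stop.
  rec.onend = () => {
    if (active) rec.start();
  };

  rec.start();
  return () => {
    active = false;
    rec.stop();
  };
}

// Browser usage (Chrome only), shown as a comment because it needs `window`:
// const stop = keepListening(
//   () => new (window as any).webkitSpeechRecognition(),
//   (text) => console.log("user said:", text),
// );
```

Even with the restart, this only runs where webkitSpeechRecognition exists, so it does not solve the cross-browser problem.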
Is there any way to get the user transcript directly from the API, or is there a more reliable, cross-browser option for speech-to-text?
Would love to hear how others are handling this!