How do you handle user transcripts in real-time GPT-4o chats?

Hey folks,
I’m building a real-time voice chat using GPT-4o with audio responses. Everything works great on the AI side — I get real-time responses back just fine.

But I’m struggling to capture the user’s spoken transcript. I’ve been using webkitSpeechRecognition in Chrome to get the user input before sending it to OpenAI, but:

  • It stops randomly (especially on silence)
  • It only works in Chrome
  • And I don’t see the user input echoed back from OpenAI

Is there any way to get the user transcript directly from the API or something more reliable/cross-browser for speech-to-text?

Would love to hear how others are handling this! 🙏


OpenAI’s Realtime API can optionally provide the user-side transcript — can you use that? The Realtime API is a voice-to-voice model, but it can also return a transcript of the user’s audio by running it through a separate transcription model. You configure this in a `session.update` event by setting `input_audio_transcription` and choosing the transcription model. Then, at conversation time, subscribe to the `conversation.item.input_audio_transcription.completed` server event to receive the user’s transcript. (Note that `response.audio_transcript.done` carries the transcript of the model’s audio response, not the user’s input.)

See details here https://platform.openai.com/docs/api-reference/realtime-server-events/response/audio_transcript/done
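A minimal sketch of the two pieces described above, assuming you already have a WebSocket connection (`ws`) to the Realtime endpoint — one helper builds the `session.update` payload that enables user-input transcription (here assuming `whisper-1` as the transcription model), and one pulls the user transcript out of incoming server events:

```javascript
// Build the session.update payload that turns on user-side transcription.
// "whisper-1" is assumed here as the transcription model.
function buildSessionUpdate() {
  return {
    type: "session.update",
    session: {
      input_audio_transcription: { model: "whisper-1" },
    },
  };
}

// Extract the user transcript from a parsed server event, or return null
// for any other event type.
function userTranscriptFrom(event) {
  if (event.type === "conversation.item.input_audio_transcription.completed") {
    return event.transcript;
  }
  return null;
}
```

In use, you would send `JSON.stringify(buildSessionUpdate())` over the socket right after the session opens, and call `userTranscriptFrom(JSON.parse(msg.data))` in your `onmessage` handler, treating a non-null result as the user’s spoken text.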