Unable to Access User Audio Transcript in Realtime API

Hello everyone,

I have created a call center using the Realtime API. For this system, I want to log user questions and the Realtime API responses as text. I am able to access the “response” transcript of the Realtime API; however, I am unable to access the input audio transcript from the user.

I listened for the server event "conversation.item.input_audio_transcription.completed", but after reviewing the event logs I noticed that it never occurred. The "conversation.item.created" event is generated for the user item, but no "conversation.item.input_audio_transcription.completed" event follows it.

Additionally, for the OpenAI Realtime API console demo, the appropriate JavaScript usage for my purpose is as follows:

    {!conversationItem.formatted.tool && conversationItem.role === 'user' && (
      <div>
        {conversationItem.formatted.transcript ||
          (conversationItem.formatted.audio?.length
            ? '(awaiting transcript)'
            : conversationItem.formatted.text || '(item sent)')}
      </div>
    )}

Link: API Reference - OpenAI API

Event log:

Received event: input_audio_buffer.speech_started {'type': 'input_audio_buffer.speech_started', 'event_id': 'event_AP8MyRiW8UKkhWGOui0Wr', 'audio_start_ms': 1088, 'item_id': 'item_AP8MykZe1zpm27EHYn629'}

Received event: input_audio_buffer.speech_stopped {'type': 'input_audio_buffer.speech_stopped', 'event_id': 'event_AP8Mz74Ifuh1NtIIqu3a3', 'audio_end_ms': 1952, 'item_id': 'item_AP8MykZe1zpm27EHYn629'}

Received event: input_audio_buffer.committed {'type': 'input_audio_buffer.committed', 'event_id': 'event_AP8MzKbCHeeuNIn9ykMex', 'previous_item_id': 'item_AP8Mpc4O2eN11xsgMAIBN', 'item_id': 'item_AP8MykZe1zpm27EHYn629'}

Received event: conversation.item.created {'type': 'conversation.item.created', 'event_id': 'event_AP8Mztx81EqrpgL2qSypw', 'previous_item_id': 'item_AP8Mpc4O2eN11xsgMAIBN', 'item': {'id': 'item_AP8MykZe1zpm27EHYn629', 'object': 'realtime.item', 'type': 'message', 'status': 'completed', 'role': 'user', 'content': [{'type': 'input_audio', 'transcript': None}]}}


Hi, any news on this? Did you resolve it? I have the same problem: after I finish speaking, no event occurs when using Twilio with the OpenAI Realtime API, even though this worked a week ago with "conversation.item.input_audio_transcription.completed". Only when I hang up the phone do I sometimes get a transcription of my first message.


What is the session configuration that you are calling session.update with?

I had the same issue.

To receive conversation.item.input_audio_transcription.completed for user input, you have to add the input audio transcription configuration to the session in the initial session.update event.

https://platform.openai.com/docs/api-reference/realtime-client-events/session/update

"input_audio_transcription": {
    "model": "whisper-1"
}
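
For anyone logging both sides of the call, here is a rough Python sketch of how that setting might be sent and how the transcripts could be picked out of the event stream. The helper names build_session_update and extract_transcript are made up for illustration, and "response.audio_transcript.done" is an assumption about which event carries the assistant transcript (the original post does not name it), so check it against the events you actually receive.

```python
import json


def build_session_update(transcription_model="whisper-1"):
    """Build a session.update event that enables input audio transcription."""
    return {
        "type": "session.update",
        "session": {
            "input_audio_transcription": {"model": transcription_model},
        },
    }


def extract_transcript(event):
    """Return (role, text) for transcript events, or None for any other event."""
    event_type = event.get("type")
    if event_type == "conversation.item.input_audio_transcription.completed":
        # User speech, transcribed by the model configured in session.update.
        return ("user", event.get("transcript", ""))
    if event_type == "response.audio_transcript.done":
        # Assumed event name for the finished assistant transcript.
        return ("assistant", event.get("transcript", ""))
    return None


# Serialize the event; send this string over your own WebSocket connection
# right after the session is created.
payload = json.dumps(build_session_update())
```

In the receive loop, each incoming message can be parsed with json.loads and passed to extract_transcript; anything that comes back as None is simply not a transcript event. The user transcript is produced asynchronously, so it may arrive after the assistant has already started responding.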

Wow, that solved my problem. Thanks.

I tested it by speaking Turkish, but some of the completed transcripts come back in English. Is there any solution for this?