Input_audio_transcription server events stopped occurring

I know that many people have asked about Realtime transcription on the forum, and I've worked through that issue in the past, but something new seems to be happening for me.

Previously, when using the Realtime beta, I had similar issues to others but was able to resolve them. The transcriptions were badly inaccurate and broken up, but it worked.

When the v1 full release came out, the transcriptions suddenly became much more accurate for me and came through more cleanly.

Recently, however, something has changed and I no longer get any conversation.item.input_audio_transcription.* events. (I log every event type that comes through, and none of those arrive.)

My server provides an ephemeral token like this, which works.

const sessionConfig = JSON.stringify({
    session: {
        type: "realtime",
        model: "gpt-realtime",
        audio: {
            output: { voice: "marin" },
        },
    },
});

const response = await fetch("https://api.openai.com/v1/realtime/client_secrets", {
    method: "POST",
    headers: {
        "Authorization": `Bearer ${openaiApiKey}`,
        "Content-Type": "application/json",
    },
    body: sessionConfig,
});

My client establishes the WebRTC connection like this, then sends the session.update below after receiving a session.created event. That all works: audio conversation and response transcriptions all come through.

const url = `https://api.openai.com/v1/realtime/calls?model=${model}`;
const response = await fetch(url, {
    method: 'POST',
    headers: {
        'Content-Type': 'application/sdp',
        'Accept': 'application/sdp',
        'Authorization': `Bearer ${ephemeralToken}`
    },
    body: offer.sdp
});

const sessionUpdateEvent = {
    event_id: Crypto.randomUUID(),
    type: "session.update",
    session: {
        modalities: ["text", "audio"],
        instructions: this.socketConfig.prompt,
        input_audio_format: "pcm16",
        output_audio_format: "pcm16",
        // input_audio_transcription: { model: "gpt-4o-mini-transcribe", language: "en" },
        input_audio_transcription: { model: "whisper-1" },
        voice: this.sessionConfig?.voiceId || 'marin',
        turn_detection: turnDetection,
        temperature: 0.8,
        max_response_output_tokens: 4096
    },
};

this.sendJson(sessionUpdateEvent);


public sendJson(data: any): void {
    if (this.dataChannel?.readyState === 'open') {
        this.dataChannel.send(JSON.stringify(data));
    } else {
        console.error('Data channel is not open. Cannot send data.');
    }
}
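For reference, my event logging is roughly just a listener on that same data channel (simplified sketch; this is where the log further down comes from):

this.dataChannel?.addEventListener('message', (event: MessageEvent) => {
    const serverEvent = JSON.parse(event.data);
    // Log every server event type so nothing gets filtered out
    console.log(`📨 Event type: ${serverEvent.type}`);
    // Dump the full payload for conversation item events
    if (serverEvent.type.startsWith('conversation.item')) {
        console.log(JSON.stringify(serverEvent, null, 2));
    }
});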

And yet, for some reason, I just don't get any input transcription events any more.
Does anyone know what I'm doing wrong?

Here’s an example console log:

📨 Event type: input_audio_buffer.speech_started
📨 Event type: input_audio_buffer.speech_stopped
📨 Event type: input_audio_buffer.committed
📨 Event type: conversation.item.added
{
  "type": "conversation.item.added",
  "event_id": "event_CKmMAY5RZlCxH2el9gdsh",
  "previous_item_id": "item_CKmM63o6FnE6EXwMq1sqr",
  "item": {
    "id": "item_CKmM82dqcGP0Guh7Z20bQ",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "transcript": null
      }
    ]
  }
}
📨 Event type: conversation.item.done
{
  "type": "conversation.item.done",
  "event_id": "event_CKmMAvdOcHDEgPpmyZxdV",
  "previous_item_id": "item_CKmM63o6FnE6EXwMq1sqr",
  "item": {
    "id": "item_CKmM82dqcGP0Guh7Z20bQ",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "transcript": null
      }
    ]
  }
}
📨 Event type: response.created
📨 Event type: response.output_item.added
📨 Event type: conversation.item.added
{
  "type": "conversation.item.added",
  "event_id": "event_CKmMBbnC63FblNTAWc56W",
  "previous_item_id": "item_CKmM82dqcGP0Guh7Z20bQ",
  "item": {
    "id": "item_CKmMAiMqMQ7E6zHEZYDNE",
    "type": "message",
    "status": "in_progress",
    "role": "assistant",
    "content": []
  }
}
📨 Event type: response.content_part.added
📨 Event type: response.output_audio_transcript.delta
📨 Event type: output_audio_buffer.started
📨 Event type: response.output_audio_transcript.delta
📨 Event type: response.output_audio.done
📨 Event type: response.output_audio_transcript.done
📨 Event type: response.content_part.done
📨 Event type: conversation.item.done
{
  "type": "conversation.item.done",
  "event_id": "event_CKmMC05Kx2EWticHXiipV",
  "previous_item_id": "item_CKmM82dqcGP0Guh7Z20bQ",
  "item": {
    "id": "item_CKmMAiMqMQ7E6zHEZYDNE",
    "type": "message",
    "status": "completed",
    "role": "assistant",
    "content": [
      {
        "type": "output_audio",
        "transcript": "Hey there! Loud and clear, your test is coming through perfectly. How can I help you today?"
      }
    ]
  }
}
📨 Event type: response.output_item.done

As you can see above, I get audio buffer events, response events, and some conversation events, but no input transcription events.

There were a lot of changes with the v1 release and the removal of the beta flag. Here's what's working for me; note that the session (including the model) is specified differently now:

            "type": "session.update",
            "session": {
                "type": "realtime",
                "model": "gpt-realtime",
                "audio": {
                    "input": {
                        "format": {          
                            "type": "audio/pcm",
                            "rate": 24000,
                        },
                        "noise_reduction": {"type":"far_field"},
                        "transcription": {
                            "model": "gpt-4o-mini-transcribe" 
                            }...

With that in place, I get:

the assistant audio transcript in response.output_audio_transcript.done

and the user audio transcript in conversation.item.input_audio_transcription.completed
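For completeness, the whole event ends up looking roughly like this. The audio.input block is exactly what's shown above; the placement of instructions, turn_detection, and the output voice is my reading of the new session shape, so double-check those against the docs:

const sessionUpdateEvent = {
    type: "session.update",
    session: {
        type: "realtime",
        model: "gpt-realtime",
        // your prompt still goes at the top level of the session
        instructions: "You are a helpful assistant.",
        audio: {
            input: {
                format: { type: "audio/pcm", rate: 24000 },
                noise_reduction: { type: "far_field" },
                // input transcription now lives under audio.input
                transcription: { model: "gpt-4o-mini-transcribe" },
                // turn detection also appears to live under audio.input now
                turn_detection: { type: "server_vad" },
            },
            output: {
                // voice moved under audio.output
                voice: "marin",
            },
        },
    },
};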

Thank you! That was really helpful… I was able to understand the docs much more clearly as a result.
I did run into one issue (which I think I saw on another thread previously).

The transcriptions still didn’t come through until I added the transcription config settings into the “create ephemeral token” code on the server as well.
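Concretely, the token-minting code from my first post now includes the same audio.input.transcription block, roughly like this:

const sessionConfig = JSON.stringify({
    session: {
        type: "realtime",
        model: "gpt-realtime",
        audio: {
            input: {
                // without this, no conversation.item.input_audio_transcription.* events arrived
                transcription: { model: "gpt-4o-mini-transcribe" },
            },
            output: { voice: "marin" },
        },
    },
});

const response = await fetch("https://api.openai.com/v1/realtime/client_secrets", {
    method: "POST",
    headers: {
        "Authorization": `Bearer ${openaiApiKey}`,
        "Content-Type": "application/json",
    },
    body: sessionConfig,
});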

For anyone else who lands here… I've found this to be true when adjusting the turn detection values as well. It only pays attention to the ones set on the server when creating the ephemeral token.
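For example, here's the relevant slice of that same server-side sessionConfig with turn detection added (the server_vad values are just placeholders, and the exact nesting under audio.input is my assumption from the new format):

session: {
    type: "realtime",
    model: "gpt-realtime",
    audio: {
        input: {
            transcription: { model: "gpt-4o-mini-transcribe" },
            // set at token-creation time; later session.update tweaks were ignored for me
            turn_detection: {
                type: "server_vad",
                threshold: 0.5,
                prefix_padding_ms: 300,
                silence_duration_ms: 500,
            },
        },
        output: { voice: "marin" },
    },
},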

I imagine the same is true of several other settings.