Transcription config for `gpt-4o-mini-transcribe` doesn't work?

Hi,

I’m trying to use the new STT model. I’m sending the following frame to the “wss://api.openai.com/v1/realtime?intent=transcription” WebSocket endpoint:

{
  "type": "transcription_session.update",
  "include": [
    "item.input_audio_transcription.logprobs"
  ],
  "input_audio_format": "pcm16",
  "input_audio_transcription": {
    "prompt": "",
    "language": "",
    "model": "gpt-4o-mini-transcribe"
  },
  "turn_detection": {
    "type": "server_vad",
    "threshold": 0.5,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 500,
    "create_response": true
  },
  "input_audio_noise_reduction": {
    "type": "near_field"
  }
}
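
For reference, the frame goes out over a plain WebSocket connection, roughly like this (a minimal sketch using the websocket-client package rather than my actual client; the OpenAI-Beta: realtime=v1 header is an assumption carried over from the regular Realtime API):

# Sketch only: open the transcription-intent socket and send the update frame above.
import json
import os

from websocket import create_connection  # pip install websocket-client

ws = create_connection(
    "wss://api.openai.com/v1/realtime?intent=transcription",
    header=[
        f"Authorization: Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta: realtime=v1",  # assumption: same beta header as the regular Realtime API
    ],
)

ws.send(json.dumps({
    "type": "transcription_session.update",
    "include": ["item.input_audio_transcription.logprobs"],
    "input_audio_format": "pcm16",
    "input_audio_transcription": {"prompt": "", "language": "", "model": "gpt-4o-mini-transcribe"},
    "turn_detection": {
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500,
        "create_response": True,
    },
    "input_audio_noise_reduction": {"type": "near_field"},
}))
print(ws.recv())  # first event back from the server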

but I get the response

{
  "type": "transcription_session.created",
  "event_id": "event_BDIjP2WyEWBCFq31A1QrP",
  "session": {
    "id": "sess_BDIjPc4zRbhQgBDcJSDXL",
    "object": "realtime.transcription_session",
    "expires_at": 1742511767,
    "input_audio_noise_reduction": null,
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 200
    },
    "input_audio_format": "pcm16",
    "input_audio_transcription": null, # <--- this became null
    "client_secret": null,
    "include": null
  }
}

The transcription also doesn’t work. I do get speech_started and speech_stopped events:

"{\"type\":\"input_audio_buffer.speech_started\",\"event_id\":\"event_BDIjZmZPzKtGoZb3BG1g2\",\"audio_start_ms\":9780,\"item_id\":\"item_BDIjZcVYwyGgsUmbpHA83\"}"

but instead of the transcript events described in the documentation, I seem to receive conversation.item events:

"{\"type\":\"conversation.item.created\",\"event_id\":\"event_BDIja3273Hm7Ufs4GC5rL\",\"previous_item_id\":null,\"item\":{\"id\":\"item_BDIjZcVYwyGgsUmbpHA83\",\"object\":\"realtime.item\",\"type\":\"message\",\"status\":\"completed\",\"role\":\"user\",\"content\":[{\"type\":\"input_audio\",\"transcript\":null}]}}"

From the responses, I suspect that it’s still treating the connection as the non-transcription Realtime API. Not sure what I’m doing wrong though.

Any ideas would be appreciated.


I can throw random ideas your way.

I don’t know if the URL query string is correct or needed. I’m guessing you found that somewhere, but the API reference doesn’t have it.

logprobs is only an include on transcriptions, not realtime.

List models via the API. Try the dated model version instead of the alias.
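
For example, with the openai Python package (a quick sketch; the substring filter is just to narrow the output):

# Sketch: list the model IDs your key can see, to find the dated transcribe versions.
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment
for model in client.models.list():
    if "transcribe" in model.id or "whisper" in model.id:
        print(model.id)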

The model whisper-1 is also supported. You can check whether it's the new model itself that's causing the necessary input_audio_transcription details to be dropped.

If your application isn't continuous realtime audio (monitoring for when someone speaks with VAD and triggering transcription response events), the transcriptions endpoint also has the new models available.
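
Roughly like this, if a one-shot request fits your use case (a sketch with the official openai Python package; the file name is just a placeholder):

# Sketch: transcribe a finished recording over the REST transcriptions endpoint
# instead of the realtime socket.
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment
with open("recording.wav", "rb") as audio_file:  # placeholder file name
    result = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",
        file=audio_file,
    )
print(result.text)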

That’s just stuff thrown against the wall so far.

Thanks for the ideas, but they didn’t work.

The URL param is listed here: https://platform.openai.com/docs/guides/speech-to-text#streaming-the-transcription-of-an-ongoing-audio-recording

Tried with whisper-1 and without logprobs and that didn’t work either.

Looks like the documentation is not up to date.
Right now you actually have to wrap the config in a "session" field for it to work, otherwise it throws an error:

{
  "type": "transcription_session.update",
  "session": {
    "input_audio_format": "pcm16",
    "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500,
    },
    "input_audio_noise_reduction": {
      "type": "near_field"
    }
  }
}
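
End to end it looks something like this (again just a sketch with the websocket-client package; the check at the end mirrors the field that was coming back null in the original post):

# Sketch: connect, send the update wrapped in "session", and check the echoed config.
import json
import os

from websocket import create_connection  # pip install websocket-client

ws = create_connection(
    "wss://api.openai.com/v1/realtime?intent=transcription",
    header=[
        f"Authorization: Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta: realtime=v1",
    ],
)
print(ws.recv())  # transcription_session.created

ws.send(json.dumps({
    "type": "transcription_session.update",
    "session": {
        "input_audio_format": "pcm16",
        "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 500,
        },
        "input_audio_noise_reduction": {"type": "near_field"},
    },
}))

# The next session event should now echo the transcription config instead of null.
event = json.loads(ws.recv())
print(event.get("type"), event.get("session", {}).get("input_audio_transcription"))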

Thanks! I realized the same thing and have pinged someone from the OpenAI team on Twitter about this.