Hi,
Can you describe the issue with the transcription in more detail? Were you missing some text?
Do I understand correctly that you record the audio and send it to Whisper in case the transcription via Realtime SIP fails?
What is the audio config for? Is it used for the fallback, or does it just configure which model should be used for transcription with Realtime SIP?
audio: {
  input: {
    // Format must match the actual audio received (G.711 μ-law from SIP)
    transcription: {
      model: 'whisper-1' // GA format, not beta
    }
  }
}
josh31
September 24, 2025, 10:22pm
22
Can you provide an example, please? The documentation does not show this. I'm only seeing the transcript of OpenAI's output, not my caller's voice input.
Currently with my setup, I’ll be spending credits for both transcription and gpt-realtime.
josh31
September 24, 2025, 10:26pm
23
Yes, exactly. It acts as a fallback when no transcription is received. The format above is for the session.update API call; that is the shape it expects. Specifically, the input side, i.e. when my caller speaks, is what should be transcribed. Separately, though, it appears OpenAI is transcribing the output from the AI voice agent.
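For anyone looking for the full shape, here's a minimal sketch of how that audio block fits into a session.update event sent over the Realtime WebSocket. This just builds the JSON payload; the helper name is mine, and how you send it depends on your WebSocket client:

```python
import json

def build_session_update(transcription_model: str = "whisper-1") -> str:
    # session.update payload enabling input-audio transcription.
    # The caller's speech (input) is what gets transcribed; the
    # assistant's speech arrives separately as output transcript events.
    event = {
        "type": "session.update",
        "session": {
            "audio": {
                "input": {
                    "transcription": {
                        "model": transcription_model,
                    },
                },
            },
        },
    }
    return json.dumps(event)
```

Usage would be something like `ws.send(build_session_update())` with an open Realtime socket.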
I have a similar problem: transcription does not transcribe the caller's voice on (in my case) incoming SIP calls.
Yes, juberti! Please provide an example if you can. Thank you as always!
juberti
September 26, 2025, 4:08pm
26
See agent.ts in hello-realtime | Val Town. Note that for some reason some folks are getting conversation.item.input_audio_transcription.failed errors back when configuring transcription; still looking into that.
juberti
September 26, 2025, 7:30pm
27
If you are getting these .failed errors when setting up transcription, try this curl example to see if your API key is enabled for our Audio APIs.
I'm having this issue with the failed transcription events as well. The call itself is working perfectly. I accepted the call like this:
await oai_client.post(
    f"/realtime/calls/{call_id}/accept",
    body={
        "type": "realtime",
        "model": REALTIME_MODEL,
        "instructions": REALTIME_INSTRUCTIONS,
        "audio": {
            "input": {
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.65,
                },
                "transcription": {
                    "language": "en",
                    "model": "gpt-4o-transcribe-latest",
                },
            },
            "output": {
                "voice": "shimmer",
                "speed": 1.25,
            },
        },
    },
    cast_to=httpx.Response,
)
But I get failed events like
{"type":"conversation.item.input_audio_transcription.failed","event_id":"event_CMKd5yz496AvZGzB2Q49a","item_id":"item_CMKd5yUAuaEkiFEWUp6sW","content_index":0,"error":{"type":"server_error","code":null,"message":"Input transcription failed for item 'item_CMKd5yUAuaEkiFEWUp6sW'.","param":null}}
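In case it helps others debug, here's a rough sketch of how I log both the completed and failed input-transcription events from the server event stream. The event-type strings match the ones above; the dispatcher function is my own naming:

```python
import json

def handle_event(raw: str) -> str:
    """Dispatch a raw Realtime server event; return a short log line."""
    event = json.loads(raw)
    etype = event.get("type", "")
    if etype == "conversation.item.input_audio_transcription.completed":
        # The caller's transcribed speech arrives in "transcript".
        return f"caller said: {event.get('transcript', '')!r}"
    if etype == "conversation.item.input_audio_transcription.failed":
        # Failures carry an "error" object like the one quoted above.
        err = event.get("error") or {}
        return f"transcription failed: {err.get('message', 'unknown error')}"
    return f"ignored event: {etype}"
```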
@juberti I verified that the curl command you specified works just fine with the same API key.
juberti:
this curl example
Ah okay, it works with gpt-4o-transcribe as the model instead of gpt-4o-transcribe-latest, so maybe the docs just need updating?
The model to use for transcription. Current options are whisper-1, gpt-4o-transcribe-latest, gpt-4o-mini-transcribe, and gpt-4o-transcribe.
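For reference, the same transcription block from the accept payload above, with the model name swapped to the one that worked for me:

```python
# Same "audio.input.transcription" block as in the accept call above,
# using gpt-4o-transcribe rather than gpt-4o-transcribe-latest.
transcription_config = {
    "language": "en",
    "model": "gpt-4o-transcribe",
}
```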
juberti
October 2, 2025, 10:29pm
30
Thanks for helping figure this out, we’ll make sure the docs get updated so others don’t hit this issue.