Hi OpenAI Team,
I’m writing to request the restoration of the input_audio_transcription feature that was available in the Realtime API beta but removed in the GA release.
What We Had in Beta:
// Session configuration (Beta)
{
  type: "session.update",
  session: {
    modalities: ["text", "audio"],
    turn_detection: { type: "server_vad" },
    input_audio_transcription: { model: "whisper-1" }, // ✅ This existed
    input_audio_format: "g711_ulaw",
    output_audio_format: "g711_ulaw"
  }
}
This would generate conversation.item.input_audio_transcription.completed events containing real-time user speech transcripts within the same session.
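For reference, this is roughly how those events were consumed during the beta. It is a minimal Node.js sketch using the ws package; the WebSocket URL, model name, beta header, and the transcript field on the event reflect what we observed in the beta and may not match the GA surface.

// Minimal beta-era sketch (Node.js, ws package); endpoint, model, and event
// shape are the beta values and may differ in GA.
import WebSocket from "ws";

const ws = new WebSocket(
  "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
  {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "OpenAI-Beta": "realtime=v1",
    },
  }
);

ws.on("open", () => {
  // Send the session configuration shown above.
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: {
        modalities: ["text", "audio"],
        turn_detection: { type: "server_vad" },
        input_audio_transcription: { model: "whisper-1" },
        input_audio_format: "g711_ulaw",
        output_audio_format: "g711_ulaw",
      },
    })
  );
});

ws.on("message", (data) => {
  const event = JSON.parse(data.toString());
  // In beta, the user's speech transcript arrived in this event.
  if (event.type === "conversation.item.input_audio_transcription.completed") {
    console.log("User said:", event.transcript);
  }
});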
What Changed in GA:
The GA release removed input_audio_transcription entirely. Multiple developers in the community have reported this issue:
- “Input_audio_transcription in realtime-api”
- “Unable to Access User Audio Transcript in Realtime API”
- “Problems using session.update with the realtime-api”
Questions:
- Is the removal of input_audio_transcription permanent or temporary?
- What is the recommended approach for real-time user transcription in GA?
- Are there alternative methods we should be using?
The community would greatly appreciate clarity on this feature’s status.
Thank you for considering this request.
Best regards,
Rohan