Migrating from gpt-realtime-1.5 (beta) to gpt-realtime-2 (GA API) for a Twilio-based voice agent. The GA API rejects the old flat `input_audio_format` / `output_audio_format` parameters and requires a nested `session.audio.input.format` object, but it's unclear what the correct `type` value is for G.711 μ-law (the codec Twilio media streams use).
What we’ve tried
Beta API (worked fine):
```json
{
  "type": "session.update",
  "session": {
    "input_audio_format": "g711_ulaw",
    "output_audio_format": "g711_ulaw"
  }
}
```
GA API attempts (all rejected):
- `session.input_audio_format` → `Unknown parameter: 'session.input_audio_format'`
- `format: { type: "g711_ulaw" }` → rejected
```json
{
  "type": "session.update",
  "session": {
    "type": "realtime",
    "output_modalities": ["text", "audio"],
    "audio": {
      "input": {
        "format": { "type": "g711_ulaw" }
      },
      "output": {
        "format": { "type": "g711_ulaw" },
        "voice": "marin"
      }
    }
  }
}
```
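For completeness, the probing we're doing can be reduced to this sketch. It builds the GA-style nested payload and iterates over candidate `type` strings — the candidate values are guesses on our part (the beta name, plus a MIME-style name some GA schemas use), not documented values. The actual WebSocket send is commented out since it needs live credentials:

```python
import json

def build_session_update(audio_format: str, voice: str = "marin") -> dict:
    """Build the GA-style nested session.update payload (the shape shown
    above; the format type string is the open question)."""
    fmt = {"type": audio_format}
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "output_modalities": ["text", "audio"],
            "audio": {
                "input": {"format": fmt},
                "output": {"format": fmt, "voice": voice},
            },
        },
    }

# Candidate type strings to probe -- assumptions, not documented values.
CANDIDATES = ["g711_ulaw", "audio/pcmu"]

if __name__ == "__main__":
    for cand in CANDIDATES:
        payload = json.dumps(build_session_update(cand))
        print(payload)
        # To actually probe (requires an API key), send over the socket, e.g.:
        # async with websockets.connect(
        #     "wss://api.openai.com/v1/realtime?model=gpt-realtime-2",
        #     additional_headers={"Authorization": f"Bearer {api_key}"},
        # ) as ws:
        #     await ws.send(payload)
```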
Questions
- What are the valid values for session.audio.input.format.type in the GA API?
- Is G.711 μ-law (8kHz) supported in gpt-realtime-2 over WebSocket, or only via SIP?
- Is there an official migration guide from the beta to the GA session config schema?
Environment
- Model: gpt-realtime-2
- Connection: WebSocket (wss://api.openai.com/v1/realtime)
- Transport: Twilio media streams (G.711 μ-law, 8kHz)
- No OpenAI-Beta header (GA API rejects it)
Any help appreciated; the GA API docs don't enumerate the valid audio format types for the nested object structure.