gpt-realtime-2 GA API: what is the correct audio format type for G.711 μ-law (Twilio/telephony)?

We're migrating a Twilio-based voice agent from gpt-realtime-1.5 (beta) to gpt-realtime-2 (GA API). The GA API rejects the old flat input_audio_format / output_audio_format parameters and requires a nested session.audio.input.format object, but we can't find the correct type value for G.711 μ-law (the encoding Twilio media streams use).

What we’ve tried

Beta API (worked fine):
{
  "type": "session.update",
  "session": {
    "input_audio_format": "g711_ulaw",
    "output_audio_format": "g711_ulaw"
  }
}

GA API attempts (all rejected):

  1. Flat "session.input_audio_format" (as in the beta) → Unknown parameter: 'session.input_audio_format'
  2. Nested format: { "type": "g711_ulaw" } (full payload below) → rejected

{
  "type": "session.update",
  "session": {
    "type": "realtime",
    "output_modalities": ["text", "audio"],
    "audio": {
      "input": {
        "format": { "type": "g711_ulaw" }
      },
      "output": {
        "format": { "type": "g711_ulaw" },
        "voice": "marin"
      }
    }
  }
}
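
For anyone who wants to reproduce this, here is a minimal Python sketch of how the nested payload above can be generated with only the format type varying. The function name build_session_update is our own helper, not part of any SDK. Every other field is copied unchanged from the rejected attempt, so we are not claiming those fields are valid either. The second candidate, "audio/pcmu", is only an untested guess based on the registered media-type name for G.711 μ-law; we have not confirmed that the GA API accepts it.

```python
import json


def build_session_update(format_type: str, voice: str = "marin") -> dict:
    """Build the nested GA-style session.update from the attempt above.

    `format_type` is a parameter on purpose: which string the GA API
    accepts for G.711 mu-law is exactly the open question in this post.
    """
    audio_format = {"type": format_type}
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            # Copied unchanged from the rejected attempt above.
            "output_modalities": ["text", "audio"],
            "audio": {
                "input": {"format": dict(audio_format)},
                "output": {"format": dict(audio_format), "voice": voice},
            },
        },
    }


if __name__ == "__main__":
    # "g711_ulaw" is the beta value (rejected in the nested object).
    # "audio/pcmu" is an UNVERIFIED guess, not a documented value.
    for candidate in ("g711_ulaw", "audio/pcmu"):
        print(json.dumps(build_session_update(candidate), indent=2))
```

Each printed object can be sent as a single WebSocket text frame right after the connection opens, which makes it easy to test one candidate value at a time and compare the error responses.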

Questions

  1. What are the valid values for session.audio.input.format.type in the GA API?
  2. Is G.711 μ-law (8kHz) supported in gpt-realtime-2 over WebSocket, or only via SIP?
  3. Is there an official migration guide from the beta to the GA session config schema?

Environment

  • Model: gpt-realtime-2
  • Connection: WebSocket (wss://api.openai.com/v1/realtime)
  • Transport: Twilio media streams (G.711 μ-law, 8kHz)
  • No OpenAI-Beta header (GA API rejects it)
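
To show why the session format matters for this setup, here is a rough sketch of the bridge between the two sockets. It assumes Twilio's media messages carry base64 audio under media.payload and that the Realtime API takes base64 audio in an input_audio_buffer.append event; the function name twilio_media_to_append is hypothetical. The payload is passed through without transcoding, which can only work if the session is configured for G.711 μ-law.

```python
import base64
import json
from typing import Optional


def twilio_media_to_append(raw_message: str) -> Optional[str]:
    """Map one Twilio media-stream message to an append event.

    Returns None for non-media messages (for example "start" or "stop").
    """
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None
    payload = message["media"]["payload"]  # base64 G.711 mu-law, 8 kHz
    # Fail early on a corrupt frame; the audio itself is not re-encoded.
    base64.b64decode(payload, validate=True)
    return json.dumps({"type": "input_audio_buffer.append", "audio": payload})


if __name__ == "__main__":
    # 160 one-byte samples is 20 ms of audio at 8 kHz (dummy bytes here).
    frame = json.dumps({
        "event": "media",
        "streamSid": "MZ-example",  # placeholder stream id
        "media": {"payload": base64.b64encode(b"\xff" * 160).decode("ascii")},
    })
    print(twilio_media_to_append(frame))
```

If the session fell back to a PCM format instead, this pass-through would need a decode and resample step on every frame, which is what we are trying to avoid.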

Any help appreciated. The GA API docs don’t enumerate the valid audio format types for the nested object structure.