Twilio - OpenAI SIP Trunk calls failing with '400 Bad Request' error

I couldn't reply to my original thread @Sean-Der

Hey! Sorry for the delay, I was trying to get some changes out and it was harder than I expected.

You shouldn't need to adjust the audio.input.format; this is all negotiated during the Offer/Answer in SIP. I think you can just not set anything and it will work. Are you setting this because you want to change the voice?

thanks

Yes, that’s correct.

Here’s the snippet of what I am doing after runner.run().

I noticed that even though the voice is correctly defined in initial_model_settings, the greeting triggered via session.model.send_event does not use that voice; subsequent responses, however, use the configured voice correctly.

Also, OpenAI Traces aren’t working either.

        
    initial_model_settings: RealtimeSessionModelSettings = {
        "voice": voice or "alloy",
        "modalities": ["audio"],
        "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
        "tracing": {
            "workflow_name": "voice_receptionist",
            "group_id": call_id,
            "metadata": {"tenant_id": tenant_id},
        },
    }



    async with await runner.run(
        model_config={
            "call_id": call_id,
            "initial_model_settings": initial_model_settings,
        }
    ) as session:
        # Get the session's async iterator once and pull the first event
        event_iterator = aiter(session)
        first_event = await anext(event_iterator)
        logger.info("First event: %s", first_event.type)

        # Trigger an initial greeting: issue a response.create immediately
        # after the WebSocket attaches so the model speaks before the
        # caller says anything.
        await session.model.send_event(
            RealtimeModelSendRawMessage(
                message={
                    "type": "response.create",
                    "other_data": {
                        "response": {
                            "instructions": (
                                "Say exactly '"
                                f"{greeting}"
                                "' now before continuing the conversation."
                            ),
                        },
                    },
                }
            )
        )
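As an aside, manually calling `__aiter__`/`anext` isn't necessary; a plain `async for` consumes the session's events and lets you branch on `event.type`. Here is a minimal self-contained sketch with stand-in events (the `Event` class and the event-type names are hypothetical placeholders for the SDK's real event objects):

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Event:
    type: str


async def fake_session():
    # Stand-in for the RealtimeSession async iterator (hypothetical events).
    for t in ("history_updated", "error", "audio"):
        yield Event(t)


async def consume(events):
    # Flattened event loop: handle every event and collect errors as they arrive.
    seen, errors = [], []
    async for event in events:
        seen.append(event.type)
        if event.type == "error":
            errors.append(event)
    return seen, errors


seen, errors = asyncio.run(consume(fake_session()))
print(seen)  # ['history_updated', 'error', 'audio']
```

With the real session object, the same `async for session` loop replaces the `aiter`/`anext` bookkeeping and the separate "first event" read.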

I have a conditional on the event_iterator for event.type == "error" which logs this:

realtime-webhook-1 | ERROR:realtime-webhook:Realtime session error: RealtimeError(message="Invalid type for 'session.audio.input.format': expected an object, but got null instead.", type='invalid_request_error', code='invalid_type', event_id=None, param='session.audio.input.format')

However, audio works correctly: the greeting is said first (though not in the configured voice) and the agent responds normally. It's just that function tool calling doesn't work at all, even though the tools are set on the RealtimeAgent().

We're seeing the same issue with intermittent SIP 400 errors from Twilio to OpenAI, where realtime.call.incoming never fires and the call fails before reaching the webhook. Since forwarding to a local webhook works reliably, it does point to an unstable SIP interaction rather than Twilio call delivery itself. You're not alone; this seems similar to what others have reported since mid-December, and it likely needs investigation on the OpenAI SIP endpoint side.

Hey @bathinder

The issues with webhooks have been fixed. I posted an update here: Realtime API unreliable over SIP - #13 by Sean-Der

Would you mind trying again and if you have any more issues please @ me right away and I will investigate!


@Sean-Der any update on the function tool calls not working? I also can’t get the voice to work for the greeting response.

Hey @Christopher8827, below are our findings regarding your use case. Hope that helps. Thank you!

Why the greeting voice doesn’t match the configured voice

This behavior is expected with the Realtime API. The voice setting is only locked in once the model has produced audio at least once. If you trigger a response.create immediately after the WebSocket connects, the first audio response (your greeting) may use the default voice before the session fully commits the configured one. After that first audio output, the voice cannot be changed, which is why all subsequent responses correctly use the configured voice. This is called out in the Realtime docs around session behavior and audio generation.

 Docs: https://platform.openai.com/docs/guides/realtime-function-calling
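One possible mitigation is to pin the voice with a `session.update` and wait for the `session.updated` acknowledgement before issuing `response.create`. A sketch of the raw payloads (the `session.audio.output.voice` shape is assumed from the GA Realtime schema, and the helper names here are mine, not the SDK's):

```python
import json


def make_voice_update(voice: str) -> dict:
    # Pin the output voice before any audio is generated; once the model
    # has spoken, the voice can no longer be changed for the session.
    return {
        "type": "session.update",
        "session": {"audio": {"output": {"voice": voice}}},
    }


def make_greeting(greeting: str) -> dict:
    # Send only after the session.updated ack arrives, so the greeting
    # is rendered with the configured voice rather than the default.
    return {
        "type": "response.create",
        "response": {
            "instructions": f"Say exactly '{greeting}' now before continuing."
        },
    }


update = make_voice_update("alloy")
greet = make_greeting("Thanks for calling!")
print(json.dumps(update))
```

Sequencing the acknowledgement before the greeting is the key point; verify the exact payload path against the current API reference.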

What’s causing the session.audio.input.format error

The error you’re seeing (expected an object, but got null) is coming from the Realtime API validating session settings. session.audio.input.format must be an object (e.g. { type: "audio/pcm", rate: 24000 }). If it’s omitted or ends up as null during session initialization or update, the API will emit this error—even if audio playback still partially works. This matches the exact validation rules documented for Realtime session creation.

 Docs: https://platform.openai.com/docs/api-reference/realtime/create-call
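To sidestep the validation error, the format can be supplied explicitly as an object rather than left unset. A sketch of the raw payload (the nesting is assumed from the error's `session.audio.input.format` path; check it against the API reference before relying on it):

```python
# Hypothetical session.update that sets the input format explicitly,
# so it can never serialize to null during session initialization.
session_update = {
    "type": "session.update",
    "session": {
        "audio": {
            "input": {"format": {"type": "audio/pcm", "rate": 24000}},
        },
    },
}

fmt = session_update["session"]["audio"]["input"]["format"]
# The API requires an object here; None/null triggers invalid_type.
assert isinstance(fmt, dict) and fmt["rate"] == 24000
```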

Why OpenAI Traces aren’t showing up

Tracing only begins once the session successfully enters a steady state and the model produces responses. If the session hits validation errors early (like the audio format issue) or before a full response lifecycle completes, traces may never be recorded—even though the session appears to “work.” The Realtime tracing docs note that traces are tied to active response generation, not just session creation.

 Docs: https://developers.openai.com/blog/realtime-api
