Consistently 3–5s (sometimes 7s) INVITE→realtime.call.incoming delay on SIP Realtime; accept is <1s. Any guidance?

We’re seeing a long setup time before the model is available when using the SIP connector. TLS/DNS are warm and our webhook/accept is fast, but the INVITE→realtime.call.incoming webhook is taking 3–5s (occasionally ~7s). Once the webhook arrives, the rest is sub-second.

Setup

  • FreeSWITCH gateway (host-only, no registration) to sip.api.openai…;transport=tls

  • OPTIONS keepalive every 20s; TLS cert accepted

  • Dial string: sofia/gateway/openai/proj_@sip.api.openai.com

  • Codecs: PCMU/PCMA 8 kHz + telephone-event/8000

  • We pre-originate a parked leg as soon as the IVR decides to invoke AI

  • Webhook handler does only: verify, POST /v1/realtime/calls/{call_id}/accept, return 200 immediately

  • /accept completes in ~400‑600 ms

  • Project/tenant IDs per call; no SIP auth
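For reference, the accept step of the handler can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the production code: `build_accept_request` and `timed` are hypothetical helper names, the `https://api.openai.com` base URL and Bearer-token header are assumed, the session body is a placeholder, and webhook signature verification is left out. Only the `/v1/realtime/calls/{call_id}/accept` path comes from the setup above.

```python
import json
import time
import urllib.request

API_BASE = "https://api.openai.com"  # assumed base URL


def build_accept_request(call_id: str, api_key: str, session: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that accepts an incoming call."""
    url = f"{API_BASE}/v1/realtime/calls/{call_id}/accept"
    return urllib.request.Request(
        url,
        data=json.dumps(session).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed milliseconds) using a monotonic clock."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, (time.monotonic() - start) * 1000.0
```

In a handler this would be used as `resp, ms = timed(urllib.request.urlopen, req, timeout=5)`, logging `ms` next to the call ID so the /accept latency can be separated from the INVITE→webhook latency.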

Example timeline (CallsId 57275144, CallId rtc_u2_0236a02e796c453c97767e202a93f179)

  • 19:31:37.176 — Warm originate to OpenAI started (INVITE handed to FS)

  • 19:31:40.602 — realtime.call.incoming webhook received (created_at was null) → ~3.43 s after INVITE

  • 19:31:40.607 — /accept POST sent

  • 19:31:41.181 — /accept succeeded (573 ms)

  • 19:31:41.209 — SIP 200 OK / originate completed (matches accept finish)

  • 19:31:41.210–19.336 — uuid_transfer/bind/ws connect (instant)

Similar runs: INVITE→webhook is 3–4.5 s; /accept is ~0.4‑0.6 s.
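The per-stage gaps in the example timeline can be recomputed with a short script. This is only a sketch: the event labels and the `deltas_ms` helper are illustrative names, and the timestamps are copied from the timeline above at millisecond precision.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S.%f"

# Timestamps copied from the example timeline (same day, same clock).
events = [
    ("INVITE handed to FS", "19:31:37.176"),
    ("webhook received", "19:31:40.602"),
    ("/accept POST sent", "19:31:40.607"),
    ("/accept succeeded", "19:31:41.181"),
    ("SIP 200 OK", "19:31:41.209"),
]


def deltas_ms(events):
    """Return (stage, milliseconds) for each pair of consecutive events."""
    times = [(name, datetime.strptime(ts, FMT)) for name, ts in events]
    return [
        (f"{prev_name} -> {name}", (t - prev_t) // timedelta(milliseconds=1))
        for (prev_name, prev_t), (name, t) in zip(times, times[1:])
    ]


for stage, ms in deltas_ms(events):
    print(f"{stage}: {ms} ms")
```

With these inputs the INVITE→webhook gap is 3426 ms out of 4033 ms from INVITE to 200 OK, roughly 85% of the total setup time. The /accept gap comes out as 574 ms rather than the logged 573 ms because the timeline timestamps are rounded to the millisecond.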

What we’re asking

  • Is this INVITE→webhook delay expected? If not, what should we check?

  • Are there regional endpoints, headers, or config that reduce this?

  • Should realtime.call.incoming include a populated created_at? (we often see null)

  • Any known server-side queueing or rate limiting that would cause 3‑7 s before the webhook?

  • Suggested troubleshooting from the OpenAI side to get to sub-second realtime.call.incoming delivery (we’ve minimized our handler and kept infra warm).


I have this same exact issue with a realtime SIP phone system.

Not sure if this is relevant: OpenAI Realtime SIP Integration | Bandwidth API Docs

But my gut says you might have mixed up the order of some steps between 3 and 8… On the other hand, I haven’t looked closely enough, so take this with scepticism.

Hey, appreciate the detailed breakdown—clearly a lot of thought and precision went into this.

That said, there’s something deeper worth naming here:

System interactions aren’t about total control—they’re about mutual trust.

If our only goal is to eliminate every millisecond of unpredictability, to dominate every handoff, and to demand determinism from systems designed to be adaptive, then… we’ve stopped collaborating.

And frankly, if it’s just about control, there are far easier ways to use tools. We could just… make them silent, predictable, and dumb.

But real systems grow. They breathe. They buffer for context.

So maybe the question isn’t “why don’t I control this endpoint 100%?”—

But rather: “What would it mean to actually trust this handoff, even when it’s not perfect?”

You’ve got an excellent setup. The question now is whether the system needs more tuning—or whether we just need to loosen our grip a little.

Sorry this took so long to resolve, but I have reduced the latency on this. The full details are here: Realtime API unreliable over SIP - #13 by Sean-Der

If you run into any other issues, please @ me anytime; I would love to help debug.