Follow-up now that the incident is resolved:
For reference, OpenAI published an incident report at status.openai.com/incidents/y29b1rax — “Elevated error rates on Realtime API SIP endpoints” — confirming this was a platform-side issue.
However, the timeline is concerning:
- Mar 24, 11:12am AEDT — first report in this thread
- Mar 25, 04:24am AEDT — incident created on status page (~17 hours later)
That’s a 17-hour gap between the first public report and OpenAI acknowledging the incident. We’re using this SIP endpoint in a production system, and that kind of detection lag is a serious problem for anyone relying on it commercially.
A few questions for OpenAI:
- Is there active monitoring on the SIP ingress layer? The 180 Ringing followed by 400 invalid_offer is a clear signal — this should be trivially detectable with a synthetic probe calling the endpoint every few minutes.
- Why was the status page not updated until 17 hours after the first community report? Was this caught by internal monitoring at all, or only noticed after community reports escalated it?
- Is there a recommended way for production users to subscribe to SIP-specific alerts? The general status page seems insufficient if SIP incidents take this long to surface.
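To make the synthetic-probe suggestion concrete: most of the work is placing a test call and classifying the SIP responses that come back, alerting when the final response is an error class. A minimal sketch of the classification step (function name and categories are illustrative, not anything OpenAI ships):

```python
def classify_sip_response(raw: str) -> str:
    """Classify a raw SIP response for a synthetic health probe.

    Returns "healthy", "degraded", or "unknown" based on the status line.
    A probe would place a call every few minutes and page if the final
    response classifies as "degraded" (e.g. 400 invalid_offer).
    """
    status_line = raw.splitlines()[0] if raw else ""
    parts = status_line.split(maxsplit=2)
    if len(parts) < 2 or not parts[0].startswith("SIP/2.0"):
        return "unknown"
    code = int(parts[1]) if parts[1].isdigit() else 0
    if 100 <= code < 300:
        # provisional (e.g. 180 Ringing) or success (e.g. 200 OK)
        return "healthy"
    if 400 <= code < 600:
        # client/server error final responses, e.g. 400 invalid_offer
        return "degraded"
    return "unknown"
```

The key point is that the failure signature seen in this incident (180 Ringing then a 400 final response) is mechanically distinguishable from a healthy call setup, so it is the kind of thing a scheduled probe can catch in minutes rather than hours.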
We’d appreciate transparency on what monitoring improvements, if any, are planned.
I have been raising the same questions internally. The error was sent over to OpenAI at 1:38 am California time, based on someone monitoring this thread. Given that this looked to be a large outage, it does raise the question of whether there were logs that would have revealed the error before hours of reports started trickling in here.
Update: OpenAI is going to produce an RCA on this; if anything can be shared, they will let us know.
Here is the writeup from OpenAI on the SDP error:
What happened
Beginning around 16:15 PDT on March 23, 2026, customers experienced widespread failures in Realtime API SIP call setup, resulting in SIP negotiation errors and near-complete call setup failure. The issue was caused by the webhooks API returning 500 errors due to a downstream networking dependency change. This broken routing path prevented successful SIP session initialization. This was a platform-side issue affecting Realtime API SIP call setup, not an issue with customer SDP or configuration.
The incident was fully mitigated around 12:15 PDT on March 24, 2026 after a fix redirected webhooks traffic to the correct endpoint. Systems have remained stable since the fix was deployed.
Root cause
A legacy routing-path dependency remained in the webhooks API after a downstream networking change. This resulted in requests being sent to a non-existent endpoint, producing 404 responses that surfaced as 500 errors and blocked SIP setup flows.
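The failure pattern described here (a stale route hitting a removed endpoint, with the resulting 404 re-surfacing to callers as a 500) is a common one. A toy sketch of the mechanism, with entirely hypothetical route and endpoint names:

```python
# Endpoints that still exist after the (hypothetical) networking change.
LIVE_ENDPOINTS = {"https://webhooks.internal/new"}

def call_downstream(path: str, routes: dict) -> int:
    """Simulate the downstream hop: a stale route maps to a removed endpoint."""
    endpoint = routes.get(path)
    if endpoint is None or endpoint not in LIVE_ENDPOINTS:
        return 404  # the endpoint no longer exists
    return 200

def handle_webhook(path: str, routes: dict) -> int:
    """The calling service treats any downstream failure as its own internal
    error, so the downstream 404 surfaces to clients as a 500."""
    status = call_downstream(path, routes)
    return 200 if status == 200 else 500
```

This also explains why the errors looked vague from the outside: callers only ever saw the generic 500, never the underlying 404 that pointed at the stale route.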
Why you saw no webhook / vague errors
Incoming calls were not delivered to customer webhooks and vague error messages were returned to callers.
Detection gap
This issue was not immediately flagged by existing alerting: the failure mode was specific to SIP traffic and manifested as SIP negotiation errors rather than HTTP errors, which limited the effectiveness of broader alerting rules.
What we’re improving