OpenAI Realtime API SIP Integration: Critical Audio Failure Bug Report
Date: October 22, 2025
Affected Service: OpenAI Realtime API - SIP Integration
Severity: Critical - Complete audio failure on SIP calls
Status: Active bug affecting multiple users
Executive Summary
OpenAI's Realtime API SIP integration has a critical bug: SIP calls connect successfully and AI responses are generated (confirmed by transcripts and WebSocket events), but no audio is transmitted over the RTP stream to callers. The same codebase worked correctly on October 18, 2025 and stopped working sometime between then and October 22, 2025, with no changes to our implementation.
This matches reports from other users in the OpenAI Developer Community experiencing identical issues in October 2025.
Environment Details
Infrastructure
- SIP Provider: SignalWire (SWML-based routing)
- Platform: Railway (Node.js backend)
- Region: US East
- Network: Cloud-hosted (Railway infrastructure)
OpenAI Configuration
- API Endpoint: `https://api.openai.com/v1/realtime/calls/{call_id}/accept`
- SIP Endpoint: `sip:proj_xxxxxxxxxxxxxxxW@sip.api.openai.com;transport=tls;model=gpt-realtime-preview`
- WebSocket: `wss://api.openai.com/v1/realtime?call_id={call_id}`
- Model (broken): `gpt-realtime`
- Model (attempted workaround): `gpt-realtime-2025-08-28`
Audio Configuration
{
"type": "realtime",
"model": "gpt-realtime",
"audio": {
"input": {
"format": "audio/pcmu",
"transcription": {
"language": "en",
"model": "whisper-1"
},
"turn_detection": {
"type": "semantic_vad"
}
},
"output": {
"voice": "cedar",
"format": "audio/pcmu"
}
}
}
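For reference, the /accept call that carries this configuration can be sketched in Node.js as follows. The endpoint URL and payload come from this report; the helper name and the split into "build" and "send" steps are illustrative, not an OpenAI SDK:

```javascript
// Sketch: build the /accept request for an incoming Realtime SIP call,
// using the audio configuration shown above. Helper name is ours.
function buildAcceptRequest(callId, apiKey) {
  const body = {
    type: "realtime",
    model: "gpt-realtime",
    audio: {
      input: {
        format: "audio/pcmu",
        transcription: { language: "en", model: "whisper-1" },
        turn_detection: { type: "semantic_vad" },
      },
      output: { voice: "cedar", format: "audio/pcmu" },
    },
  };
  return {
    url: `https://api.openai.com/v1/realtime/calls/${callId}/accept`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage (Node 18+ global fetch):
// const { url, init } = buildAcceptRequest(callId, process.env.OPENAI_API_KEY);
// const res = await fetch(url, init);
```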
Timeline
| Date | Time | Event |
|---|---|---|
| Oct 18, 2025 | 2:56 PM UTC-4 | Deployed retry logic with exponential backoff |
| Oct 18, 2025 | 3:03 PM UTC-4 | Last confirmed working call; callers heard audio |
| Oct 21, 2025 | 5:50 PM | Added `model=gpt-realtime-preview` parameter to SIP URI |
| Oct 22, 2025 | Morning | Discovered calls connecting with no audible audio |
| Oct 22, 2025 | Afternoon | Confirmed multiple users reporting the same issue |
Symptoms
What Works ✅
- SIP signaling completes successfully
  - INVITE accepted with HTTP 200 OK
  - OpenAI returns proper SIP response
  - Call state shows as connected
- WebSocket connection established
  - Opens successfully
  - No WebSocket 1006 errors (in recent tests)
  - Maintains stable connection
- AI generates responses internally
  - Logs show `response.output_audio_transcript.delta` events
  - Logs show `response.output_audio.done` events
  - Transcripts are generated and stored correctly
  - Audio token counts confirm generation (e.g., 173 audio tokens)
- Webhook events fire correctly
  - `realtime.call.incoming` received
  - Call accepted successfully
  - All expected events arrive
What's Broken ❌
- RTP audio stream is silent
  - Caller hears complete silence
  - No audio transmitted despite OpenAI generating it
  - RTP packets may be flowing but contain no audible audio
- Both codecs affected
  - PCMU (G.711 μ-law): silent
  - PCMA (G.711 A-law): not tested, but community reports the same issue
- OpenAI appears to generate additional "phantom" audio responses
  - After the initial greeting, OpenAI continues generating new audio responses
  - These appear as separate `realtime.response` events with unique IDs
  - Each contains new transcripts as if OpenAI were hearing input we're not sending
  - Example: After the greeting, the AI asks "Could you please tell me who you are and why you're calling?"
  - This suggests OpenAI may be "hearing" something on the RTP input stream that we're not sending
  - Hypothesis: bidirectional RTP issue; OpenAI may be receiving noise or silence as input and responding to it
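To flag these phantom responses when replaying captured WebSocket traffic, we use a simple offline heuristic: any completed response that reports input tokens before any speech-start event is suspect. The event names `response.done` and `input_audio_buffer.speech_started` are standard Realtime WebSocket events; the heuristic and function name below are ours:

```javascript
// Sketch: flag "phantom" responses in a captured WebSocket event stream.
// A response that billed input tokens before we ever saw the caller start
// speaking suggests OpenAI is treating unsent/noise RTP input as speech.
function findPhantomResponses(events) {
  let speechHeard = false;
  const phantoms = [];
  for (const ev of events) {
    if (ev.type === "input_audio_buffer.speech_started") speechHeard = true;
    if (ev.type === "response.done") {
      const inputTokens = ev.response?.usage?.input_tokens ?? 0;
      if (!speechHeard && inputTokens > 0) phantoms.push(ev.response.id);
    }
  }
  return phantoms;
}
```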
Evidence from Logs
Successful WebSocket Connection (but no audio)
[OpenAI-SIP] ✅ WebSocket connected for call rtc_a46f81d14638494996963cc04eabaa12
[OpenAI-SIP] 📨 Event: response.created
[OpenAI-SIP] 📨 Event: response.output_audio.done
[OpenAI-SIP] AUDIO EVENT DETAILS: {
"type": "response.output_audio.done",
"event_id": "event_CTWmcf72RIDeujSdIc1PW",
"response_id": "resp_CTWmZhrXaRDN7CR6pAPH3",
"item_id": "item_CTWmZTTulwmreIkFypPNQ",
"output_index": 0,
"content_index": 0,
"transcript": "Hi, I'm XX, the AI answering service..."
}
Audio Format from OpenAI (WebSocket monitoring)
{
"audio": {
"output": {
"format": {
"type": "audio/pcm",
"rate": 24000
},
"voice": "cedar"
}
},
"usage": {
"output_token_details": {
"audio_tokens": 173
}
}
}
Note: The WebSocket shows audio/pcm at 24 kHz internally, but SIP should use audio/pcmu at 8 kHz. This is expected: OpenAI's SIP gateway should transcode.
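For reference, a minimal sketch of the transcoding the gateway is expected to perform: decimate 24 kHz 16-bit PCM to 8 kHz and encode to G.711 μ-law (RTP payload type 0). This is an illustration of the standard algorithm, not OpenAI's implementation; the decimation is naive (no anti-alias filter):

```javascript
// Standard G.711 mu-law encoder for one 16-bit linear PCM sample.
function linearToMulaw(sample) {
  const BIAS = 0x84, CLIP = 32635;
  const sign = (sample >> 8) & 0x80;      // keep sign bit
  if (sign) sample = -sample;             // work on magnitude
  if (sample > CLIP) sample = CLIP;       // clip to avoid overflow
  sample += BIAS;
  let exponent = 7;                       // find segment (exponent)
  for (let mask = 0x4000; (sample & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (sample >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff; // mu-law is bit-inverted
}

// Naive 24 kHz -> 8 kHz decimation (every 3rd sample) + mu-law encode.
function pcm24kToPcmu8k(samples) {
  const out = new Uint8Array(Math.floor(samples.length / 3));
  for (let i = 0; i < out.length; i++) out[i] = linearToMulaw(samples[i * 3]);
  return out;
}
```

Note that μ-law digital silence (linear 0) encodes to 0xFF, which is useful when inspecting captured payloads.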
Phantom Audio Responses (Bidirectional RTP Issue)
Critical Observation: OpenAI generates multiple audio responses during a call where the caller is not speaking. This indicates OpenAI may be receiving RTP input that we're not sending, or interpreting silence/noise as input.
{
"object": "realtime.response",
"id": "resp_CTXDa69PlLThlfCbIQx9m",
"output": [
{
"type": "output_audio",
"transcript": "Hi, I'm [Company Name], the AI answering service. I notice you're calling from a number I don't recognize. Could you please tell me who you are and why you're calling?"
}
],
"output_token_details": {
"text_tokens": 52,
"audio_tokens": 174
},
"usage": {
"total_tokens": 336,
"input_tokens": 110,
"output_tokens": 226
}
}
Key Evidence:
- `input_tokens: 110`: OpenAI claims to have received 110 audio tokens of input
- But the caller has not spoken; they're waiting to hear the greeting
- This suggests OpenAI is receiving RTP input (possibly silence or noise) and treating it as speech
- This may indicate a bidirectional RTP problem, with both input and output paths affected
Alternative Explanation:
- Input tokens could be from initial session setup/configuration
- But the fact that the AI asks follow-up questions suggests it's "hearing" something
Example Call IDs with Audio Failure
With gpt-realtime model (broken):
- `rtc_a46f81d14638494996963cc04eabaa12` (Oct 22, 2025 17:19 UTC)
- `rtc_fd68f3e535ca45cf8b231abc52d6379c` (Oct 22, 2025 17:04 UTC)
With gpt-realtime-2025-08-28 model (testing workaround):
- `rtc_cc10cc08615341cb9e240087561a647a` (Oct 22, 2025 17:47 UTC)
- Model logged: `"model": "gpt-realtime-2025-08-28"`
- Audio generated: 174 audio tokens
- Transcript generated: "Hi, XX, the AI answering service…"
- Result: Audio still silent; the workaround did NOT fix the issue
Troubleshooting Steps Attempted
1. Added Model Parameter to SIP URI 
Issue: Initial WebSocket 1006 errors due to missing model parameter
Fix: Added `;model=gpt-realtime-preview` to the SIP URI
Result: WebSocket 1006 errors resolved, but audio still silent
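The fix amounts to appending a URI parameter to the SIP destination. A trivial helper (name ours) that we use when building the SWML connect target:

```javascript
// Sketch: append the model parameter to a SIP URI, as in the fix above.
// Leaves the URI untouched if a model parameter is already present.
function withModelParam(sipUri, model) {
  return sipUri.includes(";model=") ? sipUri : `${sipUri};model=${model}`;
}
```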
2. Changed answer_on_bridge Setting 
Hypothesis: Audio path timing issue with SignalWire answering
Change: Set `answer_on_bridge: false` in SWML
Result: No improvement - audio still silent
3. Explicit Codec Specification 
Hypothesis: Codec negotiation failure
Change: Added `codecs: 'PCMU,PCMA'` to the SWML connect block
Result: No improvement - audio still silent
4. Switched to Dated Model Version
Hypothesis: Recent model update introduced audio bug
Change: Using `gpt-realtime-2025-08-28` instead of `gpt-realtime`
Result: Tested; audio still silent (see call IDs above), so the workaround did not fix the issue
Related Community Reports
Identical Issues Reported by Other Users
- [BUG?] SIP Realtime: distorted or missing audio
  - Thread: [BUG?] SIP Realtime – distorted or missing audio (noise/static) with `gpt-realtime` and `gpt-realtime-mini`
  - Date: October 22, 2025
  - Location: Brazil (South America)
  - Symptoms: "No audio from assistant or distorted/static audio"
  - Quote: "The same setup was working perfectly a few weeks ago with the exact same code and configuration"
  - Models Affected: `gpt-realtime-mini`, `gpt-realtime`
- Asterisk + OpenAI Realtime SIP: call connects but no audio
  - Thread: Asterisk + OpenAI Realtime SIP – call connects but no audio
  - Symptoms: Same as ours; SIP works, no audio
Technical Analysis
Hypothesis 1: Bidirectional RTP Problem (Most Likely)
Evidence:
- Output RTP (OpenAI → Caller): Audio generated but silent; RTP packets may not contain audio data
- Input RTP (Caller → OpenAI): OpenAI receiving 110 input tokens when the caller hasn't spoken
- A previous OpenAI fix involved RTP being blocked for non-Twilio/Telnyx providers
- SignalWire (our provider) may not be whitelisted for RTP traffic
- SIP signaling works perfectly (TLS), suggesting an RTP-specific issue
Supporting Evidence:
- WebSocket shows audio generation (internal confirmation)
- SIP call connects successfully (signaling path clear)
- Transcripts generated correctly (AI is working)
- Audio tokens counted (audio is being created)
- NEW: OpenAI reports input tokens when caller is silent - suggests RTP input path is also affected
Possible Root Causes:
- SignalWire IP ranges not whitelisted in OpenAI's RTP firewall
- RTP media path routing incorrectly configured
- NAT/firewall blocking bidirectional RTP media flow
- RTP packet format mismatch (though unlikely with standard PCMU)
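To distinguish "no RTP packets" from "packets flowing but silent", captured μ-law payload bytes can be checked directly: in G.711 μ-law, 0xFF and 0x7F encode positive and negative zero, so a payload made almost entirely of those bytes is digital silence even though packets are arriving. A minimal check (helper name and threshold are ours):

```javascript
// Sketch: decide whether a captured G.711 mu-law RTP payload is silence.
// 0xFF / 0x7F are the mu-law encodings of +0 / -0.
function mulawPayloadIsSilent(payload, threshold = 0.99) {
  if (payload.length === 0) return true; // no samples at all
  let silent = 0;
  for (const b of payload) {
    if (b === 0xff || b === 0x7f) silent++;
  }
  return silent / payload.length >= threshold;
}
```

Running this over the payload bytes of a packet capture on the SignalWire leg would tell us whether OpenAI's gateway is emitting empty audio or no audio at all.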
Hypothesis 2: Recent Model Update Regression
Evidence:
- Working on Oct 18, broken by Oct 22
- No code changes between working and broken states
- Multiple users reporting same timeline
Test in Progress:
- Using `gpt-realtime-2025-08-28` (August dated version)
- If this works, it confirms a regression in the current `gpt-realtime` alias
Hypothesis 3: SDP/Codec Negotiation Change
Evidence:
- Community reports mention both PCMU and PCMA affected
- No codec-specific errors in logs
- Explicit codec specification didn't help
Unlikely because:
- No SIP error codes indicating negotiation failure
- Multiple codecs affected identically
Request to OpenAI Engineering
Immediate Investigation Needed
- Check RTP firewall rules
  - Verify SignalWire IP ranges are allowed for RTP traffic
  - Confirm RTP ports are open bidirectionally
  - Check if recent infrastructure changes affected non-Twilio/Telnyx providers
- Review recent model updates
  - What changed in the `gpt-realtime` alias between Oct 18 and Oct 22, 2025?
  - Are there known issues with the current model version?
  - Should we use `gpt-realtime-2025-08-28` as a workaround?
- Verify audio transcoding
  - Is the SIP gateway properly transcoding PCM 24 kHz → PCMU 8 kHz?
  - Are RTP packets being generated with actual audio data?
  - Are there any logs on OpenAI's side showing RTP transmission failures?
Information We Can Provide
- Specific call IDs with complete timing and logs
- Full SIP INVITE/SDP from SignalWire
- WebSocket message dumps showing audio generation
- Network traces if needed (with your guidance)
- Test calls on demand to reproduce issue
SWML Configuration (SignalWire Routing)
Non-Whitelisted Caller Flow (AI Screening)
{
"version": "1.0.0",
"sections": {
"main": [
{
"answer": {
"max_duration": 14400
}
},
{
"set": {
"parent_call_sid": "%{call.call_id}"
}
},
{
"set": {
"session_id": "call_%{vars.parent_call_sid}"
}
},
{
"play": {
"url": "say: Connecting to call screener."
}
},
{
"connect": {
"to": "sip:proj_XXXXXXXXXXXXXXXXXXXX@sip.api.openai.com;transport=tls;model=gpt-realtime-preview",
"from": "%{call.from}",
"timeout": 300,
"max_duration": 14400,
"session_timeout": 14400,
"answer_on_bridge": false,
"codecs": "PCMU,PCMA",
"headers": [
{
"name": "X-Parent-CallSid",
"value": "%{vars.parent_call_sid}"
},
{
"name": "X-Session-ID",
"value": "%{vars.session_id}"
},
{
"name": "X-Caller",
"value": "%{call.from}"
},
{
"name": "X-Called",
"value": "%{call.to}"
}
]
}
}
]
}
}
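For completeness, our Node backend generates the SWML above programmatically; a sketch, where the `projectUri` parameter stands in for the redacted `sip:proj_...@sip.api.openai.com` URI:

```javascript
// Sketch: build the non-whitelisted-caller screening SWML shown above.
// Values mirror the configuration in this report; the function is ours.
function buildScreeningSwml(projectUri) {
  return {
    version: "1.0.0",
    sections: {
      main: [
        { answer: { max_duration: 14400 } },
        { set: { parent_call_sid: "%{call.call_id}" } },
        { set: { session_id: "call_%{vars.parent_call_sid}" } },
        { play: { url: "say: Connecting to call screener." } },
        {
          connect: {
            to: projectUri,
            from: "%{call.from}",
            timeout: 300,
            max_duration: 14400,
            session_timeout: 14400,
            answer_on_bridge: false,
            codecs: "PCMU,PCMA",
            headers: [
              { name: "X-Parent-CallSid", value: "%{vars.parent_call_sid}" },
              { name: "X-Session-ID", value: "%{vars.session_id}" },
              { name: "X-Caller", value: "%{call.from}" },
              { name: "X-Called", value: "%{call.to}" },
            ],
          },
        },
      ],
    },
  };
}
```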
Call Flow Diagram
┌─────────────────┐
│     Caller      │
│ +1-XXX-XXX-XXX  │
└────────┬────────┘
         │ INVITE
         ▼
┌─────────────────────────────────┐
│ SignalWire                      │
│ - Receives call                 │
│ - Generates SWML                │
│ - Checks whitelist              │
└────────┬────────────────────────┘
         │ SIP INVITE (TLS)
         │ to: sip:proj_XXX@sip.api.openai.com
         ▼
┌─────────────────────────────────┐
│ OpenAI SIP Gateway              │
│ - Sends webhook                 │
│ - Waits for /accept             │
└────────┬────────────────────────┘
         │ POST webhook
         ▼
┌─────────────────────────────────┐
│ Our Voice API Server            │
│ - Verifies webhook              │
│ - Calls /accept endpoint        │
│ - Opens WebSocket monitor       │
└────────┬────────────────────────┘
         │ POST /v1/realtime/calls/{id}/accept
         │ with audio config
         ▼
┌─────────────────────────────────┐
│ OpenAI Realtime API             │
│ - Accepts call (200 OK)         │
│ - Establishes WebSocket         │
│ - ✅ Generates AI responses     │
│ - ✅ Sends audio events         │
│ - ❌ RTP AUDIO SILENT           │ ⚠️ PROBLEM HERE
└────────┬────────────────────────┘
         │ RTP (should contain audio)
         ▼
┌─────────────────────────────────┐
│ SignalWire                      │
│ - Bridges RTP                   │
│ - ❌ Receives silence           │
└────────┬────────────────────────┘
         │ RTP (silent)
         ▼
┌─────────────────┐
│     Caller      │
│ ❌ HEARS NOTHING│
└─────────────────┘
Expected Behavior
- Caller dials forwarding number β SignalWire
- SignalWire routes via SIP to OpenAI
- OpenAI Realtime API generates greeting: "Hi, I'm XX, the AI answering service…"
- Caller should hear the greeting (currently silent)
- Conversation proceeds with AI screening caller
- Based on screening decision, AI either connects or rejects call
Actual Behavior
Steps 1-3 work perfectly, but step 4 fails - caller hears complete silence despite OpenAI generating the audio internally.
Additional Context
SignalWire IP Ranges (if needed for firewall rules)
OpenAI may need to whitelist SignalWire's infrastructure. We can provide:
- Source IP addresses from our logs
- SignalWire's published IP ranges
- Network traces showing RTP packet flow
Working Reference Implementation
We have a known-working state from October 18, 2025 at 3:03 PM UTC-4 with:
- Same code
- Same configuration
- Same audio settings
- Audio was audible to callers
The only changes since then:
- Added `model=gpt-realtime-preview` parameter (required; resolved WebSocket 1006)
- Minor SWML adjustments (tested with/without; no difference)
Questions for OpenAI
- Were there any changes to RTP firewall rules between Oct 18-22, 2025?
- Are SignalWire IP addresses whitelisted for RTP traffic?
- Were there any updates to the `gpt-realtime` model in late October 2025?
- Is there a known issue with SIP audio in the current production version?
- Should we use `gpt-realtime-2025-08-28` as a temporary workaround?
- Can you check RTP packet transmission logs for our call IDs?
- Are there any specific SDP requirements we might be missing?
- Why is OpenAI reporting 110 input audio tokens when the caller hasn't spoken?
  - Is this normal session setup overhead?
  - Or is OpenAI receiving RTP input that we're not sending?
  - Could this indicate a bidirectional RTP media path issue?
- Can you verify RTP packet payloads contain actual audio data (not just headers)?
  - Both incoming (caller → OpenAI) and outgoing (OpenAI → caller) directions
Contact Information
Reporter: [Redacted for privacy - available upon request to OpenAI support]
Project: AI Call Screening Service
Priority: Critical - Production service completely non-functional
Availability: Available for live debugging, test calls, or providing additional logs
Appendix: Full Example Log Sequence
1. Incoming Call Webhook
[VoiceAPI] ========== INCOMING OPENAI WEBHOOK ==========
[VoiceAPI] Event Type: realtime.call.incoming
[VoiceAPI] Call ID: rtc_a46f81d14638494996963cc04eabaa12
[VoiceAPI] From: +1347XXXXXXX
[VoiceAPI] To: +1773XXXXXXX
2. Accept Call (200 OK)
[OpenAI-SIP] Sending /accept with audio configuration
[OpenAI-SIP] Audio output voice: cedar
[OpenAI-SIP] Request URL: https://api.openai.com/v1/realtime/calls/rtc_a46f81d14638494996963cc04eabaa12/accept
[OpenAI-SIP] ✅ Call rtc_a46f81d14638494996963cc04eabaa12 accepted by OpenAI - HTTP 200
3. WebSocket Connected
[OpenAI-SIP] ✅ WebSocket connected for call rtc_a46f81d14638494996963cc04eabaa12
[OpenAI-SIP] Session already configured in /accept - sending response.create to trigger greeting
4. AI Generates Audio (confirmed by events)
[OpenAI-SIP] 📨 Event: response.output_audio.done
[OpenAI-SIP] 📨 Event: response.output_audio_transcript.done
[OpenAI-SIP] AI transcript: "Hi, I'm [Company Name], the AI answering service. I notice you're calling from a number I don't recognize. Could you please tell me who you are and why you're calling?"
[OpenAI-SIP] Audio should be playing now on the SIP call - check if caller hears this!
5. Audio Token Confirmation
{
"usage": {
"output_token_details": {
"text_tokens": 52,
"audio_tokens": 173
}
}
}
Result: Audio generated (173 tokens), transcript logged, but caller heard nothing.
End of Report
This report documents a critical production issue affecting OpenAI Realtime API SIP integration. Multiple users are experiencing identical symptoms. Immediate engineering attention requested.