Realtime SIP: No inbound RTP from OpenAI media while WS TTS runs (PCMA/SRTP) – clean SDP, reproducible on 2 hosts

Summary

We are using the Realtime API over SIP, accepting calls via webhook and then attaching over WebSocket.
On the WS: session.update + response.create succeed; TTS/transcript stream normally.
FreeSWITCH (FS) sends SRTP to OpenAI’s advertised media address, but no RTP arrives from OpenAI at the FS-advertised IP:port (0 packets).
Reproduces on two independent public hosts.

Environment

PBX: FreeSWITCH 1.10.12 (host networking, no NAT).
Codec: PCMA/8000 (G.711 A-law), ptime 20 ms, telephone-event 101.
SRTP: SDES, AES_CM_128_HMAC_SHA1_80 negotiated and activated.
Realtime flow: webhook POST /v1/realtime/calls/{call_id}/accept → WSS wss://api.openai.com/v1/realtime?call_id=…; session.update (g711_alaw in/out) → response.create (greeting).
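For readers who want to follow the flow above, here is a rough sketch of the requests and events involved, built with the Python standard library only and not sent anywhere. The path and WSS URL come from the post; using api.openai.com for the accept call, the Bearer header, the accept body, and the exact field names inside session.update are assumptions for illustration and should be checked against the current API reference.

```python
import json
from urllib.parse import quote
from urllib.request import Request

API_HOST = "api.openai.com"  # assumption: same host as the WSS URL in the post


def build_accept_request(call_id: str, api_key: str, session: dict) -> Request:
    """Build (but do not send) the webhook-side accept call."""
    url = f"https://{API_HOST}/v1/realtime/calls/{quote(call_id, safe='')}/accept"
    return Request(
        url,
        data=json.dumps(session).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def websocket_url(call_id: str) -> str:
    """URL used to attach a WebSocket to the accepted call."""
    return f"wss://{API_HOST}/v1/realtime?call_id={quote(call_id, safe='')}"


def session_update_event() -> dict:
    # Field names are an assumption; the post only says "g711_alaw in/out".
    return {
        "type": "session.update",
        "session": {
            "input_audio_format": "g711_alaw",
            "output_audio_format": "g711_alaw",
        },
    }


def response_create_event(instructions: str) -> dict:
    # The "instructions" field is an assumption used to request a greeting.
    return {"type": "response.create", "response": {"instructions": instructions}}


if __name__ == "__main__":
    # Hypothetical call id, key, and accept body, for illustration only.
    req = build_accept_request("rtc_example", "sk-example", {"type": "realtime"})
    print(req.get_method(), req.full_url)
    print(websocket_url("rtc_example"))
    print(json.dumps(session_update_event()))
    print(json.dumps(response_create_event("Greet the caller.")))
```

None of this touches the media path, which is negotiated purely through SIP/SDP; it is only meant to show where the WS events in the logs come from.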

Repro (anonymized)

Local SDP (FS):
    c=IN IP4 <FS_PUBLIC_IP>
    m=audio <RTP_LOCAL_PORT> RTP/SAVP 8 101
    a=crypto:7 AES_CM_128_HMAC_SHA1_80 inline:…
    a=ptime:20
Remote SDP (OpenAI):
    c=IN IP4 <OAI_MEDIA_IP>
    m=audio <OAI_RTP_PORT> RTP/SAVP 8 101
    a=crypto:7 AES_CM_128_HMAC_SHA1_80 inline:…
    a=ptime:20
WS events (excerpt): session.updated → response.created → output_audio_buffer.started → transcript (“Thank you for calling…”) → response.done.
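When comparing the two SDP bodies it can help to pull out just the fields that decide where media should flow. The following is a minimal sketch of such a check, not a full SDP parser; the sample SDP, addresses, and key string are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AudioSdp:
    address: str = ""
    port: int = 0
    profile: str = ""
    payload_types: List[int] = field(default_factory=list)
    crypto_suites: List[str] = field(default_factory=list)
    ptime: Optional[int] = None
    rtcp_mux: bool = False


def parse_audio_sdp(sdp: str) -> AudioSdp:
    """Extract the audio media address, port, profile, and key attributes."""
    info = AudioSdp()
    seen_media = False
    in_audio = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            seen_media = True
            in_audio = line.startswith("m=audio ")
            if in_audio:
                _, port, profile, *pts = line[2:].split()
                info.port = int(port)
                info.profile = profile
                info.payload_types = [int(pt) for pt in pts]
        elif line.startswith("c=IN IP4 "):
            # A session-level c= applies unless the audio section overrides it.
            if not seen_media or in_audio:
                info.address = line.split()[2]
        elif in_audio and line.startswith("a=crypto:"):
            info.crypto_suites.append(line.split()[1])
        elif in_audio and line.startswith("a=ptime:"):
            info.ptime = int(line.split(":", 1)[1])
        elif in_audio and line == "a=rtcp-mux":
            info.rtcp_mux = True
    return info


if __name__ == "__main__":
    # Made-up example using a documentation-range address.
    sample = "\r\n".join([
        "v=0",
        "c=IN IP4 203.0.113.10",
        "m=audio 16384 RTP/SAVP 8 101",
        "a=crypto:7 AES_CM_128_HMAC_SHA1_80 inline:EXAMPLEKEY",
        "a=ptime:20",
    ])
    print(parse_audio_sdp(sample))
```

Running it on both the local and remote SDP makes it easy to confirm that the address and port each side advertises match what shows up in the packet capture.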

Media observations

FS → OpenAI: hundreds of SRTP packets sent (<FS_PUBLIC_IP>:<RTP_LOCAL_PORT> → <OAI_MEDIA_IP>:<OAI_RTP_PORT>).
OpenAI → FS: zero packets to <FS_PUBLIC_IP>:<RTP_LOCAL_PORT> (tcpdump).
UDP test traffic from the public Internet to the exact FS RTP port arrives (nping), so the host is reachable.
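A side note for anyone reading similar captures: SRTP encrypts only the payload, so the 12-byte RTP header stays readable and can be used to confirm that captured packets really are the expected stream (payload type 8 for PCMA, a steady SSRC, incrementing sequence numbers). Below is a minimal sketch of a header decoder; the packet in the usage example is synthetic.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    marker: bool
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed 12-byte RTP header (left in clear by SRTP)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"not RTP version 2 (got {version})")
    return RtpHeader(
        version=version,
        payload_type=b1 & 0x7F,
        marker=bool(b1 & 0x80),
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


if __name__ == "__main__":
    # Synthetic packet: version 2, PT 8 (PCMA), plus 160 bytes of dummy payload.
    pkt = struct.pack("!BBHII", 0x80, 8, 1234, 160, 0x11223344) + b"\x00" * 160
    print(parse_rtp_header(pkt))
```

This only helps characterize packets that are actually present; it does not explain why the inbound direction is empty.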

What we ruled out

Firewall: INPUT ACCEPT; no nftables drops; full RTP range open; rp_filter=0 on all interfaces.
NAT/routing: none (host networking).
SIP: 200/ACK OK; FS logs “Correct audio ip/port confirmed.”
Alternate host/network: identical result on a second, unrelated server.
A/B: also tried PCMU and proxy_media=false → no change.

Questions to the community/devs

Is symmetric-RTP “kickstart” required? We transmit outbound SRTP, yet inbound never starts.
Is a=rtcp-mux or any additional SDP attribute required?
Any known regional/egress constraints for Realtime SIP media that could cause this symptom?
Can anyone share a known-good PCMA/SRTP Realtime SIP example (SDP snippets welcome)?

Available artifacts (can share anonymized via DM)

FS signaling/media logs (INVITE → 200 → ACK, SRTP activation).
Webhook + WS logs (session.update, response.create, output_audio_buffer.started, transcript).
Full Local/Remote SDP (anonymized).
pcaps: (1) outbound SRTP present; (2) inbound SRTP missing.

Any pointers or a working reference config would be greatly appreciated.

Have you taken a look at the cookbook?

Of course I have. There is not a single word about SIP in it.

I can confirm we see this issue on our end as well. We basically get a 100 Trying and no SDP in response to our offer. We tried G.711 µ-law and A-law, with and without SRTP.

Regretfully, and hoping this doesn’t derail the topic into trading blame or dodging constructive solutions: my setup also uses FreeSWITCH.

For the skeptics among us, I also tried with a desktop SIP UA, to no avail.

The only thing that comes to mind is the Contact header being used as the SDP return address, and that possibly creating an issue.

It would be good to have freaking normal SIP implementations, and not à la Zoom, à la Microsoft, à la Webex and co…

Try this in your external SIP profile:

<param name="enable-soa" value="true"/>

Just got it working.

From the documentation: Set the value to “false” to disable SIP SOA from sofia, to tell sofia not to touch the exchange of SDP.

If it screws up your regular flow, create a new profile for OpenAI.

This is fixed now! It was a mistake on my side; see “Realtime SIP & NAT no audio sent to client” (#6 by Sean-Der).

Thanks for using it, and fire any questions my way. Excited to see people using it 🙂

And here I was thinking I’d become a magician! Thanks for the fix 🙂

Thank you very much for the quick assistance. I can confirm that the SIP connection works perfectly. The audio quality is flawless, and the model’s response time is impressive.