Summary
We are using the Realtime API over SIP: the call is accepted via webhook, then a WebSocket is attached to it.
On the WebSocket, session.update and response.create both succeed, and the TTS audio events and transcript stream normally.
FreeSWITCH (FS) sends SRTP to OpenAI’s advertised media address, but nothing comes back: OpenAI → FS is 0 RTP packets at the FS-advertised IP:port.
The problem reproduces on two independent public hosts.
Environment
PBX: FreeSWITCH 1.10.12 (host networking, no NAT).
Codec: PCMA/8000 (G.711 A-law), ptime 20 ms, telephone-event 101.
SRTP: SDES, AES_CM_128_HMAC_SHA1_80 negotiated and activated.
Realtime flow: webhook POST /v1/realtime/calls/{call_id}/accept → WSS wss://api.openai.com/v1/realtime?call_id=…; session.update (g711_alaw in/out) → response.create (greeting).
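For anyone comparing notes, here is a minimal Python sketch of the order of operations described above. It only builds the URLs and JSON events and does no network I/O. The session field names (input_audio_format / output_audio_format), the instructions field, and the call id rtc_123 are assumptions for illustration and may not match the current API schema.

```python
import json
from urllib.parse import quote

API_HOST = "api.openai.com"  # host taken from the WSS URL in this post


def accept_url(call_id: str) -> str:
    """URL the webhook handler POSTs to in order to accept the call."""
    return f"https://{API_HOST}/v1/realtime/calls/{quote(call_id, safe='')}/accept"


def websocket_url(call_id: str) -> str:
    """URL of the WebSocket that is attached to the accepted call."""
    return f"wss://{API_HOST}/v1/realtime?call_id={quote(call_id, safe='')}"


def session_update_event() -> str:
    # ASSUMPTION: these two field names are illustrative; check the API
    # reference for the exact session schema in your API version.
    event = {
        "type": "session.update",
        "session": {
            "input_audio_format": "g711_alaw",
            "output_audio_format": "g711_alaw",
        },
    }
    return json.dumps(event)


def response_create_event(greeting: str) -> str:
    # ASSUMPTION: "instructions" is used here to carry the greeting prompt.
    event = {"type": "response.create", "response": {"instructions": greeting}}
    return json.dumps(event)


def call_flow(call_id: str, greeting: str) -> list:
    """Return the steps in the order we perform them: (step, target, payload)."""
    return [
        ("accept", accept_url(call_id), None),
        ("attach", websocket_url(call_id), None),
        ("send", "session.update", session_update_event()),
        ("send", "response.create", response_create_event(greeting)),
    ]
```

Calling call_flow("rtc_123", "Greet the caller") returns the four steps in sequence, which makes it easy to diff against the flow of a working setup.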
Repro (anonymized)
Local SDP (FS):
  c=IN IP4 <FS_PUBLIC_IP>
  m=audio <RTP_LOCAL_PORT> RTP/SAVP 8 101
  a=crypto:7 AES_CM_128_HMAC_SHA1_80 inline:…
  a=ptime:20
Remote SDP (OpenAI):
  c=IN IP4 <OAI_MEDIA_IP>
  m=audio <OAI_RTP_PORT> RTP/SAVP 8 101
  a=crypto:7 AES_CM_128_HMAC_SHA1_80 inline:…
  a=ptime:20
WS events (excerpt): session.updated → response.created → output_audio_buffer.started → transcript (“Thank you for calling…”) → response.done.
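To make the two SDP bodies easier to compare, the following Python sketch pulls out the connection address, audio port, transport profile, payload types, crypto suites, and attribute names, then lists the fields where the two sides differ. The function names are made up for this post, and the parser only handles the handful of lines shown above (a single audio stream, IPv4), so treat it as a rough helper rather than a full SDP parser.

```python
def parse_audio_sdp(sdp: str) -> dict:
    """Extract the media-relevant fields from a simple single-stream SDP."""
    info = {
        "ip": None,
        "port": None,
        "proto": None,
        "payload_types": [],
        "crypto_suites": [],
        "attributes": [],
    }
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("c=IN IP4 "):
            info["ip"] = line.split()[2]
        elif line.startswith("m=audio "):
            parts = line.split()
            info["port"] = int(parts[1])
            info["proto"] = parts[2]
            info["payload_types"] = [int(p) for p in parts[3:]]
        elif line.startswith("a=crypto:"):
            # a=crypto:<tag> <suite> inline:<key material>
            info["crypto_suites"].append(line.split()[1])
        elif line.startswith("a="):
            # Keep only the attribute name, e.g. "ptime" or "rtcp-mux".
            info["attributes"].append(line[2:].split(":", 1)[0])
    return info


def negotiation_mismatches(local: dict, remote: dict) -> list:
    """Names of the fields that should match on both sides but do not."""
    keys = ("proto", "payload_types", "crypto_suites")
    return [k for k in keys if local[k] != remote[k]]
```

With the anonymized SDP above (placeholders swapped for real values), negotiation_mismatches returns an empty list for our call, and "rtcp-mux" does not appear in either side's attributes.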
Media observations
FS → OpenAI: hundreds of SRTP packets sent (<FS_PUBLIC_IP>:<RTP_LOCAL_PORT> → <OAI_MEDIA_IP>:<OAI_RTP_PORT>).
OpenAI → FS: zero packets to <FS_PUBLIC_IP>:<RTP_LOCAL_PORT> (tcpdump).
UDP test traffic from the public Internet to the exact FS RTP port arrives (nping), so the host is reachable.
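The per-direction counts above can be double-checked with a small script. SRTP leaves the 12-byte RTP header unencrypted, so the header can still be parsed from captured datagrams. The Python sketch below assumes the capture has already been turned into (source, destination, payload) tuples by some other tool; the function names and that tuple layout are inventions for this post.

```python
import struct
from collections import Counter


def parse_rtp_header(datagram: bytes):
    """Return the fixed RTP header fields, or None if this is not RTP v2."""
    if len(datagram) < 12:
        return None
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", datagram[:12])
    if b0 >> 6 != 2:  # the RTP version lives in the top two bits
        return None
    return {
        "payload_type": b1 & 0x7F,
        "marker": bool(b1 >> 7),
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def count_by_direction(packets, local) -> Counter:
    """Count RTP/SRTP packets sent from and received at `local` (ip, port).

    `packets` is an iterable of (src, dst, payload) tuples, where src and
    dst are (ip, port) pairs. Non-RTP datagrams are ignored.
    """
    counts = Counter(outbound=0, inbound=0)
    for src, dst, payload in packets:
        if parse_rtp_header(payload) is None:
            continue
        if src == local:
            counts["outbound"] += 1
        elif dst == local:
            counts["inbound"] += 1
    return counts
```

For our captures this kind of count gives a large outbound number and inbound equal to 0, with payload type 8 (PCMA) on every outbound packet.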
What we ruled out
Firewall: INPUT chain policy is ACCEPT; no nftables drops; full RTP port range open; rp_filter=0 on all interfaces.
NAT/routing: none (host networking).
SIP: 200/ACK OK; FS logs “Correct audio ip/port confirmed.”
Alternate host/network: identical result on a second, unrelated server.
A/B tests: switching to PCMU and setting proxy_media=false made no difference.
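As one example of how the checks above can be verified, here is a short Python sketch that parses the text printed by `sysctl -a` and reports any interface whose rp_filter is not 0. The function names are hypothetical, and the script only reads text passed to it; it does not change any system setting.

```python
import re

# Matches lines such as "net.ipv4.conf.eth0.rp_filter = 0".
_RP_FILTER = re.compile(
    r"^net\.ipv4\.conf\.([^.\s]+)\.rp_filter\s*=\s*(\d+)\s*$", re.MULTILINE
)


def rp_filter_settings(sysctl_output: str) -> dict:
    """Map interface name to its rp_filter value."""
    return {iface: int(value) for iface, value in _RP_FILTER.findall(sysctl_output)}


def filtering_interfaces(settings: dict) -> list:
    """Interfaces where reverse-path filtering is enabled (value 1 or 2)."""
    return sorted(iface for iface, value in settings.items() if value != 0)
```

On both of our hosts filtering_interfaces returns an empty list, which is why we consider reverse-path filtering ruled out.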
Questions to the community/devs
Is a symmetric-RTP “kickstart” required? We already transmit outbound SRTP, yet inbound media never starts.
Is a=rtcp-mux or any other SDP attribute required?
Are there any known regional/egress constraints for Realtime SIP media that could cause this symptom?
Can anyone share a known-good PCMA/SRTP Realtime SIP example (SDP snippets welcome)?
Available artifacts (can share anonymized via DM)
FS signaling/media logs (INVITE → 200 → ACK, SRTP activation).
Webhook + WS logs (session.update, response.create, output_audio_buffer.started, transcript).
Full Local/Remote SDP (anonymized).
pcaps: (1) outbound SRTP present; (2) inbound SRTP missing.
Any pointers or a working reference config would be greatly appreciated.