Hi all,
I’m running into a frustrating issue with the OpenAI Realtime API over WebRTC for a voice agent project, and would love any help, pointers, or confirmation from anyone who’s gotten full audio round-trip working.
Setup Overview
- Client: React Native iOS app using `react-native-webrtc` and `react-native-incall-manager`.
- Signaling: Custom Node.js token server that relays SDP offers/answers between the client and OpenAI’s `/v1/realtime/calls` endpoint.
- Session config:

      {
        "type": "realtime",
        "model": "gpt-4o-realtime-preview-2025-08-28",
        "output_modalities": ["audio"],
        "audio": { "output": { "voice": "marin" } },
        "instructions": "As soon as the call begins, greet the user and say: 'This is a test. Please respond.'"
      }

- TURN: I provide a TURN server in the ICE config, but the same issue occurs with just Google’s STUN.
- SDP Offer/Answer: Confirmed to be negotiated with Opus, 48kHz, sendrecv.
- ICE/DTLS: Connection state goes to connected and completed.
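
For anyone who wants to repeat the SDP check on their own logs, here is a minimal sketch of how the Opus/direction check could be automated. The function name `summarizeAudioSection` and the result shape are made up for illustration, and it only looks at media-level attributes in the first audio section. It is worth running on the answer as well as the offer: if the answer's audio section says `recvonly` or `inactive`, the remote side has not agreed to send audio.

```typescript
// Summary of the first audio m-section in an SDP blob (hypothetical helper type).
interface AudioSectionSummary {
  hasOpus48k: boolean;
  // Media-level direction attribute, or null if none is present
  // (SDP treats a missing direction attribute as sendrecv).
  direction: string | null;
}

function summarizeAudioSection(sdp: string): AudioSectionSummary | null {
  const lines = sdp.split(/\r?\n/);
  const start = lines.findIndex((l) => l.startsWith("m=audio"));
  if (start === -1) return null;

  // The audio section runs until the next m= line (or the end of the SDP).
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].startsWith("m=")) {
      end = i;
      break;
    }
  }
  const section = lines.slice(start, end);

  const hasOpus48k = section.some((l) => /^a=rtpmap:\d+ opus\/48000/i.test(l));
  const dirLine = section.find((l) =>
    /^a=(sendrecv|sendonly|recvonly|inactive)$/.test(l)
  );
  return { hasOpus48k, direction: dirLine ? dirLine.slice(2) : null };
}
```

Usage would be something like `summarizeAudioSection(answer.sdp)` right after receiving the answer from the token server, logging the result next to the same summary for the offer.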
What Works
- Outbound Audio: I can see outbound audio (`bytesSent`, kbps) reported by `getStats()`, and the OpenAI API returns an SDP answer without error.
- Remote Track: The `ontrack` event fires, a remote audio MediaStream is attached, `remoteStream.getAudioTracks().length > 0`, and the track is live and not muted.
- Audio Routing: All iOS/AVAudioSession and InCallManager calls succeed, and audio is routed to the speaker.
- SDP Logging: Full offer/answer is logged and looks valid (happy to provide snippets).
What Does NOT Work
- Inbound Audio:
  - `getStats()` always shows inbound audio stuck at ~1 kbps (never rises above this).
  - I do not hear any agent speech (should get “This is a test. Please respond.”).
  - The remote audio track appears attached and enabled, but no sound is heard.
- No OpenAI Usage: The OpenAI API dashboard shows zero tokens used for these requests, which suggests it’s not hearing anything it can process/respond to, or is not sending any audio.
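
To make the "~1 kbps" figure reproducible, this is roughly how an inbound bitrate can be derived from two `getStats()` samples. It is a sketch, not my exact code: the `InboundAudioSample` type and both function names are illustrative, and the 4 kbps cutoff is an arbitrary assumption for telling a near-silent stream from actual speech. The fields mirror `timestamp`, `bytesReceived`, and `packetsReceived` on an `inbound-rtp` audio stats entry.

```typescript
// One snapshot of the inbound-rtp audio stats (hypothetical helper type).
interface InboundAudioSample {
  timestampMs: number; // stats timestamp, in milliseconds
  bytesReceived: number;
  packetsReceived: number;
}

// Average inbound bitrate between two snapshots, in kbit/s.
function inboundKbps(prev: InboundAudioSample, curr: InboundAudioSample): number {
  const dtMs = curr.timestampMs - prev.timestampMs;
  if (dtMs <= 0) return 0;
  const bits = (curr.bytesReceived - prev.bytesReceived) * 8;
  return bits / dtMs; // bits per millisecond equals kbit/s
}

// True when packets keep arriving but the bitrate is too low to be speech.
// The default threshold is an assumption, not a documented value.
function looksLikeSilence(
  prev: InboundAudioSample,
  curr: InboundAudioSample,
  thresholdKbps: number = 4
): boolean {
  const packetsArrived = curr.packetsReceived > prev.packetsReceived;
  return packetsArrived && inboundKbps(prev, curr) < thresholdKbps;
}
```

If `looksLikeSilence` stays true for the whole call, packets are arriving but carry almost no payload, which would be consistent with the remote side being connected without ever producing speech.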
Other Troubleshooting Performed
- Tried with/without custom TURN, using Google STUN only.
- Tried multiple networks (WiFi, LTE, different NATs).
- Checked that my SDP offers Opus, sendrecv, etc. (full logs available).
- Confirmed remote audio track is attached and not muted.
- Outbound stats show audio flowing (up to 30 kbps+).
- InCallManager logs show proper audio session setup.
- The OpenAI `/session` endpoint is reachable and returns 201 and a valid SDP answer.
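
Since the TURN vs. STUN comparison above depends on which path ICE actually picked, here is a sketch of pulling the selected candidate pair's types out of a stats dump. The `StatEntry` type and `selectedPairTypes` name are illustrative, and it assumes the stats entries have already been flattened into an array of plain objects using the standard `transport`, `candidate-pair`, `local-candidate`, and `remote-candidate` entry types.

```typescript
// Generic stats entry; only id and type are required (hypothetical helper type).
interface StatEntry {
  id: string;
  type: string;
  [key: string]: unknown;
}

// Returns candidate types ("host", "srflx", "prflx", "relay") for the selected pair.
function selectedPairTypes(
  stats: StatEntry[]
): { local: string; remote: string } | null {
  const byId = new Map<string, StatEntry>();
  for (const s of stats) byId.set(s.id, s);

  // Prefer the pair named by the transport entry.
  const transport = stats.find(
    (s) => s.type === "transport" && typeof s.selectedCandidatePairId === "string"
  );
  let pair = transport
    ? byId.get(transport.selectedCandidatePairId as string)
    : undefined;

  // Fallback: first nominated pair that succeeded.
  if (!pair) {
    pair = stats.find(
      (s) => s.type === "candidate-pair" && s.nominated === true && s.state === "succeeded"
    );
  }
  if (!pair) return null;

  const local = byId.get(pair.localCandidateId as string);
  const remote = byId.get(pair.remoteCandidateId as string);
  if (!local || !remote) return null;
  return { local: String(local.candidateType), remote: String(remote.candidateType) };
}
```

A `local` value of `relay` would mean media is going through the TURN server, while `srflx` or `host` means a direct path. Either way, a selected pair only shows that a path exists, not that the remote side is sending speech on it.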
What I Suspect
- SDP/ICE negotiation issue? But connection states are “connected” and “completed”.
- Firewall/NAT blocking inbound UDP? But should be covered by TURN and tested on permissive networks.
- OpenAI agent not sending audio because it never detects a turn? But I am sending audio, and tracks show enabled.
- Something missing in my session config to force a response from the agent?
- Or… some subtle iOS audio or WebRTC API edge case?
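
On the "force a response" theory: one thing I could try, assuming the agent only speaks after a detected user turn or an explicit request, is sending a `response.create` client event over the WebRTC data channel once it opens. The sketch below is untested against the API. The `EventChannel` interface and both function names are illustrative, and it assumes the data channel was created on the peer connection before the offer (OpenAI's WebRTC examples name it `oai-events`).

```typescript
// Minimal shape of a data channel; a real RTCDataChannel has these members.
interface EventChannel {
  readyState: string;
  send(data: string): void;
}

// Serialized client event asking the model to start a response.
function buildResponseCreate(): string {
  return JSON.stringify({ type: "response.create" });
}

// Sends the event if the channel is open; returns whether it was sent.
function requestGreeting(channel: EventChannel): boolean {
  if (channel.readyState !== "open") return false;
  channel.send(buildResponseCreate());
  return true;
}
```

In the app this would be called from the data channel's open handler. If agent audio and token usage show up afterwards, the session was simply waiting for a turn rather than failing at the media layer.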
Questions / Requests for Help
- Has anyone gotten inbound agent audio working over WebRTC (not WebSocket) on iOS?
- Are there any OpenAI-side diagnostics/logs I can request to check if my media is being received/processed?
- Is there a sample working sessionConfig and SDP exchange for a successful iOS-to-OpenAI audio call?
- Anything else I should check on the client or signaling side that could block inbound audio?
Thanks for any ideas or reports!
Happy to provide code snippets, logs, SDP, etc.