Can't Hear Inbound Audio from OpenAI Realtime Agent (WebRTC) - Outbound Works, Inbound Stuck at 1 kbps

Hi all,

I’m running into a frustrating issue with the OpenAI Realtime API over WebRTC for a voice agent project, and would love any help, pointers, or confirmation from anyone who’s gotten full audio round-trip working.

Setup Overview

  • Client: React Native iOS app using react-native-webrtc and react-native-incall-manager.

  • Signaling: Custom Node.js token server that relays SDP offers/answers between the client and OpenAI’s /v1/realtime/calls endpoint.

  • Session config:

    {
      "type": "realtime",
      "model": "gpt-4o-realtime-preview-2025-08-28",
      "output_modalities": ["audio"],
      "audio": { "output": { "voice": "marin" } },
      "instructions": "As soon as the call begins, greet the user and say: 'This is a test. Please respond.'"
    }
    
    
  • TURN: I provide a TURN server in the ICE config, but the same issue occurs with just Google’s STUN.

  • SDP Offer/Answer: Confirmed to be negotiated with Opus, 48kHz, sendrecv.

  • ICE/DTLS: Connection state goes to connected and completed.
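For reference, the relay step in my token server boils down to something like this. Simplified sketch: the raw-SDP body and `Content-Type: application/sdp` follow OpenAI's documented WebRTC flow, but the exact endpoint shape and the query-parameter model selection are assumptions from their examples, so adjust to your setup:

```javascript
// Build the HTTP request that forwards the client's SDP offer to OpenAI
// and gets back the SDP answer. Kept as a pure function so it's easy to test.
// apiKey/model are placeholders for your own values.
function buildRealtimeCallRequest(offerSdp, apiKey, model) {
  return {
    url: `https://api.openai.com/v1/realtime/calls?model=${encodeURIComponent(model)}`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        // The body is raw SDP, not JSON -- sending JSON here is a common mistake.
        'Content-Type': 'application/sdp',
      },
      body: offerSdp,
    },
  };
}

// Usage (Node 18+ has global fetch):
// const { url, options } = buildRealtimeCallRequest(
//   offer.sdp, process.env.OPENAI_API_KEY, 'gpt-4o-realtime-preview-2025-08-28');
// const answerSdp = await fetch(url, options).then((r) => r.text());
```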

What Works

  • Outbound Audio: I can see outbound audio (bytesSent, kbps) reported by getStats(), and the OpenAI API returns an SDP answer without error.

  • Remote Track: The ontrack event fires, a remote audio MediaStream is attached, remoteStream.getAudioTracks().length > 0, track is live, not muted.

  • Audio Routing: All iOS/AVAudioSession and InCallManager calls succeed, audio is routed to the speaker.

  • SDP Logging: Full offer/answer is logged and looks valid (happy to provide snippets).
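This is how I'm computing the inbound bitrate from `getStats()`, in case my measurement itself is wrong. It takes two snapshots as plain arrays of report objects (in the app you'd pass the reports from `pc.getStats()`), so it's easy to unit-test:

```javascript
// Compute inbound audio bitrate (kbps) from two successive getStats() snapshots.
// Returns null if no inbound-rtp audio report is present or time hasn't advanced.
function inboundAudioKbps(prev, curr) {
  const pick = (reports) =>
    reports.find((r) => r.type === 'inbound-rtp' && r.kind === 'audio');
  const a = pick(prev);
  const b = pick(curr);
  if (!a || !b || b.timestamp <= a.timestamp) return null;
  const bits = (b.bytesReceived - a.bytesReceived) * 8;
  const seconds = (b.timestamp - a.timestamp) / 1000; // stats timestamps are in ms
  return bits / seconds / 1000; // kbps
}
```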

What Does NOT Work

  • Inbound Audio:

    • getStats() always shows inbound audio stuck at ~1 kbps (never rises above this).

    • I do not hear any agent speech (should get “This is a test. Please respond.”).

    • The remote audio track appears attached and enabled, but no sound is heard.

  • No OpenAI Usage: The OpenAI API dashboard shows zero tokens used for these requests, which suggests the model never processes my audio, or never generates a response at all.
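One thing worth ruling out given the symptoms above: the answer SDP can downgrade the direction (e.g. to `recvonly` or `inactive`), which would produce exactly this "negotiated fine, no inbound media" behavior. A quick sanity checker for the answer SDP (my assumption, not a confirmed cause):

```javascript
// Inspect an answer SDP string for an Opus 48 kHz rtpmap and the first
// media-direction attribute. A direction other than 'sendrecv' in OpenAI's
// answer would explain silent inbound audio despite a connected ICE/DTLS state.
function checkAnswerSdp(sdp) {
  const lines = sdp.split(/\r?\n/);
  const hasOpus = lines.some((l) => /^a=rtpmap:\d+ opus\/48000/i.test(l));
  const directionLine =
    lines.find((l) => /^a=(sendrecv|sendonly|recvonly|inactive)$/.test(l)) || 'a=none';
  return { hasOpus, direction: directionLine.replace('a=', '') };
}
```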

Other Troubleshooting Performed

  • Tried with/without custom TURN, using Google STUN only.

  • Tried multiple networks (WiFi, LTE, different NATs).

  • Checked that my SDP offers Opus, sendrecv, etc. (full logs available).

  • Confirmed remote audio track is attached and not muted.

  • Outbound stats show audio flowing (up to 30 kbps+).

  • InCallManager logs show proper audio session setup.

  • The OpenAI /v1/realtime/calls endpoint is reachable and returns 201 with a valid SDP answer.
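To narrow down where the inbound audio dies, I've also been classifying the `inbound-rtp` report: a stuck ~1 kbps with near-zero `packetsReceived` usually means only RTCP/keepalive traffic is arriving (no media at all); packets flowing with `audioLevel` ≈ 0 means the remote side is sending silence; packets plus a nonzero `audioLevel` means media arrives and the problem is local playback/routing. Field names follow the standard WebRTC stats spec, though availability can vary by react-native-webrtc version:

```javascript
// Rough triage of an 'inbound-rtp' audio stats report. Thresholds (10 packets,
// 0.001 audio level) are arbitrary cutoffs I picked, not spec values.
function classifyInboundAudio(report) {
  if (!report || (report.packetsReceived || 0) < 10) return 'no-media';
  if ((report.audioLevel || 0) < 0.001) return 'media-but-silent';
  return 'media-ok';
}
```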

What I Suspect

  • SDP/ICE negotiation issue? But connection states are “connected” and “completed”.

  • Firewall/NAT blocking inbound UDP? But that should be covered by TURN, and I tested on permissive networks.

  • OpenAI agent not sending audio because it never detects a turn? But I am sending audio, and tracks show enabled.

  • Something missing in my session config to force a response from the agent?

  • Or… some subtle iOS audio or WebRTC API edge case?
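On the "force a response" suspicion: over WebRTC, the Realtime API exchanges JSON events on a data channel (labeled `oai-events` in OpenAI's examples), and my understanding is that a greeting in the session `instructions` may not fire on its own until an explicit `response.create` event is sent. A sketch of what I'm about to try, under those assumptions:

```javascript
// Pure builder for the response.create event, kept separate so it can be tested.
function buildResponseCreate(instructions) {
  return JSON.stringify({ type: 'response.create', response: { instructions } });
}

// Create the events data channel on the peer connection and, once it opens,
// explicitly ask the model to speak. Incoming server events are logged.
function attachEventChannel(pc) {
  const dc = pc.createDataChannel('oai-events');
  dc.onopen = () =>
    dc.send(buildResponseCreate("Greet the user and say: 'This is a test. Please respond.'"));
  dc.onmessage = (e) => console.log('realtime event:', e.data);
  return dc;
}
```

If the dashboard's zero-token usage is because no response is ever created, this should make the agent speak regardless of turn detection.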

Questions / Requests for Help

  1. Has anyone gotten inbound agent audio working over WebRTC (not WebSocket) on iOS?

  2. Are there any OpenAI-side diagnostics/logs I can request to check if my media is being received/processed?

  3. Is there a sample working sessionConfig and SDP exchange for a successful iOS-to-OpenAI audio call?

  4. Anything else I should check on the client or signaling side that could block inbound audio?

Thanks for any ideas or reports!
Happy to provide code snippets, logs, SDP, etc.

Try this repo for a working iOS demo: GitHub - PallavAg/VoiceModeWebRTCSwift: OpenAI Swift Realtime API with WebRTC

Thank you, checking it out now.