Realtime "modalities" session config not disabling local->model audio channel

I know many of us are trying out the realtime audio API and models. My question is specific, so I’ll try to keep it on point. Please presume my tech stack matches this OpenAI example: GitHub - openai/openai-realtime-console: React app for inspecting, building and debugging with the Realtime API. I’m otherwise not doing anything special and have tried the various obvious things.

I want to be able to enable and disable audio during a session, so I create and update the session with the presence or absence of “audio” in the modalities field (see attached).
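For reference, this is roughly what the toggle looks like as a `session.update` event. A minimal sketch, assuming `dataChannel` is the already-open WebRTC data channel from the realtime-console setup:

```javascript
// Sketch: toggling "audio" in the session's modalities via a session.update
// event. The event shape follows the Realtime API; `dataChannel` is an
// assumption standing in for the console's open data channel.
function buildModalitiesUpdate(audioEnabled) {
  return {
    type: "session.update",
    session: {
      // With "audio" present the model speaks; without it, text only.
      modalities: audioEnabled ? ["text", "audio"] : ["text"],
    },
  };
}

function setModelAudio(dataChannel, audioEnabled) {
  dataChannel.send(JSON.stringify(buildModalitiesUpdate(audioEnabled)));
}
```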

The problem is that this only seems to affect the outgoing channel, from the OpenAI model to my local speaker. The inbound channel from my mic to the model remains open for audio regardless of the “modalities” setting, so I can’t stop the model from listening to audio.

I know I might be able to play with WebRTC settings, but for the moment I want to keep it simple like the example. I need to ask the group about the “modalities” field: as you can see from the attached, the docs don’t address unidirectional vs. bidirectional audio. Is there a way, using session.update, to disable audio bidirectionally?

Thanks
-J

The only thing you have remote control of:

        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 500,
            "create_response": true
        },

Set threshold to 1.0 and the user has to talk louder for the audio stream to be considered input.

“type”: “none” is going to get you continued collection of audio until you trigger a response, at which point it all gets dumped into the AI.

“create_response”, which controls whether a response is automatically generated when a VAD stop event occurs, might do the same: audio collection just waiting for a “create” or some other trigger to go off on everything it has collected. But if you are lucky, it will serve as your remote-control “discard audio”.
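The knobs above ride on the same `session.update` event. A sketch of remotely tightening the VAD, assuming the values quoted earlier as defaults:

```javascript
// Sketch: adjusting server VAD remotely via session.update. Raising
// `threshold` toward 1.0 makes the server less likely to treat incoming
// audio as speech; `create_response: false` stops automatic responses on
// VAD stop events. The padding/silence values mirror the snippet above.
function buildTurnDetectionUpdate({ threshold = 0.5, createResponse = true } = {}) {
  return {
    type: "session.update",
    session: {
      turn_detection: {
        type: "server_vad",
        threshold,
        prefix_padding_ms: 300,
        silence_duration_ms: 500,
        create_response: createResponse,
      },
    },
  };
}
```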

Therefore, the option you have is client control: a client which, with WebRTC, ultimately holds your secret and is out of your control, even if you do give the user a “mute mic” button (or your own button under remote control).


Excellent, I was going to look into those, so I’m glad you laid them out.

I might also just try to do this at the WebRTC level. If I can disable the upstream audio channel, I don’t have to worry about getting the API usage just right (and it staying that way). It might also have the beneficial side effect of the browser’s mic icon signaling mute. https://stackoverflow.com/questions/35512314/how-to-mute-unmute-mic-in-webrtc
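The approach in that Stack Overflow link boils down to disabling the local audio track, which makes WebRTC transmit silence instead of mic input. A sketch, assuming `localStream` is the MediaStream handed to the peer connection:

```javascript
// Sketch of client-side mic muting per the linked Stack Overflow answer.
// Setting MediaStreamTrack.enabled = false makes WebRTC send silence;
// `localStream` is an assumption standing in for the app's captured stream.
function setMicEnabled(localStream, enabled) {
  for (const track of localStream.getAudioTracks()) {
    track.enabled = enabled; // false => silence is transmitted upstream
  }
  // Return the resulting states, handy for updating UI (e.g. a mute icon).
  return localStream.getAudioTracks().map((t) => t.enabled);
}
```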

thx a bunch

To close this out:

The ‘audio’ value in the session’s modalities affects the outbound audio channel (model → user) only; it doesn’t mute or stop audio from user → model.

WebRTC has a mechanism to mute the channel, but I could not get it to work correctly (I’m not new to WebRTC, so I don’t think I missed anything simple). I suspect I can’t mute WebRTC because of the VAD (turn detection) settings (see attached Google snippet). There remains some mystery here, but my time investment is up for the present.

I went with disabling turn_detection to achieve the effect of muting the local mic. Rather than adjusting the values within the turn detection block as @_j suggested, I set the turn_detection key to null (see attached OpenAI doc snippet).

   ...
   turn_detection: null,
   ...
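For completeness, a sketch of the full event wrapping that snippet, assuming the same open data channel as before:

```javascript
// Sketch: session.update event disabling server VAD entirely by setting
// turn_detection to null, per the doc snippet above.
const disableVadEvent = {
  type: "session.update",
  session: {
    turn_detection: null,
  },
};
// dataChannel.send(JSON.stringify(disableVadEvent)); // assumed open channel
```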

thx
