Realtime "modalities" session config not disabling local->model audio channel

I know many of us are trying out the realtime audio API and models. My question is specific, so I’ll try to keep it on point. Please presume my tech stack matches this OpenAI example: GitHub - openai/openai-realtime-console: React app for inspecting, building and debugging with the Realtime API. I’m otherwise not doing anything special and have tried the various obvious things.

I want to be able to enable and disable audio during a session, so I create and update the session with the presence or absence of “audio” in the modalities field (see attached).
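For reference, this is roughly what the toggle looks like as a `session.update` event. A minimal sketch, assuming `dataChannel` is the already-open WebRTC data channel from the realtime-console setup:

```javascript
// Sketch: toggling "audio" in the session's modalities via a session.update
// event. The event shape follows the Realtime API; `dataChannel` is an
// assumption standing in for the console's open data channel.
function buildModalitiesUpdate(audioEnabled) {
  return {
    type: "session.update",
    session: {
      // With "audio" present the model speaks; without it, text only.
      modalities: audioEnabled ? ["text", "audio"] : ["text"],
    },
  };
}

function setModelAudio(dataChannel, audioEnabled) {
  dataChannel.send(JSON.stringify(buildModalitiesUpdate(audioEnabled)));
}
```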

The problem is that this only seems to affect the outgoing channel, from the OpenAI model to my local speaker. The inbound channel from my mic to the model remains open for audio regardless of the “modalities” setting, so I can’t stop the model from listening to audio.

I know I might be able to play with WebRTC settings, but for the moment I want to keep it simple like the example. I need to ask the group about the “modalities” field: as you can see from the attached, the docs don’t address unidirectional vs. bidirectional audio. Is there a way, using session.update, to disable audio bidirectionally?

Thanks
-J

The only thing you have remote control of:

        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 500,
            "create_response": true
        },

Set threshold to 1.0 and the user has to talk louder for the audio stream to be considered input.

“type”: “none” is going to get you continued collection of audio until you trigger a response, at which point it all gets dumped into the AI.

“create_response”, which controls whether a response is automatically generated when a VAD stop event occurs, might do the same: audio collection just waiting for a “create” or some other trigger to go off on everything it has collected. But if you are lucky, it will serve as your remote-control “discard audio”.
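The knobs above ride on the same `session.update` event. A sketch of remotely tightening the VAD, assuming the values quoted earlier as defaults:

```javascript
// Sketch: adjusting server VAD remotely via session.update. Raising
// `threshold` toward 1.0 makes the server less likely to treat incoming
// audio as speech; `create_response: false` stops automatic responses on
// VAD stop events. The padding/silence values mirror the snippet above.
function buildTurnDetectionUpdate({ threshold = 0.5, createResponse = true } = {}) {
  return {
    type: "session.update",
    session: {
      turn_detection: {
        type: "server_vad",
        threshold,
        prefix_padding_ms: 300,
        silence_duration_ms: 500,
        create_response: createResponse,
      },
    },
  };
}
```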

Therefore, the option you have is client control: a client which, with WebRTC, ultimately holds your secret and is out of your control, even if you do give the user a “mute mic” button (or your own button under remote control).


Excellent, I was going to look into those, so I’m glad you laid them out.

I might also just try to do this at the WebRTC level. If I can disable the upstream audio channel, I don’t have to worry about getting the API usage just right (and it staying that way). It might also have the beneficial side effect of the browser’s mic icon signaling mute. https://stackoverflow.com/questions/35512314/how-to-mute-unmute-mic-in-webrtc
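The approach in that Stack Overflow link boils down to disabling the local audio track, which makes WebRTC transmit silence instead of mic input. A sketch, assuming `localStream` is the MediaStream handed to the peer connection:

```javascript
// Sketch of client-side mic muting per the linked Stack Overflow answer.
// Setting MediaStreamTrack.enabled = false makes WebRTC send silence;
// `localStream` is an assumption standing in for the app's captured stream.
function setMicEnabled(localStream, enabled) {
  for (const track of localStream.getAudioTracks()) {
    track.enabled = enabled; // false => silence is transmitted upstream
  }
  // Return the resulting states, handy for updating UI (e.g. a mute icon).
  return localStream.getAudioTracks().map((t) => t.enabled);
}
```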

thx a bunch

To close this out:

The ‘audio’ value in the session’s modalities affects the outbound audio channel (model → user) only; it doesn’t mute or stop audio from user → model.

WebRTC has a mechanism to mute the channel, but I could not get it to work correctly (I’m not new to WebRTC, so I don’t think I missed anything simple). I suspect I can’t mute WebRTC because of the VAD (turn detection) settings (see attached Google snippet). There remains some mystery here, but my time investment is up for the present.

I went with disabling turn_detection to achieve the effect of muting the local mic. Rather than adjusting the values within the turn detection block as @_j suggested, I set the turn_detection key to null (see attached OpenAI doc snippet).

   ...
   turn_detection: null,
   ...
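For completeness, a sketch of the full event wrapping that snippet, assuming the same open data channel as before:

```javascript
// Sketch: session.update event disabling server VAD entirely by setting
// turn_detection to null, per the doc snippet above.
const disableVadEvent = {
  type: "session.update",
  session: {
    turn_detection: null,
  },
};
// dataChannel.send(JSON.stringify(disableVadEvent)); // assumed open channel
```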

thx
