Issue with Realtime API user interruption

I tried to implement a simple Realtime API demo with a user interruption feature (i.e., like Advanced Voice Mode in the app, where the user can speak to interrupt GPT-4o’s ongoing audio output at any time). However, my computer’s microphone also captures the API’s output voice and treats it as user input, and as a result the API constantly interrupts itself. I wonder how Advanced Voice Mode solves this issue.


Hey there!

I am having the same issue. I ported the Realtime API over to Python (since all my code before was in Python, not JS). If you are using web technologies like JS, you could use WebRTC for AEC (acoustic echo cancellation). This is harder to do in Python. If anyone figures out how to handle this, please comment.


Hm. The OpenAI demo application as well as their Playground console both run in the browser, and both have the same issue for me. Maybe it depends on the machine and the browser, but so far I have had this issue in Chrome and Safari. I’m on an M3 MacBook Pro, which should have a decent mic array.


Just tried it on their playground. No such issue for me there. The issue only arises in my own implementation.


Yeah, for me it’s the same. In the playground it works perfectly. As I said, they most likely use some form of AEC for their web implementation (playground and also the ChatGPT app). In the API we don’t have much control over this; we can only set the audio level threshold, which might help in some cases. For most cases, however, I suppose we’ll have to implement some sort of AEC on the client side ourselves.
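For anyone who wants to try the threshold route first, here is a rough sketch of a `session.update` event that raises the server-side voice activity detection threshold. The field names (`turn_detection`, `threshold`, `prefix_padding_ms`, `silence_duration_ms`) are as I understand them from the Realtime API reference, so check them against the current docs. The helper name `buildVadUpdate` and the padding and silence values are just assumptions for illustration.

```typescript
// Sketch only: shape of a session.update event for server-side VAD.
// Field names should be checked against the current Realtime API docs.
interface TurnDetection {
  type: "server_vad";
  threshold: number;           // 0.0 to 1.0; higher means louder audio is needed to count as speech
  prefix_padding_ms: number;   // audio kept from before detected speech
  silence_duration_ms: number; // silence needed before the turn is considered finished
}

interface SessionUpdateEvent {
  type: "session.update";
  session: { turn_detection: TurnDetection };
}

// Hypothetical helper: clamps the threshold into [0, 1] and builds the event.
function buildVadUpdate(threshold: number): SessionUpdateEvent {
  const clamped = Math.min(1, Math.max(0, threshold));
  return {
    type: "session.update",
    session: {
      turn_detection: {
        type: "server_vad",
        threshold: clamped,
        prefix_padding_ms: 300,   // assumed value, tune for your setup
        silence_duration_ms: 500, // assumed value, tune for your setup
      },
    },
  };
}

// Usage, assuming `ws` is an already-open WebSocket to the Realtime API:
// ws.send(JSON.stringify(buildVadUpdate(0.8)));
```

A higher threshold only makes the detector less sensitive. If the speakers are loud enough, the model's own voice will still trigger it, so this is a mitigation rather than real echo cancellation.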


Also having the same issue, where the device microphone picks up the output from OpenAI and it keeps interrupting itself.
Wondering if this has been solved, or whether the Realtime API basically can’t be used without headphones.

Yeah, it’s not implemented out of the box in the API, and I’m not sure it even can be. What I did to solve the issue was move the audio recording and playback to the frontend, i.e. the browser. Before, I had all the audio handling in my Python backend, since I’m working on a local desktop application. In the browser you can then use AEC via WebRTC, for example, which most modern browsers support.
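In case it helps, this is roughly what the browser side can look like: a minimal sketch that requests the microphone with the standard `echoCancellation` constraint of `getUserMedia`. The helper names (`aecAudioConstraints`, `openEchoCancelledMic`) and the small `MediaDevicesLike` interface are made up for this example; the media devices object is passed in so the function does not depend on browser globals. The constraint is a request rather than a guarantee, and how well it works depends on the browser and device.

```typescript
// Sketch only: ask the browser for a microphone stream with echo cancellation.
interface AudioConstraints {
  echoCancellation: boolean; // request the browser's built-in AEC
  noiseSuppression: boolean;
  autoGainControl: boolean;
}

// Minimal structural type so the code does not depend on DOM typings.
// In a browser, navigator.mediaDevices satisfies it.
interface MediaDevicesLike<S> {
  getUserMedia(constraints: { audio: AudioConstraints }): Promise<S>;
}

// Hypothetical helper: the audio constraints used for the mic request.
function aecAudioConstraints(): { audio: AudioConstraints } {
  return {
    audio: {
      echoCancellation: true,
      noiseSuppression: true,
      autoGainControl: true,
    },
  };
}

// Hypothetical helper: opens the mic with the constraints above.
async function openEchoCancelledMic<S>(devices: MediaDevicesLike<S>): Promise<S> {
  return devices.getUserMedia(aecAudioConstraints());
}

// Usage in a browser:
// const stream = await openEchoCancelledMic(navigator.mediaDevices);
// Then feed `stream` into your recorder and send the audio to the Realtime API.
```

As far as I can tell, the cancellation works best when the model's audio is also played back by the same page, so the browser knows what is coming out of the speakers. That matches the approach above of moving both recording and playback into the frontend.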