RealtimeAPI audio feedback

kisalit · January 24, 2025, 12:16pm

I’m using the gpt-4o-realtime-preview-2024-12-17 model to make a chat bot with both audio and text modalities. The problem is that when it plays a response through the speaker, the bot hears itself, takes that response as input, and replies to what it just said. It then goes into a loop. I’m looking for a solution.

louzell · January 24, 2025, 5:29pm

Are you building on the Apple ecosystem by chance? If so, you need to look into setVoiceProcessingEnabled(_:) in the Apple Developer Documentation.

Let me know if you are an Apple dev and I’ll share what I know (I ran into this myself, and it’s quite a pain to solve, especially if you are using the AVCaptureSession API).

kisalit · January 25, 2025, 7:20am

Thank you, but I’m not an Apple dev. I’m using Linux.

aza · January 25, 2025, 7:59pm

You need echo cancellation to prevent the output from your speaker being fed back into your microphone. This is an audio processing problem, not anything to do with the Realtime API (it can only work with the audio input it gets).

A quick solution is to use headphones/headset where the audio output is played through the headphones and therefore won’t get picked up by your microphone.
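To make the echo cancellation idea concrete, here is a minimal sketch of a normalized LMS (NLMS) adaptive filter in plain Python. It is an illustration only, not what the Realtime API or any browser actually runs: the function name, tap count, and step size are all assumptions, and a real setup on Linux would more likely rely on an existing canceller (for example PulseAudio's module-echo-cancel or the WebRTC audio processing library) than on hand-written code. The filter learns how the speaker signal (`ref`) shows up in the microphone signal (`mic`) and subtracts that estimate.

```python
def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `ref` from `mic` (NLMS sketch).

    mic: microphone samples (floats), containing an echo of the speaker output
    ref: the samples that were sent to the speaker, aligned with `mic`
    taps, mu, eps: illustrative values, not tuned for real rooms
    """
    weights = [0.0] * taps      # current estimate of the speaker-to-mic echo path
    history = [0.0] * taps      # most recent reference samples, newest first
    cleaned = []
    for m, r in zip(mic, ref):
        history = [r] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate           # mic signal with the estimated echo removed
        norm = sum(x * x for x in history) + eps
        step = mu * error / norm            # normalized step keeps adaptation stable
        weights = [w + step * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned
```

With a synthetic echo (the mic signal being a delayed, attenuated copy of `ref`), the output of `nlms_echo_cancel(mic, ref)` should shrink toward silence once the filter has adapted. Real rooms need far more taps, handling of the delay between playback and capture, and double-talk detection, which is why a ready-made canceller is usually the better choice.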

kisalit · January 26, 2025, 4:51am

Yes, but using headphones is not practical for my application.

jwcase · January 26, 2025, 8:04am

This may be a pathologically stupid answer, and a kludge at that, but can you set a gate that takes the robot’s voice as a sidechain and mutes the mic while the thing is speaking?

aza · January 26, 2025, 10:05am

Muting the mic is a great suggestion. There are audio transcript delta events on the data channel, so it is feasible to know when the AI is about to start talking.

The BIG downside is that you would lose the ability to interrupt when the AI is doing the wrong thing or waffling. If that’s not required then muting would be an option.
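A minimal sketch of the mute-while-speaking idea in Python, under some assumptions: the class and method names are hypothetical, the output audio is taken to be 16-bit mono PCM at 24 kHz, and the 0.3 second tail is a guess to cover room reverb. Instead of relying only on server events, it tracks how long the audio handed to the speaker will take to play and replaces mic frames with silence until then.

```python
import time


class MicGate:
    """Half-duplex gate: silence mic audio while assistant audio is playing."""

    def __init__(self, sample_rate=24000, tail_seconds=0.3, clock=time.monotonic):
        self.sample_rate = sample_rate    # assumed output rate, samples per second
        self.tail_seconds = tail_seconds  # extra mute time after playback (a guess)
        self.clock = clock                # injectable so the gate can be tested
        self.playback_ends_at = float("-inf")

    def on_output_audio(self, pcm16_bytes):
        """Call for every chunk of assistant audio queued to the speaker."""
        duration = len(pcm16_bytes) / 2 / self.sample_rate  # 2 bytes per sample
        # Chunks arrive faster than real time, so they queue up back to back.
        start = max(self.clock(), self.playback_ends_at)
        self.playback_ends_at = start + duration

    def is_muted(self):
        return self.clock() < self.playback_ends_at + self.tail_seconds

    def filter_mic(self, pcm16_bytes):
        """Return silence of equal length while muted, else the frame unchanged."""
        if self.is_muted():
            return bytes(len(pcm16_bytes))
        return pcm16_bytes
```

Each microphone frame would go through `filter_mic` before being sent to the API. As noted above, this gives up barge-in: the user cannot interrupt while the gate is closed.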

Echo cancellation is still likely to be the proper, robust approach. That’s how the browsers do it.

tyler10 · January 30, 2025, 3:51pm

Hey @louzell - I’m using iOS and have run into this issue before. We use AVAudioEngine rather than AVCaptureSession, but also had to setVoiceProcessingEnabled to true on both the input and output nodes.

We still run into occasional issues though. Can you share what you are doing? Curious to compare notes.

sashirestela · January 30, 2025, 3:56pm

@kisalit This is in Java, but I think you could take the recommendations and extrapolate them to your own context:

louzell · January 30, 2025, 9:02pm

Hey @tyler10! I put my notes up here. I didn’t want to get too Apple-specific in this thread since @kisalit is looking for a Linux solution: Audio notes for OpenAI realtime on Apple platforms

I hope they help
