Buzzing/Static Noise from gpt-realtime (WebRTC)

Summary:
When using the new gpt-realtime model with WebRTC integration, I often hear a buzzing/static sound from the speakers while my microphone is open but I am not speaking. The buzzing lasts until I start speaking again. This occurs even though no response.audio.delta frames or actual assistant speech are being received.


Steps to Reproduce:

  1. Connect to the gpt-realtime model via WebRTC as per documentation.

  2. Keep the microphone open.

  3. Do not speak for a few seconds.

  4. Observe that instead of silence, a buzzing/static noise plays through the speaker.

  5. Speak again; the buzzing stops immediately.


Expected Behavior:

  • No audio should be emitted when the assistant is not sending response.audio.delta frames.

  • Silence should be maintained until the assistant generates actual audio.


Observed Behavior:

  • Continuous buzzing/static sound from the speaker while idle.

  • Buzz stops immediately when user starts speaking.

  • No assistant speech or audio packets are being processed during the buzz.


Environment:

  • Model: gpt-realtime

  • Integration: WebRTC (browser client, standard sample rate PCM)

  • Audio I/O: Mic + speakers (headset)

  • Frequency: Happens intermittently but often enough to reproduce within minutes.


Possible Causes (based on community discussions):

  • The API sending silent or malformed response.audio.delta packets.

  • Comfort noise (CN) or DTX frames in WebRTC being rendered as buzz.

  • Acoustic loop from mic and speaker causing the stream to stay active.

  • Sample-rate mismatch in audio frames.


Workarounds Tried:

  • Verified no response.audio.delta events in logs during buzzing.

  • Using headphones reduces occurrence (suggests feedback may be a factor).


Request:
Please confirm if this is a known issue with the Realtime API/WebRTC pipeline, and whether there’s a fix or configuration (e.g., gating remote audio track until response.audio.delta is received) to prevent the buzzing sound.


Similar here, using websockets.

It works perfectly with the header "OpenAI-Beta": "realtime=v1".

With that header removed, the response is just static noise.

Connecting via Twilio. Endpoint wss://api.openai.com/v1/realtime?model=gpt-realtime
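A minimal sketch of the workaround described above, assuming an aiohttp-style websocket client (as in the snippet later in this thread) and an OPENAI_API_KEY environment variable; `build_headers` is a hypothetical helper name, not part of any SDK:

```python
import os

OPENAI_WS_URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime"

def build_headers(use_beta: bool = True) -> dict:
    """Headers for the Realtime websocket; the beta header is the
    workaround reported in this thread for the static noise."""
    headers = {"Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}"}
    if use_beta:
        headers["OpenAI-Beta"] = "realtime=v1"
    return headers

# Usage with aiohttp (sketch):
#   ws = await session.ws_connect(OPENAI_WS_URL,
#                                 headers=build_headers(),
#                                 heartbeat=30)
```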

@jandieg.b were you able to resolve this issue? Could you elaborate on how you fixed it? We are using “gpt-realtime-2025-08-28” and are facing the same buzzing-sound issue.

We are simply calling it like below:
openai_ws_ref = await session_ref.ws_connect(
    openai_ws_url,
    heartbeat=30,
)

Not sorted. The only way to avoid the static noise is to use the beta header (not GA).

Hello,

We are using gpt-realtime-2025-08-28 for generating both text and audio responses in our chatbot application. We recently observed an intermittent issue where the frontend audio playback produces a buzzing sound instead of clean speech.

From our investigation:

  • This happened when there was a large time gap (~40s) between two audio chunks generated by the model.

  • During this gap, the frontend had already started playback of the first chunk, and when the next chunk eventually arrived, the buzzing sound occurred.

  • The issue has been observed intermittently and is not consistent.

  • Our backend logs do not show any errors — only a long delay between chunks.

  • The buzzing sound was noticeable only on the application frontend; nothing related appears in the backend logs.

Questions:

  1. Could this delay in chunk generation be expected behavior from the realtime model?

  2. Has anyone else experienced audio distortion (buzzing/static) when using gpt-realtime models with delayed chunks? We have seen one similar question from another user, who was probably using the beta version, but we are not using the beta.

  3. Are there recommended best practices on how the frontend should handle long gaps in audio streaming to avoid this issue?
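One common mitigation for question 3 is to schedule chunks on a continuous timeline and fill any gap with PCM silence, so the playback cursor never runs off the end of the buffer into garbage. A conceptual sketch in Python (a real frontend would do this in the browser's audio pipeline; 16-bit mono PCM at 24 kHz is assumed, and `assemble_stream` is an illustrative name):

```python
# Pad gaps between audio chunks with PCM silence so playback never underruns.
RATE = 24000          # samples per second (24 kHz)
BYTES_PER_SAMPLE = 2  # 16-bit PCM

def assemble_stream(chunks):
    """chunks: list of (arrival_time_seconds, pcm_bytes) in arrival order.
    Returns one contiguous byte string with silence filling the gaps."""
    out = bytearray()
    cursor = None  # time at which the assembled audio currently ends
    for arrival, pcm in chunks:
        if cursor is None:
            cursor = arrival  # playback starts at the first chunk
        elif arrival > cursor:
            gap_bytes = int((arrival - cursor) * RATE) * BYTES_PER_SAMPLE
            out.extend(b"\x00" * gap_bytes)  # digital silence for the gap
            cursor = arrival
        out.extend(pcm)
        cursor += len(pcm) / (RATE * BYTES_PER_SAMPLE)
    return bytes(out)
```

With a 1-second chunk at t=0 and another at t=3, this inserts 2 seconds of silence between them instead of letting the player replay or misinterpret stale buffer contents.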

Any insights from the community or OpenAI team would be appreciated.

Thanks!

Sounds like the audio format doesn’t match what you’re expecting. Note that the JSON for controlling the audio format (if you’re not using the defaults) changed in the GA interface.

Could you please let me know the link to this JSON format?

see https://platform.openai.com/docs/api-reference/realtime-sessions/create-realtime-client-secret

"audio": {
  "input": {
    "format": {
      "type": "audio/pcm",
      "rate": 24000
    },
    "transcription": null,
    "noise_reduction": null,
    "turn_detection": {
      "type": "server_vad"
    }
  },
  "output": {
    "format": {
      "type": "audio/pcm",
      "rate": 24000
    },
    "voice": "alloy",
    "speed": 1.0
  }
},
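For a websocket integration, the same settings can be applied with a session.update event. A sketch that mirrors the JSON above (whether GA session.update accepts exactly this nesting should be verified against the linked reference; `session_update_event` is an illustrative name):

```python
import json

def session_update_event() -> str:
    """Build a session.update event pinning 24 kHz PCM for input and output,
    mirroring the audio block from the GA API reference."""
    event = {
        "type": "session.update",
        "session": {
            "audio": {
                "input": {
                    "format": {"type": "audio/pcm", "rate": 24000},
                    "turn_detection": {"type": "server_vad"},
                },
                "output": {
                    "format": {"type": "audio/pcm", "rate": 24000},
                    "voice": "alloy",
                    "speed": 1.0,
                },
            },
        },
    }
    return json.dumps(event)

# Send once after connecting, e.g.:
#   await ws.send_str(session_update_event())
```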

Thanks. I am sharing a few enhancements that would help us a lot:

  1. At the end of a session, we should be able to request the entire transcript, with both a “with interruptions” and a “without interruptions” version.
  2. We really need some HD voices. Gemini has done very well in this area.
  3. I have asked this question multiple times and have been told it’s possible, but not how: on Twilio we want the bot stream to signal that it has reached the end of its utterance. This is needed to unmute the caller after an entire disclaimer has been read, to end the call after the bot has said “goodbye”, etc. If this already exists, please share the documentation/repo.
  1. Good suggestion.
  2. Can you explain what you mean? cedar and marin should be quite high quality; not sure what you mean by HD in this case.
  3. You just need to wait for output_audio_buffer.stopped; see observer.ts in hello-realtime | Val Town.
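In Python terms, waiting for the end of an utterance amounts to scanning incoming server events for that type (a sketch; `utterance_finished` and `unmute_caller` are hypothetical names, only the event type comes from the reply above):

```python
import json

def utterance_finished(raw_event: str) -> bool:
    """True when the server signals that playback of the response audio
    has fully stopped (the event named in the reply above)."""
    return json.loads(raw_event).get("type") == "output_audio_buffer.stopped"

# Typical receive loop (sketch):
#   async for msg in ws:
#       if utterance_finished(msg.data):
#           unmute_caller()  # hypothetical application hook
```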