Audio Corruption in WebSocket Binary Data when using OpenAI Realtime API in Cloudflare Workers

dev6 · April 25, 2025, 6:32am

I am discussing this issue in the cloudflare/workerd repository on GitHub (https://github.com/cloudflare/workerd/issues/3981), where I was advised to contact OpenAI as well. Since I don’t know the official support contact, I am posting it in the Developer Community for now.

I will also repost the report I wrote in the relevant GitHub issue here. I have removed various links because a warning appeared stating that links cannot be included in the post.

Issue Description

I’m experiencing audio quality issues when using OpenAI’s Realtime Speech-to-Speech API via WebSockets in a Cloudflare Workers environment. The audio output contains so much noise and distortion that it is unusable. The exact same API integration works perfectly in a Node.js environment using the standard ws package. I suspect the issue is related to the Base64-encoded audio data coming from OpenAI, because text-only communication with the OpenAI Realtime API works correctly on Cloudflare. The same audio problem occurs both locally with wrangler dev and in the deployed environment.

My minimal reproduction code is on GitHub: https://github.com/phasetr/pt-javascript/tree/main/2025-04-17-cf-simple-speech-to-speech. The main files are index.ts (Cloudflare Workers) and index.node.ts (Node.js).

Environment

Wrangler version: 4.12.0
Node.js version: 23.9.0
Hono version: 4.7.5
OpenAI API: Realtime Speech-to-Speech API (gpt-4o-realtime-preview-2024-10-01)
Twilio: used for voice chat

Steps to Reproduce

Set up a WebSocket endpoint in Cloudflare Workers using WebSocketPair
Connect to OpenAI’s Realtime API using fetch with WebSocket upgrade
Process audio data between the client and OpenAI
Receive distorted/noisy audio in the response from OpenAI
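
As a minimal sketch of step 2: in Workers, an outbound WebSocket opened through fetch uses an http(s):// URL plus an Upgrade header, rather than the wss:// URL passed to the ws package in Node.js. The helper name `toUpgradeRequest` is hypothetical and not part of the reproduction code. It also assumes that the `Upgrade: websocket` header is sufficient and that the runtime supplies the remaining handshake headers itself, which should be checked against the Cloudflare documentation.

```typescript
// Hypothetical helper (not from the original repository): builds the
// arguments for a fetch-based WebSocket upgrade from a ws(s):// URL.
interface UpgradeRequest {
  url: string;
  init: { headers: Record<string, string> };
}

function toUpgradeRequest(wsUrl: string, apiKey: string): UpgradeRequest {
  // fetch expects http(s)://, so map ws:// -> http:// and wss:// -> https://.
  const url = wsUrl.replace(/^ws(s?):\/\//, "http$1://");
  return {
    url,
    init: {
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "OpenAI-Beta": "realtime=v1",
        // Assumption: this header alone requests the upgrade in Workers.
        Upgrade: "websocket",
      },
    },
  };
}
```

Inside a Worker this could be used as `const { url, init } = toUpgradeRequest(wsUrl, OPENAI_API_KEY); const response = await fetch(url, init);`, after which `response.webSocket` is accepted as in the code further down.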

Expected Behavior

Clean, noise-free audio should be transmitted through the WebSocket connections, as is the case when using the same API with Node.js and the ws package. My Cloudflare sample file is index.ts, and my Node.js version is index.node.ts. (The Node.js version also works properly in an AWS ECS environment.)

Actual Behavior

The audio received from OpenAI and forwarded to the client contains significant noise/distortion, making it unusable for speech applications.

Debugging Information

I’ve verified that the issue is specific to the Cloudflare Workers environment:

Node.js implementation works perfectly: Using the standard ws package with direct WebSocket connections (wss:// scheme)
Cloudflare Workers implementation has audio noise: Using WebSocketPair and fetch with WebSocket upgrade (https:// scheme)
Data verification: I’ve confirmed that the audio data received from OpenAI already contains noise when using the Cloudflare Workers implementation
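
One way to make this kind of verification comparable across runtimes is to log a small fingerprint of each Base64 audio payload in both implementations and compare the logs. The sketch below is only an illustration: `fingerprintBase64` and `PayloadFingerprint` are hypothetical names, not part of the reproduction code, and the FNV-1a hash is just a cheap checksum chosen for the example. It assumes a global `atob`, which is available in Workers and in recent Node.js versions.

```typescript
// Hypothetical helper: summarizes a Base64 payload so that audio deltas can
// be compared between the Node.js and the Cloudflare Workers implementation.
interface PayloadFingerprint {
  base64Length: number; // length of the Base64 string as received
  byteLength: number;   // number of decoded bytes
  fnv1a: string;        // 32-bit FNV-1a hash of the decoded bytes, as hex
}

function fingerprintBase64(payload: string): PayloadFingerprint {
  // atob throws if the payload is not valid Base64, which is itself a
  // useful signal that the string was damaged in transit.
  const binary = atob(payload);
  let hash = 0x811c9dc5; // FNV-1a offset basis
  for (let i = 0; i < binary.length; i++) {
    hash ^= binary.charCodeAt(i);               // each char code is one byte
    hash = Math.imul(hash, 0x01000193) >>> 0;   // multiply by the FNV prime
  }
  return {
    base64Length: payload.length,
    byteLength: binary.length,
    fnv1a: hash.toString(16).padStart(8, "0"),
  };
}
```

Calling `console.log(fingerprintBase64(delta))` on each audio delta in both runtimes would show whether individual payloads decode cleanly and have plausible sizes. Since the model's output differs between sessions, matching hashes are only to be expected when the very same payload is fed to both environments.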

Code Comparison

Cloudflare Workers Implementation (problematic)

The relevant part of index.ts is as follows:

// WebSocket setup using WebSocketPair
const webSocketPair = new WebSocketPair();
const client = webSocketPair[0];
const server = webSocketPair[1];

// OpenAI connection using fetch with WebSocket upgrade
const response = await fetch(
  "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01",
  {
    headers: {
      Authorization: `Bearer ${OPENAI_API_KEY}`,
      "OpenAI-Beta": "realtime=v1",
      Upgrade: "websocket",
      Connection: "Upgrade",
      "Sec-WebSocket-Version": "13",
      "Sec-WebSocket-Key": btoa(Math.random().toString(36).substring(2, 15)),
    },
  }
);

// @ts-ignore - Cloudflare Workers-specific API
const webSocket = response.webSocket;
// @ts-ignore
webSocket.accept();

// Processing messages from OpenAI (JSON events; audio arrives as Base64 in `delta`)
webSocket.addEventListener("message", async (event: MessageEvent) => {
  // Frames may arrive as text or as an ArrayBuffer; decode both to a JSON event.
  // Named `message` to avoid shadowing the outer fetch `response`.
  const message = event.data instanceof ArrayBuffer
    ? JSON.parse(new TextDecoder().decode(event.data))
    : JSON.parse(event.data);

  if (message.type === "response.audio.delta" && message.delta) {
    // Forward audio data to client (contains noise)
    server.send(JSON.stringify({
      event: "media",
      streamSid: streamSid,
      media: { payload: message.delta },
    }));
  }
});

Node.js Implementation (working correctly)