Audio Corruption in WebSocket Binary Data when using OpenAI Realtime API in Cloudflare Workers

I am discussing this issue in the cloudflare/workerd repository on GitHub (https://github.com/cloudflare/workerd/issues/3981), where I was advised to contact OpenAI as well. Since I don’t know the official support contact, I am posting it in the Developer Community for now.

Below is the report I wrote for that GitHub issue. I removed several links because the forum warned that links cannot be included in the post.

Issue Description

I’m experiencing audio quality issues when using OpenAI’s Realtime Speech-to-Speech API via WebSockets in a Cloudflare Workers environment. The audio output contains significant noise and distortion, making it unusable. The exact same API integration works perfectly in a Node.js environment using the standard ws package. I suspect the issue is related to the Base64-encoded audio data coming from OpenAI, because the text-only mode of the Realtime API exchanges text messages correctly on Cloudflare. The same audio problem occurs both locally with wrangler dev and in the deployed environment.

My minimal reproduction code is on GitHub: https://github.com/phasetr/pt-javascript/tree/main/2025-04-17-cf-simple-speech-to-speech. The main files are index.ts (Cloudflare Workers) and index.node.ts (Node.js).

Environment

  • Wrangler version: 4.12.0
  • Node.js version: 23.9.0
  • Hono version: 4.7.5
  • OpenAI API: Realtime Speech-to-Speech API (gpt-4o-realtime-preview-2024-10-01)
  • Twilio: used for voice chat

Steps to Reproduce

  1. Set up a WebSocket endpoint in Cloudflare Workers using WebSocketPair
  2. Connect to OpenAI’s Realtime API using fetch with WebSocket upgrade
  3. Process audio data between the client and OpenAI
  4. Receive distorted/noisy audio in the response from OpenAI

Expected Behavior

Clean, noise-free audio should be transmitted through the WebSocket connections, as is the case when using the same API with Node.js and the ws package. My Cloudflare Workers sample is index.ts, and my Node.js version is index.node.ts. (The Node.js version also works properly in an AWS ECS environment.)

Actual Behavior

The audio received from OpenAI and forwarded to the client contains significant noise/distortion, making it unusable for speech applications.

Debugging Information

I’ve verified that the issue is specific to the Cloudflare Workers environment:

  1. The Node.js implementation works perfectly: it uses the standard ws package with a direct WebSocket connection (wss:// scheme)
  2. The Cloudflare Workers implementation has audio noise: it uses WebSocketPair and fetch with a WebSocket upgrade (https:// scheme)
  3. Data verification: I’ve confirmed that the audio data received from OpenAI already contains noise in the Cloudflare Workers implementation, before it is forwarded to the client
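
One way to make the data verification above more concrete is to log a small fingerprint of each Base64 audio chunk when it arrives from OpenAI and again just before it is forwarded. The sketch below is a hypothetical helper, not code from the repository: `fingerprintDelta` and `AudioFingerprint` are illustrative names, and it assumes a global `atob`, which exists in Cloudflare Workers and in Node.js 16 or later. Because the generated audio differs from call to call, the hashes are not expected to match across separate runs; the point is to see whether a chunk changes inside one process, and whether chunk sizes look comparable in both environments.

```typescript
// Hypothetical diagnostic helper (not part of the original repository).
interface AudioFingerprint {
  base64Length: number; // length of the Base64 string as received
  byteLength: number;   // number of audio bytes after decoding
  fnv1a: string;        // 32-bit FNV-1a hash of the decoded bytes, as hex
}

function fingerprintDelta(delta: string): AudioFingerprint {
  // atob returns a "binary string": one character per decoded byte.
  const binary = atob(delta);
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < binary.length; i++) {
    hash ^= binary.charCodeAt(i);
    // Multiply by the FNV prime and keep the result as an unsigned 32-bit value.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return {
    base64Length: delta.length,
    byteLength: binary.length,
    fnv1a: hash.toString(16).padStart(8, "0"),
  };
}
```

Calling `console.log(fingerprintDelta(response.delta))` inside the message handler, and once more on the payload that is actually sent, would show whether the chunk is altered between receipt and forwarding.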

Code Comparison

Cloudflare Workers Implementation (problematic)

The relevant part of index.ts is as follows:

// WebSocket setup using WebSocketPair
const webSocketPair = new WebSocketPair();
const client = webSocketPair[0];
const server = webSocketPair[1];

// OpenAI connection using fetch with WebSocket upgrade
const response = await fetch(
  "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01",
  {
    headers: {
      Authorization: `Bearer ${OPENAI_API_KEY}`,
      "OpenAI-Beta": "realtime=v1",
      Upgrade: "websocket",
      Connection: "Upgrade",
      "Sec-WebSocket-Version": "13",
      "Sec-WebSocket-Key": btoa(Math.random().toString(36).substring(2, 15)),
    },
  }
);

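// @ts-ignore - response.webSocket is a Cloudflare Workers-specific property
const webSocket = response.webSocket;
if (!webSocket) {
  // The property is null when the server did not accept the upgrade
  throw new Error(`WebSocket upgrade failed with status ${response.status}`);
}
webSocket.accept();

// Processing messages from OpenAI (JSON events; audio arrives Base64-encoded in `delta`)
webSocket.addEventListener("message", (event: MessageEvent) => {
  // Named `message` so that it does not shadow the fetch `response` above
  const message = event.data instanceof ArrayBuffer
    ? JSON.parse(new TextDecoder().decode(event.data))
    : JSON.parse(event.data);

  if (message.type === "response.audio.delta" && message.delta) {
    // Forward audio data to client (contains noise)
    server.send(JSON.stringify({
      event: "media",
      streamSid: streamSid,
      media: { payload: message.delta },
    }));
  }
});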

Node.js Implementation (working correctly)

The relevant part of index.node.ts is as follows:

// Standard WebSocket setup
const wss = new WebSocketServer({
  server: nodeHttpServer,
  path: "/ws-voice",
});

wss.on("connection", async (connection: WebSocket) => {
  // Direct WebSocket connection to OpenAI
  const openAiWs = new WebSocket(
    "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01",
    {
      headers: {
        Authorization: `Bearer ${OPENAI_API_KEY}`,
        "OpenAI-Beta": "realtime=v1",
        // Other headers...
      },
    }
  );

  // Processing binary data
  openAiWs.on("message", async (data: WebSocket.Data) => {
    const response = JSON.parse(data.toString());
    if (response.type === "response.audio.delta" && response.delta) {
      // Forward audio data to client (works perfectly)
      connection.send(JSON.stringify({
        event: "media",
        streamSid: streamSid,
        media: { payload: response.delta },
      }));
    }
  });
});
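
Since the two handlers above differ slightly in how they parse and forward events, one way to rule out those differences is to move the transformation into a single pure function that both index.ts and index.node.ts call. The sketch below is only an illustration under that assumption: `toMediaMessage` and `MediaMessage` are hypothetical names that do not exist in the repository, and the message shape is copied from the excerpts above.

```typescript
// Hypothetical shared helper; the shape mirrors the "media" message in the excerpts.
interface MediaMessage {
  event: "media";
  streamSid: string;
  media: { payload: string };
}

// Turns one raw OpenAI event (already decoded to text) into the JSON string
// to forward to the client, or null if the event carries no audio.
function toMediaMessage(rawEvent: string, streamSid: string): string | null {
  const parsed = JSON.parse(rawEvent) as { type?: string; delta?: string };
  if (parsed.type !== "response.audio.delta" || !parsed.delta) {
    return null;
  }
  const message: MediaMessage = {
    event: "media",
    streamSid,
    media: { payload: parsed.delta }, // Base64 payload is passed through untouched
  };
  return JSON.stringify(message);
}
```

With a helper like this, the only remaining difference between the two implementations is how the WebSocket connections are opened and how the raw event text is obtained, which narrows down where the distortion can come from.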

Possible Causes

I suspect the issue might be related to one of the following:

  1. Binary data handling in Cloudflare Workers’ WebSocketPair implementation
  2. The way the fetch API with a WebSocket upgrade processes binary data
  3. Potential encoding/decoding issues with TextDecoder in the Cloudflare Workers environment
  4. Possible incompatibility between OpenAI’s Realtime API and Cloudflare Workers’ WebSocket implementation
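
To test the third possibility in particular, the decoding step can be made strict so that a damaged message fails loudly instead of passing through silently. The following is a minimal sketch, assuming messages arrive either as a string or as an ArrayBuffer as in the Workers excerpt; `messageToText` and `isWellFormedBase64` are hypothetical names, not functions from the repository or from any library.

```typescript
// Hypothetical helpers for ruling out decoding problems (names are illustrative).

// Converts WebSocket message data to text. With fatal: true, TextDecoder throws
// a TypeError on invalid UTF-8 instead of silently inserting U+FFFD characters.
function messageToText(data: string | ArrayBuffer): string {
  if (typeof data === "string") {
    return data;
  }
  return new TextDecoder("utf-8", { fatal: true }).decode(data);
}

// Checks that a payload uses only the standard Base64 alphabet, has at most
// two trailing "=" padding characters, and has a length divisible by four.
function isWellFormedBase64(payload: string): boolean {
  return payload.length % 4 === 0 && /^[A-Za-z0-9+/]*={0,2}$/.test(payload);
}
```

If `messageToText` never throws and `isWellFormedBase64(response.delta)` is always true in the Workers environment, text decoding is unlikely to be the source of the noise, which would point back to the other candidates in the list.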

Additional Notes

  • The issue is consistent and reproducible
  • No errors are logged in the console
  • As far as I can tell, almost identical code works perfectly in a Node.js environment
  • The issue specifically affects audio quality, not the connection itself

Any assistance in resolving this issue would be greatly appreciated, as it’s blocking our ability to use OpenAI’s Realtime API in Cloudflare Workers.