WebRTC transcription guide seems to be broken

https://platform.openai.com/docs/guides/realtime?use-case=transcription#connection-details

The code to obtain a session token contains:

body: JSON.stringify({
  model: "gpt-4o-transcribe",
}),

which fails.

The client code points to:

const baseUrl = "https://api.openai.com/v1/realtime/transcription_sessions";

which is probably wrong?

Even when I remove the model from the session code and point to https://api.openai.com/v1/realtime instead, I still get "400 Bad Request" back.

Has anybody made WebRTC transcription work, and how? 🙂

Hey @osv. Sorry for the confusion. The new transcription mode only works with WebSockets at the moment. We adjusted the docs accordingly.


Any timeline for WebRTC support?


I don't know if it helps, but this WebSocket implementation seems to work for me:

import { getToken } from './backend.r.js'

async function getWS() {
  const EPHEMERAL_KEY = await getToken()

  const ws = new WebSocket(
    'wss://api.openai.com/v1/realtime?intent=transcription',
    [
      'realtime',
      // Auth
      'openai-insecure-api-key.' + EPHEMERAL_KEY,
      // Optional
      'openai-organization.' + 'org-xxx',
      'openai-project.' + 'proj_xxx',
      // Beta protocol, required
      'openai-beta.realtime-v1',
    ],
  )

  ws.addEventListener('error', (error) => {
    console.error('WebSocket error:', error)
  })

  ws.addEventListener('message', (evt) => {
    console.log(evt.data)

    if (typeof evt.data !== 'string') return

    const deltaType = 'conversation.item.input_audio_transcription.delta'
    const isDelta = evt.data.includes(deltaType)
    if (!isDelta) return

    const data = JSON.parse(evt.data)
    if (data.type !== deltaType) return

    document.body.textContent += data.delta
  })

  await new Promise((fn) => ws.addEventListener('open', fn))

  ws.send(
    JSON.stringify({
      type: 'transcription_session.update',
      session: {
        input_audio_transcription: {
          model: 'gpt-4o-transcribe',
        },
      },
    }),
  )

  return ws
}
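The `getToken` import from `./backend.r.js` isn't shown above. A minimal server-side sketch, assuming a Node 18+ backend, the beta `transcription_sessions` endpoint from the docs linked earlier, and a `client_secret.value` response field (treat the path and field names as assumptions to verify against the current docs):

```javascript
// Hypothetical implementation of getToken() from './backend.r.js'.
// Assumes Node 18+ (global fetch) and OPENAI_API_KEY in the environment.

// Pure helper: builds the options for the session-create request.
// Note the JSON body; this endpoint does not accept application/sdp.
function buildSessionRequest(apiKey) {
  return {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      input_audio_transcription: { model: 'gpt-4o-transcribe' },
    }),
  }
}

async function getToken() {
  const res = await fetch(
    'https://api.openai.com/v1/realtime/transcription_sessions',
    buildSessionRequest(process.env.OPENAI_API_KEY),
  )
  if (!res.ok) throw new Error(`Session request failed: ${res.status}`)
  const session = await res.json()
  // The ephemeral key is assumed to live in client_secret.value
  return session.client_secret.value
}
```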

const audioWorkletProcessorCode = `
class PCMProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.sampleRate = 24000; // 24kHz sample rate
    this.chunkSize = this.sampleRate * 0.1; // 100ms worth of samples (2400 samples)
    this.buffer = []; // Buffer to accumulate audio samples
  }

  process(inputs, outputs, parameters) {
    const input = inputs[0];
    if (input && input[0]) {
      const float32Data = input[0];

      // Accumulate samples in the buffer
      this.buffer.push(...float32Data);

      // When the buffer reaches the chunk size, process and send
      while (this.buffer.length >= this.chunkSize) {
        const chunk = this.buffer.slice(0, this.chunkSize); // Take 100ms worth of samples
        this.buffer = this.buffer.slice(this.chunkSize); // Remove processed samples from the buffer

        // Convert Float32 to Int16
        const int16Buffer = new Int16Array(chunk.length);
        for (let i = 0; i < chunk.length; i++) {
          int16Buffer[i] = Math.max(-1, Math.min(1, chunk[i])) * 0x7fff;
        }

        // Post to the main thread
        this.port.postMessage(int16Buffer.buffer, [int16Buffer.buffer]);
      }
    }

    return true; // Keep the processor alive
  }
}

registerProcessor('pcm-processor', PCMProcessor);
`

export async function main() {
  const audioEl = document.createElement('audio')
  audioEl.autoplay = true

  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { sampleRate: 24000, channelCount: 1 },
  })

  const audioContext = new AudioContext({ sampleRate: 24000 })

  const blob = new Blob([audioWorkletProcessorCode], {
    type: 'application/javascript',
  })

  const workletURL = URL.createObjectURL(blob)
  await audioContext.audioWorklet.addModule(workletURL)

  const source = audioContext.createMediaStreamSource(stream)
  const pcmProcessor = new AudioWorkletNode(audioContext, 'pcm-processor')

  const ws = await getWS()

  pcmProcessor.port.onmessage = (event) => {
    const int16Buffer = event.data

    // Buffer is Node-only; in the browser, base64-encode the raw bytes with btoa
    const bytes = new Uint8Array(int16Buffer)
    let binary = ''
    for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i])
    const audio = btoa(binary)

    ws.send(JSON.stringify({ type: 'input_audio_buffer.append', audio }))

    console.log('100ms audio chunk sent')
  }

  source.connect(pcmProcessor)
  pcmProcessor.connect(audioContext.destination)
}
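The Float32-to-Int16 scaling inside the worklet, and the base64 step before `input_audio_buffer.append`, can be factored out and sanity-checked in isolation. A small sketch using the same clamping and 0x7fff scale as above:

```javascript
// Same conversion the worklet performs: clamp to [-1, 1], scale to Int16
function floatTo16BitPCM(float32Samples) {
  const int16 = new Int16Array(float32Samples.length)
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i]))
    int16[i] = s * 0x7fff
  }
  return int16
}

// Base64-encode the raw bytes without Node's Buffer.
// btoa exists in browsers and is also global in Node 18+.
function int16ToBase64(int16) {
  const bytes = new Uint8Array(int16.buffer)
  let binary = ''
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i])
  return btoa(binary)
}
```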


@dkundel Any timeline on when WebRTC will be supported?

It should work now. Please let me know if you have any issues.


@dkundel are there any docs or examples on how to set up a transcription session via webrtc?

or could you share which connection details should be used? base url, query params and headers

Thanks!

@dkundel Still doesn’t work for me.
Here is my code:

const baseUrl = "https://api.openai.com/v1/realtime/transcription_sessions";
const model = "gpt-4o-transcribe";

sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
  method: "POST",
  body: offer.sdp,
  headers: {
    Authorization: `Bearer ${ephemeralKey}`,
    "Content-Type": "application/sdp",
    "OpenAI-Beta": "realtime=v1",
  },
});

I get this response:

Failed to start RTC connection: [Error: OpenAI API error: {
  "error": {
    "message": "Unsupported content type: 'application/sdp'. This API method only accepts 'application/json' requests, but you specified the header 'Content-Type: application/sdp'. Please try again with a supported content type.",
    "type": "invalid_request_error",
    "param": null,
    "code": "unsupported_content_type"
  }
}]

@dkundel any update on this? I'm blocked on a project.

WebRTC support has been deployed. See Realtime Transcription for an example.


@juberti Thanks so much for sharing the demo! It looks great. I was hoping to understand more about how it's set up to work with WebRTC, though. Is there any documentation or a setup guide I could follow?

Figured it out! I had to create a session first, before establishing the WebRTC connection.

You can take a look at the code at https://github.com/juberti/demos/tree/main/realtime/transcribe
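For anyone landing here later, the session-first flow described above could be sketched like this. The endpoint paths, the `?intent=transcription` query parameter, and the `client_secret.value` field are assumptions pieced together from this thread, so double-check them against the current docs and the linked demo:

```javascript
// Sketch of the two-step WebRTC transcription flow (names and paths assumed)

// Pure helper so the SDP endpoint is easy to inspect
function sdpUrl() {
  return 'https://api.openai.com/v1/realtime?intent=transcription'
}

async function connectTranscription(apiKey) {
  // Step 1: create a transcription session with a JSON body (not SDP)
  const sessionRes = await fetch(
    'https://api.openai.com/v1/realtime/transcription_sessions',
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        input_audio_transcription: { model: 'gpt-4o-transcribe' },
      }),
    },
  )
  const session = await sessionRes.json()
  const ephemeralKey = session.client_secret.value

  // Step 2: SDP exchange against the realtime endpoint,
  // authenticated with the ephemeral key from step 1
  const pc = new RTCPeerConnection()
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true })
  pc.addTrack(mic.getTracks()[0], mic)

  const offer = await pc.createOffer()
  await pc.setLocalDescription(offer)

  const sdpRes = await fetch(sdpUrl(), {
    method: 'POST',
    body: offer.sdp,
    headers: {
      Authorization: `Bearer ${ephemeralKey}`,
      'Content-Type': 'application/sdp',
      'OpenAI-Beta': 'realtime=v1',
    },
  })
  await pc.setRemoteDescription({ type: 'answer', sdp: await sdpRes.text() })
  return pc
}
```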