Can I replace OpenAI's Whisper transcription in real-time WebRTC chat with a custom transcription function?

I'm building a real-time voice chat on the Realtime API over WebRTC. Currently, input audio is transcribed with the whisper-1 model, but the transcription quality isn't ideal for my use case.

Is it possible to replace or bypass the Whisper transcription and provide my own speech-to-text function instead (e.g., from a Lambda endpoint or a custom backend)? As far as I can tell, transcription happens on OpenAI's end and WebRTC streams the audio directly to OpenAI (correct me if I'm wrong), so I'm wondering whether there's a way to intercept the audio and substitute my own transcription.
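If there's no way to intercept on the server side, my rough plan is to fork the audio locally: keep sending the mic track to OpenAI over WebRTC as before, and also record a cloned copy and stream chunks to my own endpoint. Here's a minimal sketch of that idea (pc is my existing RTCPeerConnection, and https://example.com/transcribe is just a placeholder for my Lambda endpoint):

// Inside my async setup function.
const micStream = await navigator.mediaDevices.getUserMedia({ audio: true });

// Unchanged: this live track still goes to OpenAI over WebRTC.
pc.addTrack(micStream.getAudioTracks()[0], micStream);

// Fork: record a cloned copy of the stream for my own STT.
const recorder = new MediaRecorder(micStream.clone(), {
  mimeType: "audio/webm;codecs=opus",
});

recorder.ondataavailable = async (e) => {
  if (e.data.size === 0) return;
  // Caveat: with a timeslice, only the first chunk is a standalone WebM
  // file, so my backend would need to reassemble the stream (or I'd
  // restart the recorder per utterance).
  const res = await fetch("https://example.com/transcribe", {
    method: "POST",
    headers: { "Content-Type": "audio/webm" },
    body: e.data,
  });
  const { text } = await res.json(); // response shape is my own convention
  console.log("custom transcript:", text);
};

recorder.start(1000); // emit a chunk roughly every second

Does this double-capture approach make sense, or is there a supported way to do it through the API itself?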

For reference, here's the session.update snippet I'm currently sending to enable transcription:
const event = {
  type: "session.update",
  session: {
    instructions: instructions,
    input_audio_transcription: { model: "whisper-1" },
  },
};
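Separately, if I do end up running my own pipeline, I assume the fallback is to just turn the built-in transcription off. I believe passing null disables it (input transcription seems to be off unless you opt in), but correct me if that's wrong:

const event = {
  type: "session.update",
  session: {
    instructions: instructions,
    // Assumption on my part: null disables server-side input transcription.
    input_audio_transcription: null,
  },
};
dc.send(JSON.stringify(event)); // dc: my existing WebRTC data channel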