I would like to create an app that does realtime (or near realtime) Speech-to-Text.
I tested with Whisper, but the delay before the response came back was quite large, and I had to keep calling the API every few seconds.
So I found the OpenAI Realtime API, which might be a good option; I just don't know whether it supports Speech-to-Text functionality. Does anyone know?
Hi @frommars! I've seen people apply Whisper in realtime, but it usually involves running Whisper locally (it's an open-weight model!), and for best results it typically involves invoking one of the "fast" Whisper variants.
aza · January 25, 2025, 7:52pm
I’m pretty sure the answer is no.
The transcription "done" data-channel messages are only for model-generated responses. There is no message returned with a transcription of the audio input.
response.audio_transcript.done: Fires when the transcript of the assistant's response is complete.
conversation.item.input_audio_transcription.completed: Fires after the user's audio input has been transcribed.
output_audio_buffer.audio_stopped: Fires when the assistant stops speaking.
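To make the event list above concrete, here is a minimal sketch of a handler that dispatches on these event types as they arrive over the data channel. This assumes the usual WebRTC setup where server events arrive as JSON strings; the handler itself is hypothetical, not part of any SDK:

```javascript
// Sketch: dispatch Realtime API server events received on the data channel.
// `raw` is the JSON string payload of a data-channel message.
function handleServerEvent(raw) {
  const event = JSON.parse(raw);
  switch (event.type) {
    case "response.audio_transcript.done":
      // Transcript of the assistant's spoken response.
      console.log("Assistant said:", event.transcript);
      break;
    case "conversation.item.input_audio_transcription.completed":
      // Transcript of the user's speech; only arrives if
      // input_audio_transcription was enabled via session.update.
      console.log("User said:", event.transcript);
      break;
    case "output_audio_buffer.audio_stopped":
      console.log("Assistant finished speaking.");
      break;
  }
  return event.type;
}
```

Wired up, this would be `dataChannel.addEventListener("message", (e) => handleServerEvent(e.data));`.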
Prerequisite: you MUST apply this update to the session:
const updateSession = {
  type: "session.update",
  event_id: "message_004",
  session: {
    input_audio_transcription: {
      model: "whisper-1",
    },
  },
};

dataChannel.addEventListener("open", () => {
  sendClientEvent(updateSession);
});
I am using the Python SDK; even though I set the input_audio_transcription parameter, I am not getting the event from the server. The model says it can't provide the transcription.
Yeah, the OpenAI Realtime API allows you to transcribe your voice to text. Note that it uses the whisper-1 model behind the scenes for this.
Here you can see an example in Java, but you can extrapolate it to another context:
I have read several posts on this forum about problems with the Realtime API, especially these two points:
Audio cuts off at the end of each AI response.
No audio transcript is received from the user.
I want to share how I overcame these problems. A caveat: my experience is based on Q&A scenarios, with Java on the backend.
Audio cuts off at the end of each AI response
After you finish speaking, you send a response.create request, and then the AI sends audio fragments …
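As a sketch of the step just described, the client can commit the input audio buffer and then request a response. This assumes server-side turn detection is disabled so the client controls turn-taking, and it reuses a `sendClientEvent` helper like the one in the earlier session.update example (the helper itself is an assumption):

```javascript
// Sketch: after the user stops speaking, commit the captured audio
// and ask the model to generate a response.
// `sendClientEvent` is assumed to JSON-encode an event and send it
// over the data channel.
function requestResponse(sendClientEvent) {
  sendClientEvent({ type: "input_audio_buffer.commit" });
  sendClientEvent({ type: "response.create" });
}
```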