Pricing When Using the Whisper Model with RealtimeAPI

(Please note: This text is a translation from Japanese.)

I have a question regarding pricing when using RealtimeAPI for real-time transcription.

If I specify the Whisper model when making a request, will it be billed at the Whisper model rate ($0.006 per minute)?

The RealtimeAPI documentation mentions that Whisper can be used, but the RealtimeAPI pricing table does not list Whisper.
https://openai.com/api/pricing/

Is my understanding correct that Whisper can be used with RealtimeAPI?


Hi and welcome to the community!

You’ll be using two models, each billed separately for a different task: one generates responses and the other transcribes audio.

The realtime model handles the audio input and provides both a spoken reply and a transcript of that reply.
The Whisper model is used to transcribe the user’s audio input.
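
To make the two-part billing concrete, here is a rough sketch of the arithmetic. It assumes the transcription part is charged at the Whisper rate quoted in the question ($0.006 per minute). The realtime model's charge is passed in as a number you work out yourself from the pricing page, since its rates are not given in this thread. The function name and parameters are made up for illustration.

```python
# Whisper rate as quoted in the question; check the pricing page for the current value.
WHISPER_USD_PER_MINUTE = 0.006


def estimate_session_cost(audio_seconds: float, realtime_usd: float) -> dict:
    """Add up the two separately billed parts of a session.

    audio_seconds: length of the user audio sent for transcription.
    realtime_usd:  charge for the realtime model, computed separately
                   (assumption: you derive this from the pricing page).
    """
    transcription_usd = (audio_seconds / 60.0) * WHISPER_USD_PER_MINUTE
    return {
        "transcription_usd": round(transcription_usd, 6),
        "realtime_usd": round(realtime_usd, 6),
        "total_usd": round(transcription_usd + realtime_usd, 6),
    }


if __name__ == "__main__":
    # Ten minutes of user audio, with the realtime charge left at zero for now.
    print(estimate_session_cost(audio_seconds=600, realtime_usd=0.0))
```

Whether the transcription line item is actually billed at exactly that rate is worth confirming in your usage dashboard.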

A few notes:

  • Realtime models can’t currently reply with audio only, but you can select text only as the output modality.
  • If the goal is to produce a transcription in real time, without a separate reply to the user from a realtime model, take a look at the documentation for streaming transcriptions using the audio API.
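
As a sketch of the first note, this is roughly what a session configuration asking for Whisper input transcription and text-only output could look like. The field names (`session.update`, `input_audio_transcription`, `modalities`) and the model name `whisper-1` follow the Realtime API event format as I understand it; treat them as assumptions and check them against the current API reference, since the event shape may have changed. The helper function itself is hypothetical, and the snippet only builds the JSON payload without opening a connection.

```python
import json


def build_session_update(transcription_model: str = "whisper-1",
                         text_only: bool = True) -> str:
    """Build a session.update event as a JSON string (assumed event shape)."""
    session = {
        # Ask for the user's audio input to be transcribed by the given model.
        "input_audio_transcription": {"model": transcription_model},
    }
    if text_only:
        # Request text-only output instead of audio plus text.
        session["modalities"] = ["text"]
    return json.dumps({"type": "session.update", "session": session})


if __name__ == "__main__":
    # This string would be sent over the already-open realtime connection.
    print(build_session_update())
```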