(Please note: This text is a translation from Japanese.)
I have a question regarding pricing when using RealtimeAPI for real-time transcription.
If I specify the Whisper model when making a request, will it be billed at the Whisper model rate ($0.006 per minute)?
The RealtimeAPI documentation mentions that Whisper can be used, but the RealtimeAPI pricing table does not list Whisper. https://openai.com/api/pricing/
Is my understanding correct that Whisper can be used with RealtimeAPI?
You’ll be using two models with separate costs for different tasks: one for generating responses and one for transcribing audio.
The realtime model handles audio input and provides both a spoken reply and a transcript.
The Whisper model is used to transcribe the user’s audio input.
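As a rough illustration of how the two separate costs add up, here is a minimal sketch. It assumes the Whisper rate quoted in the question ($0.006 per minute) applies to the transcription part; the function name and the `realtime_model_usd` input are hypothetical, and the realtime model's own charge has to be worked out separately from the pricing page.

```python
# Assumption: the per-minute Whisper rate quoted in the question applies
# to input transcription. Check the pricing page before relying on it.
WHISPER_USD_PER_MINUTE = 0.006


def estimate_session_cost(audio_seconds: float, realtime_model_usd: float) -> dict:
    """Add the transcription cost to a separately computed realtime model cost.

    realtime_model_usd is a placeholder for whatever the realtime model
    charges for the same session; it is not derived here.
    """
    transcription_usd = (audio_seconds / 60) * WHISPER_USD_PER_MINUTE
    return {
        "transcription_usd": transcription_usd,
        "realtime_usd": realtime_model_usd,
        "total_usd": transcription_usd + realtime_model_usd,
    }


if __name__ == "__main__":
    # Ten minutes of user audio, with the realtime model cost left at zero
    # so only the transcription part of the bill is shown.
    print(estimate_session_cost(audio_seconds=600, realtime_model_usd=0.0))
```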
A few notes:
Realtime models can’t currently reply with audio only, but you can select text only as the output modality.
If the goal is to create a transcription in real time, without a separate reply to the user from a realtime model, you can look at the documentation for streaming transcriptions using the audio API.
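For reference, the two settings mentioned above (Whisper for input transcription, text-only output) could be expressed in a session update roughly like this. This is a sketch, not a definitive configuration: the field names `modalities` and `input_audio_transcription` and the model name `whisper-1` are written as I understand the Realtime API reference, so please check them against the current documentation. The helper function is hypothetical, and the code only builds the JSON payload; it does not open a connection.

```python
import json


def build_session_update(transcription_model: str = "whisper-1",
                         text_only: bool = True) -> dict:
    """Build a session.update event as a plain dict.

    Assumption: field names follow the Realtime API reference as I
    understand it; verify them against the current docs.
    """
    # Text-only output avoids a spoken reply; otherwise request both.
    modalities = ["text"] if text_only else ["text", "audio"]
    return {
        "type": "session.update",
        "session": {
            "modalities": modalities,
            # Transcription of the user's audio input runs on this model.
            "input_audio_transcription": {"model": transcription_model},
        },
    }


event = build_session_update()
payload = json.dumps(event)  # string to send over the realtime connection

if __name__ == "__main__":
    print(payload)
```

The payload would be sent as a client event once the realtime connection is open; the user's transcripts then come back as separate server events alongside the model's text reply.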