I’m experimenting with the beta Realtime API in a purely speech-to-speech scenario. According to this API reference, transcription via Whisper is not native to the main speech-to-speech model; it’s an optional, asynchronous feature.
My goal is to use function calling to produce structured JSON output based on spoken user input. The model itself seems to handle the audio directly, so I'm not sure whether enabling Whisper transcription is necessary for my function-calling flow or whether it's purely optional.
Specifically, at the end of the conversation I need to produce a structured JSON response containing the key information the user provided through speech. Does that require me to enable `input_audio_transcription`? Or can the model handle speech-to-speech natively and still trigger function calls that produce the JSON data?
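To make the question concrete, here's roughly the `session.update` event I'm planning to send over the Realtime websocket. The tool name `record_user_details` and its schema are placeholders I made up for my use case, not anything from the API reference:

```python
# Sketch of the session configuration I have in mind.
# "record_user_details" and its parameter schema are my own placeholders.
import json

session_update = {
    "type": "session.update",
    "session": {
        "modalities": ["audio", "text"],
        "tools": [
            {
                "type": "function",
                "name": "record_user_details",  # placeholder tool name
                "description": "Capture the key details the user provided during the call.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "reason_for_call": {"type": "string"},
                    },
                    "required": ["name", "reason_for_call"],
                },
            }
        ],
        "tool_choice": "auto",
        # This is the part I'm unsure about -- do I also need:
        # "input_audio_transcription": {"model": "whisper-1"},
    },
}

# ws.send(json.dumps(session_update))  # sent once the websocket is open
```

My assumption is that the model would call `record_user_details` with the structured arguments at the end of the conversation, and I'd read them from the function-call events on the websocket without ever needing a transcript. Is that correct, or does function calling rely on transcription under the hood?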
Thanks in advance!