How to get text only output from the Realtime API?

Hi!

I’d like to build some UI controls using the Realtime API and I’m not interested in the audio output (just in the function calls really). Is there a way to call the API without getting billed for the audio output?

Thanks :pray:

1 Like

The RealTime API uses a voice-to-voice model. The only reason you’d want to use it is for the extremely low latency in voice communication.

If you want to transcribe the text you can use any typical model like Whisper.

1 Like

@RonaldGRuckus thanks for your answer! I was attracted by the realtime api because of the low latency in general to get a function call. Going the whisper + gpt route is ok but the latency for whisper is not amazing from my tests.

Gotta try running locally the new whisper-large-v3-turbo and see if things are better.

Thank you!

1 Like

The new turbo model is quite fast, especially for it’s size :heart_eyes:.

With this new model you would have to pay for the output audio tokens first to get the transcript.

1 Like

To answer your question yes you can send text messages to the realtime API but as @RonaldGRuckus suggests it is geared more for audio input. It’s pretty pricey even for text-to-function calling.

You’re supposed to be able to both send and receive text from the model but I haven’t worked out how to do the receive part yet…

https://platform.openai.com/docs/guides/realtime/examples

Ah… you can set the modalities parameter for the session to just ["text"]. It defaults to ["text", "audio"].

@stevenic I wanted audio input, text/function_call output, which i don’t think is what the modalities allow me to do :confused: I think i’ll try to go the whisper route

1 Like

Ah… I see… Even the Audio input is pretty pricey at $100/million tokens.