OpenAI Realtime API for Audio Input → Text Output Only

I want to use the OpenAI Realtime API to process audio input and receive only text output in response. I do not need any audio in the response, just the text.

Has anyone implemented this? I’m looking for:

  1. How to send live audio input to the API.
  2. How to configure the API to return only text (disable audio response).

Would appreciate any insights or sample code!

Hey!

I think you can just set the session's modalities to text only, i.e. ["text"], instead of both, i.e. ["text", "audio"].
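Here's a rough sketch of the two JSON events involved, in case it helps. It only builds the payloads; you would send them as text frames over your own WebSocket connection to the Realtime endpoint. I'm assuming the beta event shapes here (`session.update` with a `modalities` field, and `input_audio_buffer.append` with base64-encoded audio), so please check the current API reference, since newer versions may name the field differently. The helper names `session_update_text_only` and `audio_append_event` are made up for this example.

```python
import base64
import json


def session_update_text_only() -> str:
    """Build a session.update event that asks for text-only responses.

    Assumption: the session accepts a "modalities" list, as in the
    beta Realtime API docs. Verify the field name for your API version.
    """
    event = {
        "type": "session.update",
        "session": {"modalities": ["text"]},
    }
    return json.dumps(event)


def audio_append_event(pcm16_chunk: bytes) -> str:
    """Wrap one chunk of raw audio bytes in an input_audio_buffer.append event.

    Assumption: the audio is sent base64-encoded in the "audio" field,
    in whatever input format the session is configured for (e.g. PCM16).
    """
    event = {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16_chunk).decode("ascii"),
    }
    return json.dumps(event)


if __name__ == "__main__":
    # Send the session update once after connecting, then stream
    # microphone chunks as append events.
    print(session_update_text_only())
    print(audio_append_event(b"\x00\x00" * 4))  # four silent 16-bit samples
```

With the session set to text only, the replies should come back as text events rather than audio, but I haven't verified every event name against the latest docs.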

Do let me know if this helps.

Cheers! :hugs:
