Realtime API: Force Audio replies

I found that in an audio/text realtime stream, some instructions cause the responses to come back as text instead of audio. Is there a way to force the responses to be audio?
