Text completion with a voice response

Does anyone know (or maybe the OpenAI team could respond) how we can use the API to send a request to the completions endpoint and get an audio stream back as the response?

Is that possible, or will it be soon? (At least, does the task exist in the backlog?)

Current workaround (ugly and costly way)

Send a request to the completion API (3 minutes of waiting) :sleeping:
Get the response and send it to the audio API (more minutes) :sleeping:

So the client waits a long time before they can listen to the response :sleepy: A rough sketch of this sequential flow is below.
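
For illustration, here is a minimal sketch of that two-step workaround, assuming the official `openai` Python SDK (v1+); the model names, voice, prompt, and output filename are placeholders, not a recommendation:

```python
from openai import OpenAI

client = OpenAI()

# Step 1: wait for the *entire* text completion to finish.
completion = client.chat.completions.create(
    model="gpt-4o",  # placeholder model
    messages=[{"role": "user", "content": "Explain photosynthesis briefly."}],
)
text = completion.choices[0].message.content

# Step 2: only then send the whole text to the TTS endpoint.
speech = client.audio.speech.create(
    model="tts-1",   # placeholder TTS model
    voice="alloy",
    input=text,
)
speech.write_to_file("reply.mp3")  # the client has now waited for two full round trips
```

The problem is visible in the structure: nothing reaches the user's ears until both requests have fully completed.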

Better workaround:

  • stream the response, so you are getting tokens as they are generated,
  • start sending response sentences for TTS as soon as they are received,
  • buffer and assemble the audio stream, and start WebRTC playback once buffer underruns are unlikely (see the sketch after this list).
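
Here is a hedged sketch of that pipeline, again assuming the `openai` Python SDK; the sentence splitting is deliberately naive, and each sentence's audio is written to a file where a real app would push it into a playout buffer or WebRTC track instead:

```python
from openai import OpenAI

client = OpenAI()

def synthesize(sentence: str, index: int) -> None:
    # Send one sentence to TTS as soon as it is complete; in a real app this
    # chunk would be queued for playback rather than written to disk.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=sentence)
    speech.write_to_file(f"chunk_{index}.mp3")

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain photosynthesis briefly."}],
    stream=True,  # tokens arrive as they are generated
)

buffer, index = "", 0
for chunk in stream:
    if not chunk.choices:
        continue
    buffer += chunk.choices[0].delta.content or ""
    # Naive sentence-boundary detection: flush whenever ., ! or ? appears.
    while any(p in buffer for p in ".!?"):
        cut = min(i for i in (buffer.find(p) for p in ".!?") if i != -1) + 1
        sentence, buffer = buffer[:cut].strip(), buffer[cut:]
        if sentence:
            synthesize(sentence, index)
            index += 1

if buffer.strip():  # flush any trailing text without a terminator
    synthesize(buffer.strip(), index)
```

With this approach the first sentence can start playing while the model is still generating the rest, so perceived latency drops from minutes to roughly the time of the first sentence plus one TTS call.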