Will audio output streaming be available with GPT-4o?

I know that audio with GPT-4o is not supported in the API yet, but will the streaming SDK for audio look similar to the one for text? And, related: will audio streaming even be a thing, or will you have to wait for the full audio response before you can start playing it?

My current framework makes a round trip: transcribe the audio input, generate a response, then turn that response into audio output. I don't use streaming, since everything is built around showing the text and audio once they're ready, as well as saving them to a local database. I'm not inclined to rewrite the code to support text streaming unless audio streaming is coming and will look pretty similar. I'm using the Assistants API, by the way.
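
Roughly, the round trip looks like the sketch below. Treat it as a simplified illustration: the model names are just what I happen to use, and the plain chat call stands in for my actual Assistants thread/run plumbing.

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def round_trip(audio_path: str, out_path: str = "reply.mp3") -> tuple[str, str, Path]:
    # 1. Transcribe the user's audio input.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2. Generate a text response. A plain chat call here for brevity;
    #    my real code goes through an Assistants thread and run instead.
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": transcript.text}],
    )
    reply_text = completion.choices[0].message.content

    # 3. Turn the complete response into audio only once it's fully ready;
    #    no streaming anywhere in the pipeline.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
    Path(out_path).write_bytes(speech.content)

    # Text and audio are then shown to the user and saved to the local database.
    return transcript.text, reply_text, Path(out_path)
```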

Any hints about what the future SDK will look like would be helpful!


As of right now OpenAI hasn’t disclosed anything about the GPT-4o audio API, so all we can do is guess based on what we’ve got today and the demos they’ve shown.

Given that the current text-to-speech API already supports streaming, and that one of the main selling points of the upcoming voice mode is low-latency responses, it's pretty safe to assume the new API will support streaming too. There's still no communication on when exactly it's coming out or how to implement it, though.
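
For reference, the existing TTS endpoint can already be consumed as a stream of audio chunks, which is the kind of behaviour I'd expect the GPT-4o audio API to mirror. Here's a minimal sketch; the model, voice, and chunk handling are just illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Stream TTS audio and start handling chunks before the full clip is rendered.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",          # example model name
    voice="alloy",          # example voice
    input="Hello there, this is a streaming test.",
    response_format="mp3",
) as response:
    with open("speech.mp3", "wb") as f:
        for chunk in response.iter_bytes(chunk_size=4096):
            # A file here for simplicity; in a low-latency app you'd feed
            # each chunk straight to an audio player as it arrives.
            f.write(chunk)
```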
