I know that audio with GPT-4o is not supported in the API yet, but I was wondering: will the streaming SDK for audio be similar to the one for text? And, relatedly, will audio streaming even be a thing, or will you have to wait for the full audio response before playback can start?
My current framework makes a full round trip: transcribe the audio input, generate a text response, then turn that response into audio output. I don't use streaming anywhere, since everything is built around displaying the text and audio once they're ready, as well as saving both to a local database. I'm not inclined to rewrite the code to support text streaming unless audio streaming will eventually be available and look fairly similar. I'm using the Assistants API, by the way.
Any hints about what the future SDK will look like would be helpful!