The Realtime API announcement mentioned that:
“We’re also introducing audio input and output in the Chat Completions API to support use cases that don’t require the low-latency benefits of the Realtime API. With this update, developers can pass any text or audio inputs into GPT-4o and have the model respond with their choice of text, audio, or both.”
Does this mean I can use audio files directly as my user prompt, without having to transcribe them first? If so, how do I do this? I was looking at the docs for Chat Completions and they don’t seem to be updated on this topic yet.
Looking at the API reference page for Chat Completions, it will likely be included in the `messages` property:
A list of messages comprising the conversation so far. Depending on the model you use, different message types (modalities) are supported, like text, images, and audio.
The link for the audio content part is not working yet, so the docs will probably be updated soon.
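Until the docs land, here’s a guess at what the request shape might look like, by analogy with how image inputs work as content parts today. Everything here is an assumption: the model name (`gpt-4o-audio-preview`), the `modalities` and `audio` parameters, and the `input_audio` content-part type are inferred from the announcement, not confirmed by the reference.

```python
import base64

def build_audio_request(audio_bytes: bytes, fmt: str = "wav") -> dict:
    """Build a hypothetical Chat Completions request body with an audio
    content part. All audio-specific field names are guesses pending
    the official docs update."""
    encoded = base64.b64encode(audio_bytes).decode("utf-8")
    return {
        # Assumed audio-capable model name
        "model": "gpt-4o-audio-preview",
        # Assumed: ask for text and/or audio back
        "modalities": ["text", "audio"],
        "audio": {"voice": "alloy", "format": fmt},
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is said in this recording?"},
                    # Assumed content-part type, mirroring "image_url" parts
                    {
                        "type": "input_audio",
                        "input_audio": {"data": encoded, "format": fmt},
                    },
                ],
            }
        ],
    }

# Build the payload locally (no API call); you would pass these fields to
# client.chat.completions.create(...) once the SDK supports them.
payload = build_audio_request(b"\x00\x01fake-audio-bytes")
print(payload["modalities"])
```

If this follows the image-input pattern, the audio would be sent base64-encoded inline rather than as a URL, which would also explain why the content-part docs need a dedicated section for it.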
I also wondered where this feature is after they announced it in the Realtime API post. There also don’t seem to be any changes to the SDK libraries yet that would indicate a big audio update in either direction.