The Chat Completions API now supports audio inputs and outputs using a new model snapshot: gpt-4o-audio-preview. Based on the same advanced voice model powering the Realtime API, audio support in the Chat Completions API lets you:
- Handle any combination of text and audio: Pass in text, audio, or text and audio, and receive responses in both audio and text.
- Use natural, steerable voices: As with the Realtime API, you can use prompting to shape the language, pronunciation, emotional range, and other aspects of the generated audio.
- Use tool calling: Pass tool definitions and include instructions on tool use in the system prompt, just as you would with text in Chat Completions. The output of the tool call is delivered via text and audio.
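To illustrate the tool-calling point above, here is a minimal sketch of a request payload. The `get_weather` tool, its parameters, and the prompt text are all hypothetical, invented for the example; only the `model`, `modalities`, and `audio` fields come from the announcement:

```python
import json

# Hypothetical tool definition -- same JSON-schema shape as in text-only Chat Completions.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# The request body adds modalities/audio on top of the usual Chat Completions fields.
request = {
    "model": "gpt-4o-audio-preview",
    "modalities": ["text", "audio"],
    "audio": {"voice": "onyx", "format": "wav"},
    "tools": tools,
    "messages": [
        {"role": "system", "content": "Use get_weather when the user asks about weather."},
        {"role": "user", "content": "What's the weather in Denver?"},
    ],
}

print(json.dumps(request, indent=2))
```

Passing this dict as keyword arguments to `client.chat.completions.create(...)` should behave like the text-only tool-calling flow, with the final answer spoken as well as written.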
This feature is well-suited to asynchronous use cases that don't require extremely low latency. For more dynamic, real-time interactions, you should use the Realtime API. To get started, see the guide on audio support in our docs.
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "onyx", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "In a jaunty American Colorado Mountain Region accent, I'd like you to please introduce yourself. Then, in a slow British drawl, please go full-on thespian and declaim a paragraph from a famous Shakespeare play.",
        }
    ],
)
```
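The audio in the response comes back base64-encoded on the message. A sketch of decoding and saving it, assuming the response exposes it at `completion.choices[0].message.audio.data` (the stand-in bytes below replace a real API response so the snippet runs on its own):

```python
import base64

# In a real response the payload lives at:
#   completion.choices[0].message.audio.data  (base64-encoded WAV)
# Stand-in bytes so this snippet runs without an API call:
fake_wav_bytes = b"RIFF\x00\x00\x00\x00WAVEfmt "
encoded = base64.b64encode(fake_wav_bytes).decode("utf-8")

# Decode and write to disk, as you would with the real response field.
wav_bytes = base64.b64decode(encoded)
with open("reply.wav", "wb") as f:
    f.write(wav_bytes)
```

The resulting `reply.wav` can be played with any audio player that handles WAV.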
I made a YouTube Short from the response, which was about 30 seconds long. (How can we share audio here?) It blew my mind; check it out.
Chat Completions supports `alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer`, but `RealtimeClient` only supports `alloy`, `echo`, and `shimmer`. Are you planning to support the additional voices in `RealtimeClient`?
Is it possible to do audio-in and audio-out? The docs don't suggest so, but they also say that the only difference from the Realtime API is that it's lower latency.
I keep getting this error when using the new model `gpt-4o-audio-preview`:

```
TypeError: Completions.create() got an unexpected keyword argument 'modalities'
```