I’m trying to use the new GPT-Audio model with the Chat Completions API via the Python client, using a lightly modified version of the example code snippet:
import os
import base64
import requests
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=OPENAI_API_KEY)
# Fetch the audio file and convert it to a base64 encoded string
url = "https://cdn.openai.com/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content
encoded_string = base64.b64encode(wav_data).decode('utf-8')
completion = client.chat.completions.create(
    model="gpt-audio",
    modalities=["text"],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's the sentiment of this recording?"
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": encoded_string,
                        "format": "wav"
                    }
                }
            ]
        },
    ]
)
print(completion.choices[0].message)
This results in:
openai.InternalServerError: Error code: 500 - {'error': {'message': 'The server had an error while processing your request. Sorry about that!', 'type': 'server_error', 'param': None, 'code': None}}
Looks like an OpenAI side issue but do let me know if otherwise!
You can’t just randomly make up AI model names like “gpt-audio” and expect things to work.
Also, it’s silly for OpenAI’s example to pull in the third-party requests library when their own SDK already depends on httpx; and if you have httpx, you don’t strictly need OpenAI’s library either:
import os, base64, httpx
from dotenv import load_dotenv; load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

audio_b64 = base64.b64encode(
    httpx.get("https://cdn.openai.com/API/docs/audio/alloy.wav").content
).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What's the sentiment of this recording?"},
        {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}},
    ],
}]

params = {
    "model": "gpt-4o-audio-preview-2025-06-03",  # real model
    "modalities": ["text"],  # or ["text", "audio"] for voice output
    "audio": {"voice": "marin", "format": "mp3"},
    "max_completion_tokens": 2000,
    "temperature": 0.7,
    "top_p": 0.7,
}

with httpx.Client(timeout=180) as client:
    r = client.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
        json={**params, "messages": messages},
    )

r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
Edit: it took a month, but the model is finally working.
I’ve tried about every permutation of parameters: a "developer" role, a non-default temperature range, audio out vs. audio in, formats up to pcm16, reverting to max_tokens, etc. gpt-audio as a model is a no-go; it just returns the 500 server error.
The same issue is happening for me. I am trying to use gpt-audio, and even after upgrading the package, it says:
openai.InternalServerError: Error code: 500 - {'error': {'message': 'The server had an error while processing your request. Sorry about that!', 'type': 'server_error', 'param': None, 'code': None}}
However, if in the exact same code above I switch to model="gpt-4o-audio-preview",
it magically works and gives me a response. However, on the models page https://platform.openai.com/docs/models I could no longer find any mention of the old gpt-4o-audio-preview.
We’ve been working on a fix on our end and things should be more stable now, but there may still be occasional hiccups. The safest setup at the moment is to use the snapshot gpt-4o-audio-preview-2025-06-03 and keep n=1 in your requests. That combination has been reliable, while higher concurrency is what tends to trigger the errors.
We know this isn’t ideal, and our team is continuing to work on a full fix! Thank you for all your patience here.
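The advice above (pin the gpt-4o-audio-preview-2025-06-03 snapshot and keep n=1) can be paired with a simple retry-with-backoff wrapper to smooth over the occasional 500s. A minimal sketch, not official guidance: call_with_retries is a hypothetical helper I’m adding here, and the commented-out usage assumes the client and messages objects from the earlier snippets.

```python
import time

def call_with_retries(fn, max_attempts=4, base_delay=1.0):
    """Call fn(), retrying on any exception with exponential backoff.

    Raises the last exception if all attempts fail.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # back off: 1s, 2s, 4s, ... between attempts
            time.sleep(base_delay * (2 ** attempt))

# Usage against the API (assumes `client` and `messages` from the snippets above):
# completion = call_with_retries(lambda: client.chat.completions.create(
#     model="gpt-4o-audio-preview-2025-06-03",  # pinned snapshot, per the advice above
#     modalities=["text"],
#     n=1,  # higher concurrency is what tends to trigger the 500s
#     messages=messages,
# ))
```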
Hi @vc-openai - is there any difference between gpt-4o-audio-preview and gpt-audio? I am looking to upgrade to the new model assuming it brings higher quality audio and chat responses. Can you confirm if this is the case?
gpt-audio is the GA release of gpt-4o-audio-preview, so it provides the same high-quality, steerable text+audio in/out as the preview, but as a production Chat Completions model. If you need low-latency streaming or realtime voice-to-voice behavior, use the Realtime API snapshots, and double-check the API surface you plan to use since streaming and base64-audio support can differ across Chat Completions, Realtime, and Responses.
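If you do request voice output (modalities=["text", "audio"] with an audio parameter like the one in the httpx snippet above), the reply’s audio comes back base64-encoded on the message. A rough sketch of decoding it to a file; the field names (message.audio.data, message.audio.transcript) follow the documented audio-output shape, but double-check them against the current API reference for your SDK version.

```python
import base64

def save_audio(b64_data: str, path: str) -> int:
    """Decode base64 audio bytes and write them to disk; return the byte count."""
    audio_bytes = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)

# Usage (assumes `completion` came back from a request with
# modalities=["text", "audio"] and audio={"voice": "marin", "format": "mp3"}):
# n_bytes = save_audio(completion.choices[0].message.audio.data, "reply.mp3")
# print(completion.choices[0].message.audio.transcript)  # text transcript of the audio
```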