GPT-Audio Not working - Error 500

I’m trying to use the new GPT-Audio model with the Chat Completions API via the Python client, using a lightly modified version of the example code snippet:

import os
import base64
import requests
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

client = OpenAI(api_key=OPENAI_API_KEY)

# Fetch the audio file and convert it to a base64 encoded string
url = "https://cdn.openai.com/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content
encoded_string = base64.b64encode(wav_data).decode('utf-8')

completion = client.chat.completions.create(
    model="gpt-audio",
    modalities=["text"],
    messages=[
        {
            "role": "user",
            "content": [
                { 
                    "type": "text",
                    "text": "What's the sentiment of this recording?"
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": encoded_string,
                        "format": "wav"
                    }
                }
            ]
        },
    ]
)

print(completion.choices[0].message)

This results in

openai.InternalServerError: Error code: 500 - {'error': {'message': 'The server had an error while processing your request. Sorry about that!', 'type': 'server_error', 'param': None, 'code': None}}

Looks like an issue on OpenAI's side, but do let me know if otherwise!

6 Likes

Getting the same error. I'm using the JS SDK.

Here is a fix for the original code:

You can’t just randomly make up AI model names like “gpt-audio” and expect things to work.

Also, it is silly for OpenAI's example to rely on the third-party requests library when their own SDK already requires httpx - and silly to use OpenAI's library if you have httpx…

import os, base64, httpx
from dotenv import load_dotenv; load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

audio_b64 = base64.b64encode(
    httpx.get("https://cdn.openai.com/API/docs/audio/alloy.wav").content
).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What's the sentiment of this recording?"},
        {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}},
    ],
}]

params = {
    "model": "gpt-4o-audio-preview-2025-06-03",  # real model
    "modalities": ["text"],  # or  ["text", "audio"] for voice output
    "audio": {"voice": "marin", "format": "mp3"},
    "max_completion_tokens": 2000,
    "temperature": 0.7,
    "top_p": 0.7,
}

with httpx.Client(timeout=180) as client:
    r = client.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
        json={**params, "messages": messages},
    )
    r.raise_for_status()

print(r.json()["choices"][0]["message"]["content"])
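
If you do request ["text", "audio"], the reply's audio comes back on the message's audio field rather than in content. A rough sketch for saving it, assuming the usual Chat Completions audio-output shape (base64 data plus a transcript):

# When "modalities" includes "audio", the spoken reply is returned as base64 in
# message["audio"]["data"], with its text in message["audio"]["transcript"].
msg = r.json()["choices"][0]["message"]
if msg.get("audio"):
    with open("reply.mp3", "wb") as f:  # matches the "format": "mp3" set in params
        f.write(base64.b64decode(msg["audio"]["data"]))
    print(msg["audio"]["transcript"])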

Edit: it took a month, but the model is finally working.

1 Like

Actually, it supposedly exists. Just doesn’t work…


3 Likes

Getting the same issue, but I'm certain gpt-audio does exist:

https://platform.openai.com/docs/models/gpt-audio

Likewise, it is selectable in the Playground on the Chat Completions API.

2 Likes

Likewise, it is going to barf on you in the Playground if attempted:

Nothing here saying “try out this model that stealthed its way in, with no usage examples”.

https://platform.openai.com/docs/guides/audio?lang=python


I've tried just about every parameter variation: a "developer" role, imagining it needs a non-default temperature range, audio out vs. audio in, formats up to pcm16, reverting to max_tokens, etc. gpt-audio as a model is a no-go, just a 500 server error.

2 Likes

The same issue is happening with me. I am trying to use gpt-audio, and even after upgrading the package, it says:

openai.InternalServerError: Error code: 500 - {'error': {'message': 'The server had an error while processing your request. Sorry about that!', 'type': 'server_error', 'param': None, 'code': None}}

I am using it like this:

response = client.chat.completions.create(
    model="gpt-audio-2025-08-28",
    modalities=["text"],
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "input_audio",
                    "input_audio": {"data": audio_base64, "format": "mp3"},
                },
            ],
        }
    ],
)

However, if in the exact same code above I switch the model to gpt-4o-audio-preview, it magically works and gives me a response. On the models page https://platform.openai.com/docs/models, though, I could not find any mention of the old gpt-4o-audio-preview anymore.

1 Like

Hi everyone!

We’ve been working on a fix on our end and things should be more stable now, but there may still be occasional hiccups. The safest setup at the moment is to use the snapshot gpt-4o-audio-preview-2025-06-03 and keep n=1 in your requests. That combination has been reliable, while higher concurrency is what tends to trigger the errors.

We know this isn’t ideal, and our team is continuing to work on a full fix! Thank you for all your patience here.
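
For reference, a minimal request following that guidance might look like the sketch below (n already defaults to 1, but it is spelled out here; client and encoded_string are reused from the first post):

# Sketch of the recommended setup: pinned snapshot plus an explicit n=1.
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview-2025-06-03",  # snapshot recommended above
    modalities=["text"],
    n=1,  # single completion per request; higher concurrency tends to trigger the errors
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's the sentiment of this recording?"},
                {"type": "input_audio", "input_audio": {"data": encoded_string, "format": "wav"}},
            ],
        }
    ],
)
print(completion.choices[0].message.content)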

6 Likes

Confirming identical error on my end too.

Hi @vc-openai - is there any difference between gpt-4o-audio-preview and gpt-audio? I am looking to upgrade to the new model, assuming it brings higher-quality audio and chat responses. Can you confirm whether this is the case?

Now for gpt-audio-2025-08-28 I am getting:

Status: 401
message: 'You have insufficient permissions for this operation.'

It was error 500 previously.

If I change the model name back to gpt-4o-audio-preview-2025-06-03, it works OK.

Are you folks still fixing things on your end?

Is this fixed already? I was trying it a couple of days ago and was also getting a 401.

@GroeimetAi if you're keen to try it, gpt-audio is available in Azure AI Foundry; I can confirm it works from there via the OpenAI-compatible endpoint.
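
For anyone who wants to go that route, here is a rough, untested sketch using the AzureOpenAI client from the openai package; the endpoint, API version, and deployment name are placeholders, and the exact OpenAI-compatible endpoint configuration mentioned above may differ:

import os
from openai import AzureOpenAI  # ships with the openai package

# Placeholder values; substitute your own Azure AI Foundry resource, API version,
# and deployment name.
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-01-01-preview",  # assumption: use whichever version your resource supports
    azure_endpoint="https://<your-resource>.openai.azure.com",
)

completion = client.chat.completions.create(
    model="gpt-audio",  # on Azure this is the name of your deployment
    modalities=["text"],
    messages=[{"role": "user", "content": "Say hello."}],
)
print(completion.choices[0].message.content)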

2 Likes

Hey @Dobo, great question!

gpt-audio is the GA release of gpt-4o-audio-preview, so it provides the same high-quality, steerable text+audio in/out as the preview, but as a production Chat Completions model. If you need low-latency streaming or realtime voice-to-voice behavior, use the Realtime API snapshots, and double-check the API surface you plan to use since streaming and base64-audio support can differ across Chat Completions, Realtime, and Responses.
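
In practice, moving off the preview should mostly be a model-string change. A hedged sketch reusing the request shape from the first post (client and encoded_string as defined there), with spoken output added:

# Same request shape as the preview examples above, with only the model swapped
# and an audio reply requested in addition to text.
completion = client.chat.completions.create(
    model="gpt-audio",  # was: "gpt-4o-audio-preview-2025-06-03"
    modalities=["text", "audio"],
    audio={"voice": "marin", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's the sentiment of this recording?"},
                {"type": "input_audio", "input_audio": {"data": encoded_string, "format": "wav"}},
            ],
        }
    ],
)
# With audio output requested, the reply text lives in the transcript of the audio object.
print(completion.choices[0].message.audio.transcript)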

1 Like