How to download audio from gpt-4o-audio-preview

Here is a sample conversation with audio-preview:

curl "https://api.openai.com/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4o-audio-preview",
        "modalities": ["text", "audio"],
        "audio": { "voice": "alloy", "format": "wav" },
        "messages": [
            {
                "role": "user",
                "content": "Is a golden retriever a good family dog?"
            },
            {
                "role": "assistant",
                "audio": {
                    "id": "audio_abc123"
                }
            },
            {
                "role": "user",
                "content": "Why do you say they are loyal?"
            }
        ]
    }'

How can I download the audio referenced by the id above?

You can do that by base64-decoding the data attribute of the audio object in the returned chat completion message and writing the bytes to a file.

import base64
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a suitable family dog?"
        }
    ]
)

print(completion.choices[0])

wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)
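For completeness, the same message.audio object also carries an id (for continuing the conversation), an expires_at timestamp, and a transcript. A minimal sketch of building a follow-up request that references the audio by id rather than resending the bytes (the helper name is illustrative, not part of the SDK):

```python
# Illustrative helper: construct a messages list that references the
# assistant's previous audio turn by id instead of raw audio data.
def build_follow_up_messages(audio_id, first_question, follow_up_question):
    return [
        {"role": "user", "content": first_question},
        # Reference the stored server-side audio by its id.
        {"role": "assistant", "audio": {"id": audio_id}},
        {"role": "user", "content": follow_up_question},
    ]

messages = build_follow_up_messages(
    "audio_abc123",
    "Is a golden retriever a suitable family dog?",
    "Why do you say they are loyal?",
)
```

This messages list can then be passed to client.chat.completions.create() exactly as in the curl example above.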

The audio ID itself cannot be used to replay or recover the sound. The assistant output it refers to is stored server-side solely for continuing a conversation, and it expires.

The likely reason for this ID system for chat-history audio is that OpenAI doesn't want developers to be able to inject their own audio into API requests as the assistant's voice or messages, which could steer output through in-context learning. The expiration also breaks long-term chat continuations.

You'll have to save the original response message and its generated audio part yourself, if the desired application is a chat UI that can replay what was previously spoken.
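A minimal sketch of that persistence, assuming you keep the base64 data and transcript from each assistant turn (the save_turn helper, file names, and directory layout are illustrative only):

```python
import base64
import json
import pathlib

# Illustrative: write each assistant turn's decoded wav plus a small
# metadata file, so a chat UI can replay the audio later.
def save_turn(turn_index, audio_obj, out_dir="chat_audio"):
    path = pathlib.Path(out_dir)
    path.mkdir(exist_ok=True)
    wav_path = path / f"turn_{turn_index}.wav"
    wav_path.write_bytes(base64.b64decode(audio_obj["data"]))
    meta = {"transcript": audio_obj.get("transcript", ""), "wav": wav_path.name}
    (path / f"turn_{turn_index}.json").write_text(json.dumps(meta))
    return wav_path
```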

As an example of collecting that response data, I just tacked an audio extractor onto my existing handling of streaming tool, function, and other object collection from a Python httpx request (not the OpenAI SDK).

    if response.status_code != 200:
        print(f"HTTP error {response.status_code}: {response.text}")
        # retry/reprompt
        continue
    else:
        print("API request: success")
        response_content = b''
        for chunk in response.iter_bytes(chunk_size=8192):
            if chunk:
                response_content += chunk
        response_data = json.loads(response_content.decode('utf-8'))

        if 'choices' in response_data and response_data['choices']:
            print("-- choices list received --")
            choice = response_data['choices'][0]['message']
            reply = choice.get('content', "")
            audio_data = choice.get('audio', {})
            audio_base64 = audio_data.get('data', "")
            transcript = audio_data.get('transcript', "")
            print(reply if reply is not None else '', transcript if transcript is not None else '')

            print("\n", response_data.get('usage', {}))
            
            if audio_base64:
                save_and_play_audio(audio_base64, VOICE)
            # use the ID if you really want, I don't
            chat.append({"role": "assistant", "content": reply or transcript or ""})
            user_input = input("\nPrompt: ")
            user_message = {"role": "user", "content": user_input}
            chat.append(user_message)
        else:
            print("No valid response received.")
            ...

save_and_play_audio() does what it says.
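The helper itself isn't shown above; a hedged guess at what save_and_play_audio might look like (the player commands and the play flag are assumptions, not from the original post):

```python
import base64
import subprocess
import sys
import time

# Hypothetical reconstruction: decode the base64 wav to disk, then hand
# it to a platform audio player. The voice name is used only for the file name.
def save_and_play_audio(audio_base64, voice, play=True):
    filename = f"{voice}_{int(time.time())}.wav"
    with open(filename, "wb") as f:
        f.write(base64.b64decode(audio_base64))
    if play:
        # Platform-specific playback: 'afplay' on macOS, 'aplay' on Linux.
        player = "afplay" if sys.platform == "darwin" else "aplay"
        subprocess.Popen([player, filename])
    return filename
```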

CURL is not the right tool…
