Is it possible to get both audio and text output from gpt-4o-audio-preview?
For example, I want it to generate a question as audio output and its type as text output. How can this be achieved?
The model responds however it chooses: either as plain text, or as voice with a transcript available.
The "choosing" depends on whether you continue a voice-only conversation or revert to text inputs for the assistant and user replies in the chat.
If, internally, the audio modality is one kind of "language" the AI can write as tokens, and text-encoded tokens are another kind of output you can receive, then the model could in principle generate "mixed media" (much as it would when generating images while talking about them, a capability that has not been released).
However, you cannot train the voice AI in-context with assistant messages, and you cannot even instruct what will be produced in response to your voice or text input, so voice paired with different text seems an impossibility.
You'll likely need to send the user input, processed by Whisper and/or the AI's transcript, to a separate AI classification call if you wish to receive a "type" based on it.
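A minimal sketch of that second classification step, assuming you already have the transcript text. The category labels and the helper names here are hypothetical illustrations, not part of any official API; only the `chat.completions.create` call itself is the real OpenAI SDK method.

```python
# Hedged sketch: classify a question transcript with a second, text-only
# chat completion. Labels and function names are illustrative assumptions.

CATEGORIES = ["multiple-choice", "open-ended", "true-or-false"]  # hypothetical labels

def build_classifier_messages(transcript):
    """Build the messages payload asking the model to pick one label."""
    return [
        {
            "role": "system",
            "content": (
                "Classify the question into exactly one of: "
                + ", ".join(CATEGORIES)
                + ". Reply with the label only."
            ),
        },
        {"role": "user", "content": transcript},
    ]

def classify_question(client, transcript):
    """client is an openai.OpenAI instance; returns the predicted label."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # any text-capable model works here
        messages=build_classifier_messages(transcript),
    )
    return completion.choices[0].message.content.strip()
```

You would call `classify_question(client, transcript)` with the transcript obtained from the audio response (or from Whisper on the user's input), and receive the "type" as ordinary text.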
Welcome to the dev forum @anteatereater
AFAIK, it's not currently possible to get audio plus an independent text output at the same time. You will, however, get a transcript of the audio response.
Here’s an example:
import base64
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

# The transcript of the spoken answer is returned alongside the audio
print(completion.choices[0].message.audio.transcript)

# The audio itself arrives base64-encoded; decode and save it as a WAV file
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)