How to use response_format to get a transcript of the voice input along with the text output

harish.mohanan90 · January 10, 2025, 9:55am

I am using the Chat Completions API with the gpt-4o-audio-preview model to send voice input and get text output. Everything works except one thing: I need a text transcript of my voice input so that I can hold a multi-turn conversation with the model. I have been considering ways to do this, and these are the options I have thought of:
1. Call the Whisper API separately to transcribe the voice input (this adds cost and latency).
2. Use response_format to get both the answer and the transcript in the same response.
Of these, the second option is the better one. I have been trying it with the approach below, but I am not getting the response right. Please help me get the message body right, or suggest an alternative if there is one.
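To illustrate why the transcript matters for multi-turn use: once a voice turn has been transcribed, later requests can carry that turn as plain text instead of resending the audio. A minimal sketch, assuming the transcript and the model's reply are already in hand; the helpers `append_turn` and `next_request_messages` and the sample strings are hypothetical, not taken from the thread.

```python
from typing import Dict, List

Message = Dict[str, object]


def append_turn(history: List[Message], transcript: str, reply: str) -> List[Message]:
    """Return a new history with one completed voice turn stored as text.

    The user's audio is represented by its transcript, so later requests
    do not need to resend the audio payload.
    """
    return history + [
        {"role": "user", "content": transcript},
        {"role": "assistant", "content": reply},
    ]


def next_request_messages(history: List[Message], audio_b64: str, fmt: str = "mp3") -> List[Message]:
    """Build the messages list for the next turn: prior text turns plus the new audio."""
    new_audio_turn: Message = {
        "role": "user",
        "content": [
            {"type": "input_audio", "input_audio": {"data": audio_b64, "format": fmt}}
        ],
    }
    return history + [new_audio_turn]


history: List[Message] = [{"role": "developer", "content": "You are a helpful assistant."}]
history = append_turn(history, "What is the capital of France?", "Paris.")
messages = next_request_messages(history, "<base64 audio>")
print(len(messages))  # 4
```

Only the newest turn carries audio; everything earlier is text, which keeps each request smaller.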

I only need two properties: “outputResult” and “inputTranscript”.
MESSAGE BODY

{
    "model": "gpt-4o-audio-preview",
    "response_format": {
        "type": "json_object",
        "properties": {
            "outputResult": {
                "type": "string",
                "description": "The result of the current call"
            },
            "inputTranscript": {
                "type": "string",
                "description": "The transcript of the text input"
            }
        }
    },
    "messages": [
        {
            "role": "developer",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "//vUxAADwAABpAAAACAAADSAAAAETEFNRQMACQgABAAAAAAAAAAAA...",
                        "format": "mp3"
                    }
                }
            ]
        }
    ],
    "modalities": [
        "text"
    ]
}
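For comparison, the structured-outputs form of `response_format` in Chat Completions does not take `properties` at its top level: it uses `"type": "json_schema"` and nests a full JSON Schema under `json_schema.schema`. A minimal sketch that builds such a body as a Python dict; whether `gpt-4o-audio-preview` accepts `json_schema` (rather than only `json_object`) is an assumption to check against the model's documentation, and the helper `build_request` and the schema name `voice_turn` are hypothetical.

```python
import json
from typing import Any, Dict


def build_request(audio_b64: str, fmt: str = "mp3") -> Dict[str, Any]:
    """Build a Chat Completions body asking for both fields as structured output.

    Assumption: the model accepts a response_format of type "json_schema".
    """
    schema = {
        "type": "object",
        "properties": {
            "outputResult": {
                "type": "string",
                "description": "The result of the current call",
            },
            "inputTranscript": {
                "type": "string",
                "description": "Verbatim transcript of the user's audio input",
            },
        },
        # With strict mode, every property must be listed as required
        # and additional properties must be disallowed.
        "required": ["outputResult", "inputTranscript"],
        "additionalProperties": False,
    }
    return {
        "model": "gpt-4o-audio-preview",
        "modalities": ["text"],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "voice_turn", "strict": True, "schema": schema},
        },
        "messages": [
            {"role": "developer", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "input_audio", "input_audio": {"data": audio_b64, "format": fmt}}
                ],
            },
        ],
    }


body = build_request("<base64 audio>")
print(json.dumps(body["response_format"], indent=2))
```

The schema content is the same as in the body above; only its position inside `response_format` changes.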

RESPONSE BODY

{
    "error": {
        "message": "Unknown parameter: 'response_format.properties'.",
        "type": "invalid_request_error",
        "param": "response_format.properties",
        "code": "unknown_parameter"
    }
}
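The error is consistent with `json_object` mode taking no schema at all: in that mode `response_format` is just `{"type": "json_object"}`, the desired keys have to be requested in the prompt, and the messages must mention JSON. A minimal sketch of that fallback, assuming the model follows the instruction; the prompt wording and the helper `parse_turn` are illustrative. Because the transcript is produced by the chat model itself, it may be less verbatim than a dedicated transcription call.

```python
import json
from typing import Any, Dict, Tuple

# JSON mode: response_format carries only the type; the keys are requested in the prompt.
RESPONSE_FORMAT: Dict[str, str] = {"type": "json_object"}
DEVELOPER_PROMPT = (
    "You are a helpful assistant. Reply only with a JSON object with two string keys: "
    '"inputTranscript" (a verbatim transcript of the user\'s audio) and '
    '"outputResult" (your answer).'
)


def parse_turn(content: str) -> Tuple[str, str]:
    """Parse message.content from a JSON-mode reply into (transcript, result).

    Raises ValueError if the reply is not an object with both string keys.
    """
    data: Any = json.loads(content)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    transcript = data.get("inputTranscript")
    result = data.get("outputResult")
    if not isinstance(transcript, str) or not isinstance(result, str):
        raise ValueError("missing or non-string inputTranscript/outputResult")
    return transcript, result


sample = '{"inputTranscript": "What is the capital of France?", "outputResult": "Paris."}'
print(parse_turn(sample))  # ('What is the capital of France?', 'Paris.')
```

The parsed transcript can then be stored as the user's text turn for the next request.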
