I am using the Chat Completions API with the gpt-4o-audio-preview model to send voice input and get text output. Everything works fine except for one thing: I need the text transcript of my voice input so that I can have a multi-turn conversation with the model. The options I have considered are:
1. Call the Whisper API separately to transcribe the voice input. (This will incur more cost and latency.)
2. Use response_format to get the response as well as the transcript in the same response.
Of these, the second option looks like the better one. I have been trying to do it with the approach below, but the request is rejected. Please help me get the message body right, or suggest an alternative if there is one.
I need just two properties: “outputResult” and “inputTranscript”.
MESSAGE BODY
{
"model": "gpt-4o-audio-preview",
"response_format": {
"type": "json_object",
"properties": {
"outputResult": {
"type": "string",
"description": "The result of the current call"
},
"inputTranscript": {
"type": "string",
"description": "The transcript of the text input"
}
}
},
"messages": [
{
"role": "developer",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "//vUxAADwAABpAAAACAAADSAAAAETEFNRQMACQgABAAAAAAAAAAAA...",
"format": "mp3"
}
}
]
}
],
"modalities": [
"text"
]
}
RESPONSE BODY
{
"error": {
"message": "Unknown parameter: 'response_format.properties'.",
"type": "invalid_request_error",
"param": "response_format.properties",
"code": "unknown_parameter"
}
}
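
The error indicates that `response_format` does not accept a top-level `properties` key. Below is a minimal sketch of one thing that may be worth trying, assuming the model accepts the Structured Outputs shape, where `type` is `json_schema` and the schema is nested under `json_schema.schema`. Nothing in this post confirms that gpt-4o-audio-preview supports that shape, so treat it as an experiment. The helper name `build_request_body` and the schema name `audio_reply` are made up for illustration.

```python
import json


def build_request_body(audio_b64: str, audio_format: str = "mp3") -> dict:
    """Build a Chat Completions request body (hypothetical helper).

    Assumption: the model accepts response_format.type == "json_schema".
    """
    # The two properties from the question, moved into a proper JSON Schema.
    schema = {
        "type": "object",
        "properties": {
            "outputResult": {
                "type": "string",
                "description": "The result of the current call",
            },
            "inputTranscript": {
                "type": "string",
                "description": "The transcript of the audio input",
            },
        },
        "required": ["outputResult", "inputTranscript"],
        "additionalProperties": False,
    }
    return {
        "model": "gpt-4o-audio-preview",
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "audio_reply",  # hypothetical schema name
                "strict": True,
                "schema": schema,
            },
        },
        "messages": [
            {"role": "developer", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {"data": audio_b64, "format": audio_format},
                    }
                ],
            },
        ],
        "modalities": ["text"],
    }


if __name__ == "__main__":
    # Placeholder string instead of real base64 audio, just to show the shape.
    print(json.dumps(build_request_body("<base64 audio here>"), indent=2))
```

If the model rejects `json_schema`, a fallback would be to keep `"type": "json_object"` with no extra keys and describe the two fields in the developer message instead. JSON mode does not take a schema, and it expects the word "JSON" to appear somewhere in the messages.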