I’ve made the following schema to call Whisper for speech-to-text (STT):
{
  "openapi": "3.1.0",
  "info": {
    "title": "OpenAI Whisper API",
    "description": "Transcribes audio files using OpenAI's Whisper model",
    "version": "v1.0.0"
  },
  "servers": [
    {
      "url": "https://api.openai.com"
    }
  ],
  "paths": {
    "/v1/audio/transcriptions": {
      "post": {
        "description": "Transcribes an audio file using the Whisper model",
        "operationId": "TranscribeAudio",
        "parameters": [],
        "requestBody": {
          "content": {
            "multipart/form-data": {
              "schema": {
                "$ref": "#/components/schemas/TranscribeAudioRequestSchema"
              }
            }
          },
          "required": true
        },
        "security": [
          {
            "apiKey": []
          }
        ]
      }
    }
  },
  "components": {
    "schemas": {
      "TranscribeAudioRequestSchema": {
        "properties": {
          "file": {
            "type": "string",
            "format": "binary",
            "description": "Audio file to be transcribed"
          },
          "model": {
            "type": "string",
            "title": "model",
            "description": "ID of the Whisper model to use",
            "default": "whisper-1"
          },
          "response_format": {
            "type": "string",
            "description": "Format of the transcription response",
            "default": "text"
          }
        },
        "type": "object",
        "required": [
          "file",
          "model",
          "response_format"
        ],
        "title": "TranscribeAudioRequestSchema"
      }
    },
    "securitySchemes": {
      "apiKey": {
        "type": "apiKey",
        "in": "header",
        "name": "Authorization",
        "description": "Bearer TOKEN"
      }
    }
  }
}
The user should upload a file (e.g. an .mp3), which should then trigger the API call to do the STT. The agent does seem to talk to the API, but it comes back with the following output:
Talked to api.openai.com
It seems there was an issue with transcribing the audio file due to a missing model parameter in the request. Could you please try uploading the file again, or let me know how else I can assist you?
I don’t know which parameters to declare, hence the empty parameters list. The curl version of this is quite straightforward:
curl --request POST \
  --url https://api.openai.com/v1/audio/transcriptions \
  --header 'Authorization: Bearer TOKEN' \
  --header 'Content-Type: multipart/form-data' \
  --form file=@openai.mp3 \
  --form model=whisper-1 \
  --form response_format=text
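For comparison, here is a rough Python sketch of what that curl call puts on the wire, using only the standard library. It builds the request without sending it, so the three form fields (file, model, response_format) can be inspected directly. The helper names encode_multipart and build_transcription_request are my own for illustration, not part of any OpenAI library, and the octet-stream content type for the file part is an assumption.

```python
import uuid
import urllib.request

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Hypothetical helper: returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields, e.g. model and response_format.
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The binary file part (content type is an assumed generic default).
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(audio_bytes, filename, token,
                                model="whisper-1", response_format="text"):
    """Build (but do not send) the POST request mirroring the curl call."""
    body, content_type = encode_multipart(
        {"model": model, "response_format": response_format},
        "file",
        filename,
        audio_bytes,
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
    )
```

Passing the returned object to urllib.request.urlopen would actually send it. The point of the sketch is that model and response_format travel as form parts in the body next to the file, not as query or path parameters.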
Has anyone managed to get this working in the action configuration of GPT agents yet?