Running the Whisper API as an action on GPTs

I’ve written the following schema to use Whisper for speech-to-text (STT):

{
  "openapi": "3.1.0",
  "info": {
    "title": "OpenAI Whisper API",
    "description": "Transcribes audio files using OpenAI's Whisper model",
    "version": "v1.0.0"
  },
  "servers": [
    {
      "url": "https://api.openai.com"
    }
  ],
  "paths": {
    "/v1/audio/transcriptions": {
      "post": {
        "description": "Transcribes an audio file using the Whisper model",
        "operationId": "TranscribeAudio",
        "parameters": [],
        "requestBody": {
          "content": {
            "multipart/form-data": {
              "schema": {
                "$ref": "#/components/schemas/TranscribeAudioRequestSchema"
              }
            }
          },
          "required": true
        },
        "security": [
          {
            "apiKey": []
          }
        ]
      }
    }
  },
  "components": {
    "schemas": {
      "TranscribeAudioRequestSchema": {
        "properties": {
          "file": {
            "type": "string",
            "format": "binary",
            "description": "Audio file to be transcribed"
          },
          "model": {
            "type": "string",
            "title": "model",
            "description": "ID of the Whisper model to use",
            "default": "whisper-1"
          },
          "response_format": {
            "type": "string",
            "description": "Format of the transcription response",
            "default": "text"
          }
        },
        "type": "object",
        "required": [
          "file",
          "model",
          "response_format"
        ],
        "title": "TranscribeAudioRequestSchema"
      }
    },
    "securitySchemes": {
      "apiKey": {
        "type": "apiKey",
        "in": "header",
        "name": "Authorization",
        "description": "Bearer TOKEN"
      }
    }
  }
}
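As a quick sanity check on the spec above, the required multipart fields can be read straight out of the request-body schema. This is only a sketch for debugging (the fragment is reproduced inline as plain JSON; it is not part of the action config itself):

```python
import json

# The request-body schema fragment from the spec above, reproduced inline.
schema = json.loads("""
{
  "properties": {
    "file": {"type": "string", "format": "binary"},
    "model": {"type": "string", "default": "whisper-1"},
    "response_format": {"type": "string", "default": "text"}
  },
  "type": "object",
  "required": ["file", "model", "response_format"]
}
""")

# Every field listed under "required" must also be declared under
# "properties", otherwise a client has no description of what to send.
missing = [f for f in schema["required"] if f not in schema["properties"]]
print("required fields without a property definition:", missing)
```

So the schema itself is internally consistent: all three required fields are declared, and the question is why the action never populates `model` at call time.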

The user should upload a file (e.g. an .mp3), and the action should then call the API to do the STT. The agent does seem to talk to the API, but it comes back with the following output:

Talked to api.openai.com

It seems there was an issue with transcribing the audio file due to a missing model parameter in the request. Could you please try uploading the file again, or let me know how else I can assist you?

I don’t know which query parameters to declare, hence the empty `parameters` list. The curl version of this call is quite straightforward:

curl --request POST \
  --url https://api.openai.com/v1/audio/transcriptions \
  --header 'Authorization: Bearer TOKEN' \
  --header 'Content-Type: multipart/form-data' \
  --form file=@openai.mp3 \
  --form model=whisper-1 \
  --form response_format=text

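For anyone debugging what actually has to go over the wire, here is a minimal stdlib-only Python sketch of the multipart/form-data body that the curl command above produces (the filename and audio bytes are placeholders; the `Authorization` header is separate and not shown):

```python
import uuid

def build_multipart(fields: dict, file_field: str, filename: str, file_bytes: bytes):
    """Build a multipart/form-data body by hand (what curl --form does)."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields: model, response_format, ...
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
            + value + b"\r\n"
        )
    # The file part carries a filename and a content type.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return boundary, b"".join(parts)

boundary, body = build_multipart(
    {"model": b"whisper-1", "response_format": b"text"},
    "file", "openai.mp3", b"<mp3 bytes>",  # placeholder audio content
)
# The body now carries the same three parts as the curl --form flags above.
```

If any of those three parts is absent from the request the action sends, the API responds exactly as reported: a missing `model` (or `file`) parameter error.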
Has anyone managed to crack this in the action configuration of GPT agents yet?


Not sure but perhaps this will help:

"Text-to-speech (TTS)

Developers can now generate human-quality speech from text via the text-to-speech API. Our new TTS model offers six preset voices to choose from and two model variants, tts-1 and tts-1-hd. tts-1 is optimized for real-time use cases and tts-1-hd is optimized for quality. Pricing starts at $0.015 per input 1,000 characters. Check out our TTS guide to get started."

I have managed to force it to pass the tiny.en model parameter, BUT it still does not transcribe. Here is the conversation (attached).

I see. It is hard to get this to work, since there is no schema documentation specifically for Whisper.

check:

and

Thanks, I followed the second link and generated the following schema:

openapi: 3.1.0
info:
  title: OpenAI Audio Transcription API
  version: v1.0.0
servers:
  - url: https://api.openai.com
    description: OpenAI API server
paths:
  /v1/audio/transcriptions:
    post:
      summary: Transcribe audio file
      description: This endpoint transcribes the provided audio file using the model "whisper-1" and returns the transcription in text format.
      operationId: TranscribeAudio
      requestBody:
        content:
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/TranscriptionRequestSchema'
      responses:
        '200':
          description: Successful transcription of audio.
          content:
            text/plain:
              schema:
                type: string
      security:
        - apiKey: []
components:
  schemas:
    TranscriptionRequestSchema:
      type: object
      properties:
        file:
          type: string
          format: binary
          description: The audio file to be transcribed.
        model:
          type: string
          description: The model to be used for transcription.
          default: "whisper-1"
          readOnly: true
        response_format:
          type: string
          description: The format of the response.
          default: "text"
          readOnly: true
      required:
        - file
  securitySchemes:
    apiKey:
      type: apiKey
      in: header
      name: Authorization

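One thing worth checking in the YAML above: per the OpenAPI specification, `readOnly: true` marks a property as response-only, meaning a spec-compliant client SHOULD NOT send it as part of a request. If the action honors that, it would drop `model` and `response_format` from the form data entirely, which would explain the missing-model error. A small sketch of that rule (the schema fragment is reproduced inline as a plain dict; this is an illustration of the spec semantics, not how the action is known to behave internally):

```python
# Request-body properties from the YAML schema above, as a plain dict.
properties = {
    "file": {"type": "string", "format": "binary"},
    "model": {"type": "string", "default": "whisper-1", "readOnly": True},
    "response_format": {"type": "string", "default": "text", "readOnly": True},
}

# Per OpenAPI, readOnly properties MAY appear in responses but SHOULD NOT
# be sent in requests -- a spec-compliant client would filter them out here.
request_fields = [name for name, prop in properties.items() if not prop.get("readOnly")]
print("fields a spec-compliant client would send:", request_fields)
```

If that is what is happening, removing `readOnly` (or hard-coding the values some other way) may be worth trying.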
But when I upload the audio, it raises an error about the model parameter. Yet the required parameters are file (which I upload) and model (which is whisper-1, and I’ve pinned that in the schema via its default). Have you managed to make this work on your side?

@ilia.teimouri were you ever able to get this action to work and transcribe audio files? I’m having the same issue and looking for some help. Thanks.

What about you? Any luck? I’d love to get a Whisper transcription inside a GPT.