Running Whisper API as action on GPTs

ilia.teimouri · November 12, 2023, 4:20pm

I’ve made the following schema to use Whisper for speech-to-text (STT):

{
  "openapi": "3.1.0",
  "info": {
    "title": "OpenAI Whisper API",
    "description": "Transcribes audio files using OpenAI's Whisper model",
    "version": "v1.0.0"
  },
  "servers": [
    {
      "url": "https://api.openai.com"
    }
  ],
  "paths": {
    "/v1/audio/transcriptions": {
      "post": {
        "description": "Transcribes an audio file using the Whisper model",
        "operationId": "TranscribeAudio",
        "parameters": [],
        "requestBody": {
          "content": {
            "multipart/form-data": {
              "schema": {
                "$ref": "#/components/schemas/TranscribeAudioRequestSchema"
              }
            }
          },
          "required": true
        },
        "security": [
          {
            "apiKey": []
          }
        ]
      }
    }
  },
  "components": {
    "schemas": {
      "TranscribeAudioRequestSchema": {
        "properties": {
          "file": {
            "type": "string",
            "format": "binary",
            "description": "Audio file to be transcribed"
          },
          "model": {
            "type": "string",
            "title": "model",
            "description": "ID of the Whisper model to use",
            "default": "whisper-1"
          },
          "response_format": {
            "type": "string",
            "description": "Format of the transcription response",
            "default": "text"
          }
        },
        "type": "object",
        "required": [
          "file",
          "model",
          "response_format"
        ],
        "title": "TranscribeAudioRequestSchema"
      }
    },
    "securitySchemes": {
      "apiKey": {
        "type": "apiKey",
        "in": "header",
        "name": "Authorization",
        "description": "Bearer TOKEN"
      }
    }
  }
}
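A note on the `default` values above: in OpenAPI, `default` is an annotation that describes what the receiver assumes when a field is omitted. It does not make the caller send the field. The minimal Python sketch below uses two hypothetical helpers, `missing_required` and `with_defaults` (not part of any library). The first checks a form payload against the schema's `required` list, and the second copies the declared defaults into the payload explicitly. It assumes the agent attaches only `file`, which is one possible route to the "missing model parameter" reply. The thread does not confirm that this is what happens.

```python
from typing import Any

# Excerpt of TranscribeAudioRequestSchema from the JSON above, as a Python dict.
SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "file": {"type": "string", "format": "binary"},
        "model": {"type": "string", "default": "whisper-1"},
        "response_format": {"type": "string", "default": "text"},
    },
    "required": ["file", "model", "response_format"],
}


def missing_required(schema: dict[str, Any], payload: dict[str, Any]) -> list[str]:
    """Hypothetical helper: required fields that are absent from the payload."""
    return [name for name in schema.get("required", []) if name not in payload]


def with_defaults(schema: dict[str, Any], payload: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: copy the payload, adding declared defaults it lacks.

    A schema default only documents what the receiver assumes; some caller
    has to put the value into the request for it to actually be sent.
    """
    filled = dict(payload)
    for name, prop in schema.get("properties", {}).items():
        if name not in filled and "default" in prop:
            filled[name] = prop["default"]
    return filled


if __name__ == "__main__":
    sent = {"file": b"...audio bytes..."}  # assumed: agent attaches only the file
    print(missing_required(SCHEMA, sent))  # ['model', 'response_format']
    print(missing_required(SCHEMA, with_defaults(SCHEMA, sent)))  # []
```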

The user should upload a file (e.g. an .mp3), which then triggers the API to do the STT. The agent seems to be talking to the API but comes back with the following output:

Talked to api.openai.com

It seems there was an issue with transcribing the audio file due to a missing model parameter in the request. Could you please try uploading the file again, or let me know how else I can assist you?

I don’t know which params it needs, hence the empty list. The curl version of this is quite straightforward:

curl --request POST \
  --url https://api.openai.com/v1/audio/transcriptions \
  --header 'Authorization: Bearer TOKEN' \
  --header 'Content-Type: multipart/form-data' \
  --form file=@openai.mp3 \
  --form model=whisper-1 \
  --form response_format=text
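For comparison, here is a minimal Python sketch, using only the standard library, of roughly what that curl command sends. Each `--form` flag becomes one part of a multipart/form-data body. In OpenAPI terms, that body is described under `requestBody`. The `parameters` list only covers path, query, header and cookie values, so leaving it empty is expected. `build_transcription_request` is a hypothetical helper. The random boundary and the `application/octet-stream` file type are assumptions. The function builds the request but does not send it.

```python
import urllib.request
import uuid

URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(
    audio: bytes,
    filename: str,
    token: str,
    model: str = "whisper-1",
    response_format: str = "text",
) -> urllib.request.Request:
    """Hypothetical helper: build (not send) a POST mirroring the curl command."""
    boundary = uuid.uuid4().hex  # assumed: any string absent from the payload works
    lines: list[bytes] = []
    # Plain text form fields, like `--form model=whisper-1`.
    for name, value in (("model", model), ("response_format", response_format)):
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    # The file part, like `--form file=@openai.mp3`.
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",  # assumed generic file type
        b"",
        audio,
        f"--{boundary}--".encode(),  # closing delimiter
        b"",
    ]
    body = b"\r\n".join(lines)
    return urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # The boundary must be declared so the server can split the parts.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen`. With `response_format=text`, the response body should be the plain transcript.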

Has anyone managed to crack this yet in the action configuration of GPT agents?

matt0sai · November 12, 2023, 7:36pm

Not sure, but perhaps this will help:

"Text-to-speech (TTS)

Developers can now generate human-quality speech from text via the text-to-speech API. Our new TTS model offers six preset voices to choose from and two model variants, tts-1 and tts-1-hd. tts is optimized for real-time use cases and tts-1-hd is optimized for quality. Pricing starts at $0.015 per input 1,000 characters. Check out our TTS guide to get started."

getinference · November 12, 2023, 9:41pm

I have managed to force it to pass the tiny.en parameter, BUT it still does not transcribe. Here is the conversation (attached).

ilia.teimouri · November 13, 2023, 9:01am

I see. It is hard to get this working since there is no documentation on schemas, especially for Whisper.

getinference · November 13, 2023, 11:43am

check:

and

ilia.teimouri · November 13, 2023, 5:33pm

Thanks, I followed the second link to generate the following schema:

openapi: 3.1.0
info:
  title: OpenAI Audio Transcription API
  version: v1.0.0
servers:
  - url: https://api.openai.com
    description: OpenAI API server
paths:
  /v1/audio/transcriptions:
    post:
      summary: Transcribe audio file
      description: This endpoint transcribes the provided audio file using the model "whisper-1" and returns the transcription in text format.
      operationId: TranscribeAudio
      requestBody:
        content:
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/TranscriptionRequestSchema'
      responses:
        '200':
          description: Successful transcription of audio.
          content:
            text/plain:
              schema:
                type: string
      security:
        - apiKey: []
components:
  schemas:
    TranscriptionRequestSchema:
      type: object
      properties:
        file:
          type: string
          format: binary
          description: The audio file to be transcribed.
        model:
          type: string
          description: The model to be used for transcription.
          default: "whisper-1"
          readOnly: true
        response_format:
          type: string
          description: The format of the response.
          default: "text"
          readOnly: true
      required:
        - file
  securitySchemes:
    apiKey:
      type: apiKey
      in: header
      name: Authorization
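A likely culprit in this version is `readOnly: true`. In OpenAPI, a `readOnly` property is meant to appear in responses. A client that follows the spec should not send it in a request body. Here, `model` is also no longer listed under `required`, so this schema effectively tells the caller to omit `model` and `response_format`, which matches the error. The sketch below uses hypothetical helpers, `request_fields` and `must_send`, to mimic how a spec-following client would choose which properties to send. It also shows a hypothetical revision that drops `readOnly` and pins the values with single-item `enum`s instead. It is an assumption, not something verified here, that the GPT Actions runtime behaves this way.

```python
from typing import Any

# TranscriptionRequestSchema from the YAML above, as a Python dict.
POSTED: dict[str, Any] = {
    "type": "object",
    "properties": {
        "file": {"type": "string", "format": "binary"},
        "model": {"type": "string", "default": "whisper-1", "readOnly": True},
        "response_format": {"type": "string", "default": "text", "readOnly": True},
    },
    "required": ["file"],
}

# Hypothetical revision: no readOnly, fields required and pinned with an enum.
REVISED: dict[str, Any] = {
    "type": "object",
    "properties": {
        "file": {"type": "string", "format": "binary"},
        "model": {"type": "string", "enum": ["whisper-1"]},
        "response_format": {"type": "string", "enum": ["text"]},
    },
    "required": ["file", "model", "response_format"],
}


def request_fields(schema: dict[str, Any]) -> list[str]:
    """Hypothetical helper: properties allowed in a request body.

    readOnly properties are response-only in OpenAPI, so they are skipped.
    """
    return [
        name
        for name, prop in schema.get("properties", {}).items()
        if not prop.get("readOnly", False)
    ]


def must_send(schema: dict[str, Any]) -> list[str]:
    """Hypothetical helper: required properties that may appear in a request."""
    allowed = set(request_fields(schema))
    return [name for name in schema.get("required", []) if name in allowed]


if __name__ == "__main__":
    print(must_send(POSTED))   # ['file']
    print(must_send(REVISED))  # ['file', 'model', 'response_format']
```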

But when I upload the audio, it raises an error about the model parameter. The required ones are file (which I upload) and model (which is whisper-1), and I’ve fixed the latter in the schema. Have you managed to make this work on your side?

rkfalcon · December 22, 2023, 10:44pm

@ilia.teimouri, were you ever able to get this action to work and transcribe audio files? I’m having the same issue and looking for some help. Thanks.

david13 · January 25, 2024, 3:51pm

What about you? Any luck? I’d love to get a Whisper transcription inside a GPT.
