Request to gpt-4o-mini-transcribe model

I have a wrapper that calls gpt-4o-mini-transcribe, but I receive the following response when I send the JSON block at the end of this post. How should the request be formatted? I cannot put my finger on it, and the API reference docs have not been updated yet to provide the necessary info for working with this model.

error:

{
  "error": {
    "message": "[{'type': 'missing', 'loc': ('body', 'file'), 'msg': 'Field required', 'input': None}]",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

request:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "//M4xAAUyDLYF0kQAFgc...VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV",
            "format": "mp3"
          }
        }
      ]
    }
  ],
  "modalities": ["audio"],
  "audio": { "voice": "echo", "format": "pcm16" },
  "model": "gpt-4o-mini-transcribe",
  "temperature": 1.0
}

Looking at your usage:
I see no clue what the “data” is; it has to be an actual file, as the “mp3” you pass suggests.
I see no clue where you are trying to send this.

You seem to have conflated parameters for “Responses” with the transcription endpoint.
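For contrast, here is a minimal sketch of the same call made through the official openai Python SDK instead of raw HTTP (assuming the SDK is installed and OPENAI_API_KEY is set; speech.mp3 is just a placeholder file name):

# Minimal sketch with the openai SDK; the file name is a placeholder.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

with open("speech.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",
        file=audio_file,
    )

print(transcription.text)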

You’ll forgive me for just pasting a Python async function that blasts parallel calls at the endpoint, sending files as multipart/form-data with a MIME type and saving the transcriptions; it serves here as an example of the parameters for you:

import os
import asyncio
import httpx
import aiofiles
from typing import List

async def async_transcribe_audio_chunked(client: httpx.AsyncClient,
                                         input_file_path: str,
                                         output_file_path: str,
                                         rate_limit_semaphore: asyncio.Semaphore):
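    """Send one audio file to the transcriptions endpoint and save the returned text."""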
    url = "https://api.openai.com/v1/audio/transcriptions"
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY')}"
    }

    async with rate_limit_semaphore:
        print(f"Starting transcription for '{input_file_path}'")
        files = {
            'file': (input_file_path.split('/')[-1], open(input_file_path, 'rb'), 'audio/mpeg'),
            'model': (None, 'gpt-4o-mini-transcribe'),
            'language': (None, 'en'),
            'prompt': (None, "Welcome to our radio show."),
            'response_format': (None, 'json'),
            'temperature': (None, '0.2')
        }
        response = None
        try:
            response = await client.post(url, headers=headers, files=files)
            response.raise_for_status()
            transcription = response.json()
        except Exception as e:
            # Only inspect the response if the request actually returned one
            if response is not None and response.status_code == 429 and "check your plan" in response.text:
                print("Aborting due to exceeded quota.")
                return
            print(f"An API error occurred: {e}")
            return
        finally:
            # Close the file handle opened for the multipart upload
            files['file'][1].close()

        print(f"Sending complete for '{input_file_path}'")

    transcribed_text = transcription['text']

    try:
        async with aiofiles.open(output_file_path, "w") as file:
            await file.write(transcribed_text)
        print(f"--- Transcribed text successfully saved to '{output_file_path}'.")
    except Exception as e:
        print(f"Output file error: {e}")

async def main(input_file_paths: List[str]):
    rate_limit_semaphore = asyncio.Semaphore(40)  # Limit concurrency to 40 simultaneous requests
    output_file_paths = [path + "-transcript.txt" for path in input_file_paths]

    async with httpx.AsyncClient() as client:
        tasks = [
            async_transcribe_audio_chunked(client, input_file_path, output_file_path, rate_limit_semaphore)
            for input_file_path, output_file_path in zip(input_file_paths, output_file_paths)
        ]
        await asyncio.gather(*tasks)

if __name__ == "__main__":
    input_file_paths = ["audio1.mp3", "audio2.mp3"]
    asyncio.run(main(input_file_paths))

All right… Then it must be the actual file instead of its base64…
Seeing the Python code, I have a better idea of the request scheme… Indeed, I was just trying to use the same request body as for the gpt-4o-mini-audio model. Thanks! I will give it a try, adding only those params.
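For reference, "only those params" comes down to a multipart form with the file plus the model field; a minimal synchronous httpx sketch (file name and path are placeholders, not from this thread):

import os
import httpx

# Minimal sketch: only the required multipart fields; the file name is a placeholder.
with open("audio_in.mp3", "rb") as audio_file:
    files = {
        "file": ("audio_in.mp3", audio_file, "audio/mpeg"),
        "model": (None, "gpt-4o-mini-transcribe"),
    }
    response = httpx.post(
        "https://api.openai.com/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        files=files,
        timeout=120,
    )

response.raise_for_status()
print(response.json()["text"])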

OK, just to leave the “solution” here. It works with curl, as in the example below:

curl -L 'https://api.openai.com/v1/audio/transcriptions' \
	-X POST \
	-H 'Authorization: Bearer sk-...' \
	-H 'Content-Type: multipart/form-data' \
	-F file='@/home/me/audio_in.mp3' \
	-F response_format='json' \
	-F model='gpt-4o-mini-transcribe' \
	-F temperature='0' \
	-o '/home/me/out.json'

A text prompt may be added with -F prompt="text". Cheers!
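With response_format='json', the transcribed text can then be pulled out of the saved file, for example in Python (path as in the curl example above):

import json

# Read the saved response and print just the transcription text.
with open("/home/me/out.json") as f:
    print(json.load(f)["text"])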

