Whisper API transcription errors (SOLVED)

I have seen many posts about bugs and errors when using OpenAI's transcription API (whisper-1). I ran into them as well and came up with a solution for my case, which might be helpful for you too.

This is my app’s workflow:

  1. Form (video) → conversion to .mp3 → upload to cloud storage → return the ID of the created audio file (using the uploadThing service).
  2. Another form → the Next.js backend looks up the video by ID → retrieves its public URL → Axios call to fetch the file from the URL → Buffer → OpenAI API.

Regarding the errors, I experienced a variety of them, and they were extremely inconsistent. Here are some that I was able to reproduce (and solve) again:

  1. Giant error (missing file format in the name passed to the toFile helper):
{
  "error": {
    "status": 400,
    "headers": {
      "alt-svc": "h3=\":443\"; ma=86400",
      "cf-cache-status": "DYNAMIC",
      "cf-ray": "85421524f81042a8-BNU",
      "connection": "keep-alive",
      "content-length": "231",
      "content-type": "application/json",
      "date": "Mon, 12 Feb 2024 04:28:03 GMT",
      "openai-organization": "user-b8nm4lwkpg28ajbxcqh28yus",
      "openai-processing-ms": "29",
      "openai-version": "2020-10-01",
      "server": "cloudflare",
      "set-cookie": "__cf_bm=0U9yxsAzNxER.utK5re7xYEvmv3TNfpb43LPHWIew0c-1707712083-1-AUpIGRf5Qn+ZLdtWukDSCnjPJKpB1vOiFl8Xx3WxMuhA0RMzzZTrzZ8hKftBB4ssVdFg2hwH4u9pAP4N3kC0aFw=; path=/; expires=Mon, 12-Feb-24 04:58:03 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None, _cfuvid=XFTY726zg.b3Sr4bWr8bYSo62GDtx278qVqg5kBHXi4-1707712083427-0-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None",
      "strict-transport-security": "max-age=15724800; includeSubDomains",
      "x-ratelimit-limit-requests": "50",
      "x-ratelimit-remaining-requests": "49",
      "x-ratelimit-reset-requests": "1.2s",
      "x-request-id": "req_fcd02cbf6f1134c6ab00db5ed0b679d8"
    },
    "error": {
      "message": "Unrecognized file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']",
      "type": "invalid_request_error",
      "param": null,
      "code": null
    },
    "code": null,
    "param": null,
    "type": "invalid_request_error"
  }
}
  2. Cause: missing name and format on the toFile helper:
{
  "error": {
    "status": 400,
    "headers": {
      "alt-svc": "h3=\":443\"; ma=86400",
      "cf-cache-status": "DYNAMIC",
      "cf-ray": "8542174bca4842a8-BNU",
      "connection": "keep-alive",
      "content-length": "231",
      "content-type": "application/json",
      "date": "Mon, 12 Feb 2024 04:29:32 GMT",
      "openai-organization": "user-b8nm4lwkpg28ajbxcqh28yus",
      "openai-processing-ms": "60",
      "openai-version": "2020-10-01",
      "server": "cloudflare",
      "set-cookie": "__cf_bm=cbduUeL1pgq2WGoOCjh2NSV.WcQh4m_xg4a4VmjVW.I-1707712172-1-ATZ0dQkdAjGLdQtLEmw0n95RXGOgL2auzVOTVwNxxMBzY9Fsuum2DiQk8QJ/YmhP7AZsmnF7GimWnNip+cUwzaE=; path=/; expires=Mon, 12-Feb-24 04:59:32 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None, _cfuvid=eAHa9itrWU68c.1ELVZzvXLL8lVWH9KtoL9w72ekfUg-1707712172094-0-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None",
      "strict-transport-security": "max-age=15724800; includeSubDomains",
      "x-ratelimit-limit-requests": "50",
      "x-ratelimit-remaining-requests": "49",
      "x-ratelimit-reset-requests": "1.2s",
      "x-request-id": "req_d098717e6ecb38fe0b11c466db5ee65b"
    },
    "error": {
      "message": "Unrecognized file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']",
      "type": "invalid_request_error",
      "param": null,
      "code": null
    },
    "code": null,
    "param": null,
    "type": "invalid_request_error"
  }
}
  3. Passing a raw Buffer to the file attribute (I could not get a proper error message, so here is the wrong way to do it):
...
 const transcription = await openAiApi.audio.transcriptions.create({
      file: Buffer.from(data), //Here is the problem
      model: 'whisper-1',
      language: 'en',
      response_format: 'json',
      temperature: 0.1,
      prompt,
    });
...


So… how did I solve all of these errors?

Initially, I was doing this (Next.js API routes / Node.js):

 const file = await toFile(Buffer.from(data));

Somehow, I managed to make it work without the library, and then I asked myself: WHY?!

Here is the reason:

 const file = await toFile(Buffer.from(data), 'audio.mp3');

This worked for both the Node.js library and a pure fetch request.

I’m not really sure why, but I believe it has to do with the “name” inference in the “toFile()” method:

@param name — the name of the file. If omitted, toFile will try to determine a file name from bits if possible.

Note that it says try.

When I manually added the name and type, it worked properly. However, when I removed them, I started getting nonstop errors.

Partly, I think this happened because I am fetching the file from a URL hosted by an external service and creating a Buffer from the response. Perhaps some metadata is lost in the process; it probably wouldn't be an issue if you simply used a file submitted from a form on the frontend.
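If you suspect the bytes themselves are being mangled between the download and the Buffer, one cheap sanity check (my own heuristic, not something from the SDK) is to look for MP3 magic bytes at the start of the buffer:

```typescript
// Heuristic check that a buffer looks like an MP3: files with an ID3v2
// tag begin with "ID3", and raw MPEG audio frames begin with an 0xFF
// sync byte whose next three bits are also set.
function looksLikeMp3(buf: Buffer): boolean {
  if (buf.length < 3) return false;
  if (buf.subarray(0, 3).toString('ascii') === 'ID3') return true;
  return buf[0] === 0xff && (buf[1] & 0xe0) === 0xe0;
}
```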

Here is my complete code, just for reference (it is a Next.js 14 API route.ts, but it should be the same for plain Node.js):

  1. Using the npm openai package
import { prisma } from '@/lib/prisma';
import { NextResponse } from 'next/server';
import axios from 'axios';
import { z } from 'zod';
import { openAiApi } from '@/lib/openai';
import { toFile } from 'openai/uploads';

const paramsSchema = z.object({
  id: z.string().uuid(),
});
const bodySchema = z.object({
  prompt: z.string(),
});
export async function POST(
  request: Request,
  context: { params: { id: string } }
) {
  const body = await request.json();
  try {
    const { id } = paramsSchema.parse(context.params);
    const { prompt } = bodySchema.parse(body);

    const video = await prisma.video.findUniqueOrThrow({ where: { id } });

    const { data } = await axios.get(video.path, {
      responseType: 'arraybuffer',
    });
    const file = await toFile(Buffer.from(data), 'audio.mp3');

    const transcription = await openAiApi.audio.transcriptions.create({
      file: file,
      model: 'whisper-1',
      language: 'en',
      response_format: 'verbose_json',
      temperature: 0.1,
      prompt,
    });

    return NextResponse.json({ id, prompt, transcription }, { status: 200 });
  } catch (err) {
    if (err instanceof z.ZodError) {
      return NextResponse.json({ error: err.issues }, { status: 400 });
    }
    return NextResponse.json({ error: err }, { status: 500 });
  }
}
  2. Creating a FormData using “pure” fetch
import { prisma } from '@/lib/prisma';
import { NextResponse } from 'next/server';
import axios from 'axios';
import { z } from 'zod';
import { toFile } from 'openai/uploads';

const paramsSchema = z.object({
  id: z.string().uuid(),
});
const bodySchema = z.object({
  prompt: z.string(),
});
export async function POST(
  request: Request,
  context: { params: { id: string } }
) {
  const body = await request.json();
  try {
    const { id } = paramsSchema.parse(context.params);
    const { prompt } = bodySchema.parse(body);

    const video = await prisma.video.findUniqueOrThrow({ where: { id } });

    const { data } = await axios.get(video.path, {
      responseType: 'arraybuffer',
    });
    const file = await toFile(Buffer.from(data), "audio.mp3");

    const transcribe = async () => {
      const formData = new FormData();
      formData.append('file', file as unknown as Blob);
      formData.append('model', 'whisper-1');
      formData.append('response_format', 'verbose_json');
      formData.append('language', 'en');
      formData.append('prompt', prompt);
      const headers = new Headers();
      headers.append('Authorization', `Bearer ${process.env.OPENAI_API_KEY}`);

      const response = await fetch(
        'https://api.openai.com/v1/audio/transcriptions',
        {
          method: 'POST',
          headers,
          body: formData,
        }
      );
      return response.json();
    };

    const transcription = await transcribe();

    return NextResponse.json({ id, prompt, transcription }, { status: 200 });
  } catch (err) {
    if (err instanceof z.ZodError) {
      return NextResponse.json({ error: err.issues }, { status: 400 });
    }
    return NextResponse.json({ error: err }, { status: 500 });
  }
}
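One more detail worth knowing for the pure-fetch path: FormData.append accepts a third filename argument, so even a plain Blob can carry the name the API uses to infer the format. A small standalone sketch (the bytes are placeholder data):

```typescript
// A Blob has a MIME type but no filename; the third argument to
// FormData.append supplies one, so the server sees "audio.mp3".
const blob = new Blob([new Uint8Array([0x49, 0x44, 0x33])], {
  type: 'audio/mpeg',
});
const formData = new FormData();
formData.append('file', blob, 'audio.mp3');

const entry = formData.get('file') as File;
console.log(entry.name); // audio.mp3
```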

I hope it helps you solve some of your problems.


Very comprehensive post. I haven’t worked with the Whisper API yet myself, but I hope this helps someone else having problems.

Thanks for sharing with us.


For me, the catch was using this to generate an SRT response format: the return type is misleading.

export interface Transcription {
  text: string;
}

It just returned a string directly…

You’re welcome.