I have seen many posts about bugs and errors when using OpenAI's transcription API (whisper-1). I also ran into them and came up with a solution for my case, which might be helpful for you as well.
This is my app’s workflow:
- Form (video) → Conversion to .mp3 → Upload to cloud storage → Return the ID of the created audio (used uploadThing service).
- Another form → Next.js backend searches for the video ID → retrieves the public video URL → Axios call to fetch the video from the URL → Buffer → openAI API.
Regarding the errors, I experienced a variety of them, and they were extremely inconsistent. Here are some that I was able to reproduce (and solve) again:
- Giant error (file format missing from the name passed to the toFile helper):
{
"error": {
"status": 400,
"headers": {
"alt-svc": "h3=\":443\"; ma=86400",
"cf-cache-status": "DYNAMIC",
"cf-ray": "85421524f81042a8-BNU",
"connection": "keep-alive",
"content-length": "231",
"content-type": "application/json",
"date": "Mon, 12 Feb 2024 04:28:03 GMT",
"openai-organization": "user-b8nm4lwkpg28ajbxcqh28yus",
"openai-processing-ms": "29",
"openai-version": "2020-10-01",
"server": "cloudflare",
"set-cookie": "__cf_bm=0U9yxsAzNxER.utK5re7xYEvmv3TNfpb43LPHWIew0c-1707712083-1-AUpIGRf5Qn+ZLdtWukDSCnjPJKpB1vOiFl8Xx3WxMuhA0RMzzZTrzZ8hKftBB4ssVdFg2hwH4u9pAP4N3kC0aFw=; path=/; expires=Mon, 12-Feb-24 04:58:03 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None, _cfuvid=XFTY726zg.b3Sr4bWr8bYSo62GDtx278qVqg5kBHXi4-1707712083427-0-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None",
"strict-transport-security": "max-age=15724800; includeSubDomains",
"x-ratelimit-limit-requests": "50",
"x-ratelimit-remaining-requests": "49",
"x-ratelimit-reset-requests": "1.2s",
"x-request-id": "req_fcd02cbf6f1134c6ab00db5ed0b679d8"
},
"error": {
"message": "Unrecognized file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']",
"type": "invalid_request_error",
"param": null,
"code": null
},
"code": null,
"param": null,
"type": "invalid_request_error"
}
}
- Cause: missing name and format on the toFile helper:
{
"error": {
"status": 400,
"headers": {
"alt-svc": "h3=\":443\"; ma=86400",
"cf-cache-status": "DYNAMIC",
"cf-ray": "8542174bca4842a8-BNU",
"connection": "keep-alive",
"content-length": "231",
"content-type": "application/json",
"date": "Mon, 12 Feb 2024 04:29:32 GMT",
"openai-organization": "user-b8nm4lwkpg28ajbxcqh28yus",
"openai-processing-ms": "60",
"openai-version": "2020-10-01",
"server": "cloudflare",
"set-cookie": "__cf_bm=cbduUeL1pgq2WGoOCjh2NSV.WcQh4m_xg4a4VmjVW.I-1707712172-1-ATZ0dQkdAjGLdQtLEmw0n95RXGOgL2auzVOTVwNxxMBzY9Fsuum2DiQk8QJ/YmhP7AZsmnF7GimWnNip+cUwzaE=; path=/; expires=Mon, 12-Feb-24 04:59:32 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None, _cfuvid=eAHa9itrWU68c.1ELVZzvXLL8lVWH9KtoL9w72ekfUg-1707712172094-0-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None",
"strict-transport-security": "max-age=15724800; includeSubDomains",
"x-ratelimit-limit-requests": "50",
"x-ratelimit-remaining-requests": "49",
"x-ratelimit-reset-requests": "1.2s",
"x-request-id": "req_d098717e6ecb38fe0b11c466db5ee65b"
},
"error": {
"message": "Unrecognized file format. Supported formats: ['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']",
"type": "invalid_request_error",
"param": null,
"code": null
},
"code": null,
"param": null,
"type": "invalid_request_error"
}
}
- Passing a Buffer directly to the file attribute (I could not capture a proper error message, so here is the wrong way to do it):
...
const transcription = await openAiApi.audio.transcriptions.create({
file: Buffer.from(data), //Here is the problem
model: 'whisper-1',
language: 'en',
response_format: 'json',
temperature: 0.1,
prompt,
});
...
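Most of the two error dumps above is response headers; the useful parts are the status, the inner error message, and the request ID. A minimal sketch of a helper that reduces such an object to one line for logging follows. The name summarizeApiError and the ApiErrorLike interface are hypothetical, and the field names are taken only from the dumps above (the object passed in is the inner one, not the outer "error" wrapper added by the route's JSON response).

```typescript
// Relevant parts of the error objects shown above.
// Field names come from those dumps; everything else is ignored.
interface ApiErrorLike {
  status?: number;
  headers?: Record<string, string>;
  error?: { message?: string; type?: string };
}

// Hypothetical helper: reduce a verbose API error to one log-friendly line.
function summarizeApiError(err: unknown): string {
  const e = (err ?? {}) as ApiErrorLike;
  const status = e.status ?? 'unknown status';
  const type = e.error?.type ?? 'unknown type';
  const message = e.error?.message ?? 'no message';
  const requestId = e.headers?.['x-request-id'] ?? 'n/a';
  return `[${status}] ${type}: ${message} (request ${requestId})`;
}
```

Something like console.log(summarizeApiError(err)) inside a catch block would then show the "Unrecognized file format" message without the header noise.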
So... how did I solve all of these errors?
Initially, I was doing this (Next.js API routes/Node.js):
const file = await toFile(Buffer.from(data));
Somehow, I managed to make it work without the library, and then I asked myself: WHY?!
Here is the reason:
const file = await toFile(Buffer.from(data), 'audio.mp3');
This worked for both the node.js library and a pure fetch request.
I'm not really sure why, but I believe it has to do with the name inference in the toFile() method. Its doc comment says:
@param name - the name of the file. If omitted, toFile will try to determine a file name from bits if possible.
Note that it says "try".
When I manually added the name and type, it worked properly. However, when I removed it, I started getting nonstop errors.
Partly, I think this happened because I am fetching the file from a URL on the internet hosted by an external service, and from there, I am creating a Buffer. Perhaps some information got lost in the process, and it probably shouldn’t be an issue if you simply retrieve a file that is submitted from a form on the frontend.
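If the name has to be supplied by hand anyway, one option is to derive it from the download URL and fall back to a fixed name when the URL carries no usable extension. The following is only a sketch of that idea: uploadNameFromUrl is a hypothetical helper, and the list of formats is copied from the error message quoted earlier. The fallback assumes you already know the content really is in that format (as in my case, where the file was converted to .mp3 before upload).

```typescript
// Formats listed in the API's "Unrecognized file format" error message.
const SUPPORTED_FORMATS = [
  'flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm',
];

// Hypothetical helper: derive an upload name with a supported extension from a URL.
// Falls back to `audio.<fallbackExt>` when the URL path has no supported extension.
function uploadNameFromUrl(url: string, fallbackExt = 'mp3'): string {
  const pathname = new URL(url).pathname; // drops query string and fragment
  const lastSegment = pathname.split('/').pop() ?? '';
  const dot = lastSegment.lastIndexOf('.');
  const ext = dot >= 0 ? lastSegment.slice(dot + 1).toLowerCase() : '';
  if (dot > 0 && SUPPORTED_FORMATS.includes(ext)) {
    // Keep the original base name but normalize the extension to lower case.
    return `${lastSegment.slice(0, dot)}.${ext}`;
  }
  return `audio.${fallbackExt}`;
}
```

The result can then be passed as the second argument, for example toFile(Buffer.from(data), uploadNameFromUrl(video.path)).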
Here is my complete code, just for reference:
(It is a Next.js 14 API route.ts, but it should be the same for Node.js.)
- Using the openai npm package
import { prisma } from '@/lib/prisma';
import { NextResponse } from 'next/server';
import axios from 'axios';
import { z } from 'zod';
import { openAiApi } from '@/lib/openai';
import { toFile } from 'openai/uploads';
const paramsSchema = z.object({
id: z.string().uuid(),
});
const bodySchema = z.object({
prompt: z.string(),
});
export async function POST(
request: Request,
context: { params: { id: string } }
) {
const body = await request.json();
try {
const { id } = paramsSchema.parse(context.params);
const { prompt } = bodySchema.parse(body);
const video = await prisma.video.findUniqueOrThrow({ where: { id } });
const { data } = await axios.get(video.path, {
responseType: 'arraybuffer',
});
const file = await toFile(Buffer.from(data), 'audio.mp3');
const transcription = await openAiApi.audio.transcriptions.create({
file: file,
model: 'whisper-1',
language: 'en',
response_format: 'verbose_json',
temperature: 0.1,
prompt,
});
return NextResponse.json({ id, prompt, transcription }, { status: 200 });
} catch (err) {
if (err instanceof z.ZodError) {
return NextResponse.json({ error: err.issues }, { status: 400 });
}
return NextResponse.json({ error: err }, { status: 500 });
}
}
- Creating a formData using “pure” fetch
import { prisma } from '@/lib/prisma';
import { NextResponse } from 'next/server';
import axios from 'axios';
import { z } from 'zod';
import { toFile } from 'openai/uploads';
const paramsSchema = z.object({
id: z.string().uuid(),
});
const bodySchema = z.object({
prompt: z.string(),
});
export async function POST(
request: Request,
context: { params: { id: string } }
) {
const body = await request.json();
try {
const { id } = paramsSchema.parse(context.params);
const { prompt } = bodySchema.parse(body);
const video = await prisma.video.findUniqueOrThrow({ where: { id } });
const { data } = await axios.get(video.path, {
responseType: 'arraybuffer',
});
const file = await toFile(Buffer.from(data), "audio.mp3");
const transcribe = async () => {
const formData = new FormData();
formData.append('file', file as unknown as Blob);
formData.append('model', 'whisper-1');
formData.append('response_format', 'verbose_json');
formData.append('language', 'en');
const headers = new Headers();
headers.append('Authorization', `Bearer ${process.env.OPENAI_API_KEY}`);
console.log('>>> ~ transcribe ~ formData:', formData);
const response = await fetch(
'https://api.openai.com/v1/audio/transcriptions',
{
method: 'POST',
headers,
body: formData,
}
)
.then((res) => {
return res.json();
})
.catch((err) => {
console.log('>>> ~ err:', err);
});
return response;
};
// Actually run the request and include its result in the response.
const transcription = await transcribe();
return NextResponse.json({ id, prompt, transcription }, { status: 200 });
} catch (err) {
if (err instanceof z.ZodError) {
return NextResponse.json({ error: err.issues }, { status: 400 });
}
return NextResponse.json({ error: err }, { status: 500 });
}
}
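For the pure fetch variant, the standard FormData.append method also accepts an optional third argument with the filename, so the name can be attached to a plain Blob without going through toFile. This is not what I did above, just a sketch of an alternative, assuming a runtime where Blob and FormData are globals (Node.js 18 or later). The helper name buildTranscriptionForm and the audio/mpeg type are assumptions for illustration.

```typescript
// Hypothetical helper: build the multipart body for the transcription endpoint.
function buildTranscriptionForm(data: ArrayBuffer, fileName: string): FormData {
  const formData = new FormData();
  // The third argument sets the filename of the multipart part,
  // which is where the file extension comes from.
  formData.append('file', new Blob([data], { type: 'audio/mpeg' }), fileName);
  formData.append('model', 'whisper-1');
  formData.append('response_format', 'verbose_json');
  formData.append('language', 'en');
  return formData;
}
```

Here data is an ArrayBuffer, for example the result of await res.arrayBuffer() on a fetch response, and the returned FormData can be used directly as the body of the POST request.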
I hope this helps you solve some of your problems.