Whisper API Limits - Transcriptions

I’m testing the Whisper API to transcribe audio.
The documentation says the file size limit is 25 MB, but if I try to send files larger than 10 MB I get an error. This is my code snippet; I’m using Node.js:

const resp = await openai.createTranscription(
  fs.createReadStream(file), // audio file stream
  "whisper-1",               // model
  "",                        // prompt
  "verbose_json",            // response_format
  0,                         // temperature
  "is"                       // language (ISO-639-1 code, here Icelandic)
);

I’m a bit confused by the API limits. I’d like to know how many requests can be made in parallel, whether there is a limit on requests per minute or on transcription time, etc.

One more thing: asking in the Playground, I read about version 2 of the Whisper API, and that to use it I should just change whisper-1 to whisper-v2 in my code, but I get an error.

  1. It’s difficult to troubleshoot without the error output; please provide it.
  2. You can read about rate limits here.
  3. You might have seen large-v2 model mentioned on Whisper’s GitHub page. In my understanding, it is equivalent to whisper-1, which is currently the only model being used by the API and doesn’t have any alternatives yet.
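On rate limits generally: when you do hit a requests-per-minute cap, the API responds with HTTP 429, and the usual remedy is to retry with exponential backoff. A minimal sketch — the `fn` callback and the `isRateLimited` check are hypothetical stand-ins for your real client call and its error shape, not part of the OpenAI library:

```javascript
// Retry an async request with exponential backoff when it is rate-limited.
// `fn` is any async function; `isRateLimited` decides whether the thrown
// error is a 429-style failure worth retrying.
async function withBackoff(fn, isRateLimited, maxRetries = 5, baseDelayMs = 500) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRateLimited(err)) throw err;
      const delayMs = baseDelayMs * 2 ** attempt; // 500ms, 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

You would wrap each transcription call in `withBackoff(() => openai.createTranscription(...), (e) => e.response?.status === 429)` or similar, depending on how your client surfaces the status code.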

I’ve been having APIConnectionError all day. Connection keeps aborting. Was working fine last night. Might be a Whisper thing right now.

I’m only versed in Python myself, but I’d like to shine some light on the API limits.

The only limit the API has is that it accepts files up to 25 MB. There is no limit on duration: if you lower the bitrate of an audio file, you can send in longer and longer recordings. Note, though, that the lower the quality of the file, the less accurate and reliable the transcription may be.
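To put a number on that trade-off: once you fix a bitrate, the 25 MB cap translates directly into a maximum duration. A quick back-of-the-envelope helper (pure arithmetic, no API involved):

```javascript
// Maximum audio duration (in minutes) that fits in `limitMB` megabytes
// at a constant bitrate of `kbps` kilobits per second.
function maxMinutes(limitMB, kbps) {
  const bits = limitMB * 1024 * 1024 * 8; // size limit in bits
  const seconds = bits / (kbps * 1000);   // kbps -> bits per second
  return seconds / 60;
}

// At 64 kbps, 25 MB holds roughly 54 minutes of audio;
// dropping to 32 kbps roughly doubles that.
```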

Something is wrong with your snippet. A 10 MB file should go through just fine. I have just sent in a 17.9 MB file using Python.

@josh8 The Whisper API was experiencing degraded performance on 04-28-2023.

Nah, my code makes sure the files are under 15 MB each. I split them on silent periods using Pydub. I switched to Deepgram yesterday; there’s no limit on file size and it’s done in about 1/20th of the time. I think they’re using TPUs + Whisper JAX, running Whisper’s large model.


I don’t know what your use case is for the transcribed text, but if you’re dealing with subtitle files by any chance, how do you handle the timestamps being inaccurate, since each chunk starts at 0:00:00?
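For what it’s worth, the usual fix is to add each chunk’s start offset (which you know from where you cut the audio) to every segment timestamp before merging. A sketch, assuming segments shaped like Whisper’s verbose_json output with `start`/`end` in seconds, and that you recorded each chunk’s duration when splitting:

```javascript
// Shift a chunk's segment timestamps by the offset (in seconds) at which
// the chunk was cut from the original audio, so merged subtitles line up.
// The { start, end, text } shape mirrors verbose_json segments.
function offsetSegments(segments, offsetSeconds) {
  return segments.map((s) => ({
    ...s,
    start: s.start + offsetSeconds,
    end: s.end + offsetSeconds,
  }));
}

// Merge chunks in order, accumulating the offset as you go.
// `durationSeconds` is assumed to be known from the splitting step.
function mergeChunks(chunks) {
  let offset = 0;
  const merged = [];
  for (const chunk of chunks) {
    merged.push(...offsetSegments(chunk.segments, offset));
    offset += chunk.durationSeconds;
  }
  return merged;
}
```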

I’ve had issues using streaming with the Whisper API in the past. Try sending the file as a non-streamed binary instead.

Due to a programming error, I reached my $120 limit in under 90 minutes, and now my account can’t call any of the other APIs, not even GPT-3.5. And tech support is simply ignoring me; they don’t even read your messages, you just get back the same general support answer.
So be careful with the whisper-1 API: the rate limit won’t stop you from overusing your budget for the month, so if you make a mistake, good luck getting a new account.
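For anyone wanting to avoid the same surprise: since the API itself won’t stop you, a client-side spend guard is about the only protection. A minimal sketch, assuming the $0.006-per-minute Whisper price listed at the time of writing (adjust the rate if pricing changes):

```javascript
// Client-side spend guard: refuse to submit more audio once the estimated
// cost would cross a hard cap. The $0.006/min default is the Whisper API
// price listed at the time of writing; treat it as an assumption.
function makeBudgetGuard(capUSD, pricePerMinuteUSD = 0.006) {
  let spentUSD = 0;
  return {
    canAfford(minutes) {
      return spentUSD + minutes * pricePerMinuteUSD <= capUSD;
    },
    record(minutes) {
      spentUSD += minutes * pricePerMinuteUSD;
    },
    spent() {
      return spentUSD;
    },
  };
}
```

Check `guard.canAfford(minutes)` before each submission and call `guard.record(minutes)` after; a loop gone wrong then stops at your cap instead of at OpenAI’s.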

That’s quite alarming. Have you experienced this again?

Although it’s quite alarming and unfortunate, I think of it the same way as filling up your car’s fuel tank with diesel instead of petrol: bad as that is, you still have to pay for the fuel you pumped, even if it was by accident.

Oh, I think my initial understanding was incorrect. I was under the impression that the programming error was on OpenAI’s side, i.e. the system made a bad billing calculation, so I was worried. But I see now the error was actually on his side, so maybe his script/app went into a loop or something. Too bad.

Ok, I feel more comfortable now. :slight_smile: