I’m testing the Whisper API to transcribe audio.
The documentation says the file size limit is 25 MB, but when I try to send files larger than 10 MB I get an error. This is my code snippet; I’m using Node.js.
I’m a bit confused about the API limits. I would like to know how many requests can be made in parallel, and whether there is a limit on requests per minute, transcription time, and so on.
One more thing: when I asked in the Playground, I read about a version 2 of the Whisper API, and that to use it I only need to change whisper-1 to whisper-v2 in my code, but when I do that I get an error.
You might have seen the large-v2 model mentioned on Whisper’s GitHub page. As I understand it, it is equivalent to whisper-1, which is currently the only model the API uses; there are no alternatives yet.
I’m only versed in Python myself, but I’d like to shed some light on the API limits.
The only limit the API has is that it accepts files only up to 25 MB. There are no limits on duration. If you lower the bitrate of an audio file, you’ll be able to send in longer and longer conversations. Note, though, that the lower the quality of the file, the less accurate and reliable the transcription might be.
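To make the bitrate trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a constant-bitrate file and reads the 25 MB limit as 25 × 1024 × 1024 bytes; both are my assumptions, and the helper name `max_duration_seconds` is made up for illustration. Container overhead is ignored, so treat the numbers as estimates.

```python
# Assumption: the 25 MB upload limit is interpreted as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def max_duration_seconds(bitrate_kbps: float, limit_bytes: int = MAX_UPLOAD_BYTES) -> float:
    """Longest audio, in seconds, that fits under limit_bytes at a constant bitrate."""
    if bitrate_kbps <= 0:
        raise ValueError("bitrate_kbps must be positive")
    # Audio bitrates are quoted in kilobits per second; convert to bytes per second.
    bytes_per_second = bitrate_kbps * 1000 / 8
    return limit_bytes / bytes_per_second


if __name__ == "__main__":
    # Halving the bitrate doubles the duration that fits in one upload.
    for kbps in (128, 64, 32):
        minutes = max_duration_seconds(kbps) / 60
        print(f"{kbps} kbps: roughly {minutes:.1f} minutes per file")
```

The duration scales inversely with the bitrate, which is why re-encoding at a lower bitrate lets a longer recording through, at the cost of audio quality.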
Something is wrong with your snippet. A 10 MB file should go through just fine. I have just sent in a 17.9 MB file using Python.
Nah, my code makes sure that the files are under 15 MB each. I split them on silent periods using Pydub. I switched to Deepgram yesterday and there’s no limit on file size and it’s done in about 1/20th of the time. I think they’re using TPUs + Whisper JAX. They’re using Whisper’s large model.
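For anyone who wants to try the same splitting approach, here is a minimal sketch of the packing step only, not the code described above. It assumes you already have the byte size of each silence-delimited segment (with Pydub those segments could come from `pydub.silence.split_on_silence`, then be exported to measure their size) and greedily groups consecutive segments so that each group stays under a size budget. The function name `group_segments` and the sizes in the example are hypothetical.

```python
from typing import List


def group_segments(segment_sizes: List[int], max_bytes: int) -> List[List[int]]:
    """Greedily pack consecutive segments into chunks of at most max_bytes.

    Returns a list of chunks, each a list of segment indices in original order.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(segment_sizes):
        if size > max_bytes:
            # A single segment over the budget cannot be fixed by grouping;
            # it would need a finer split or a lower bitrate.
            raise ValueError(f"segment {index} is larger than max_bytes")
        if current and current_size + size > max_bytes:
            # Adding this segment would exceed the budget: close the chunk.
            chunks.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    mb = 1024 * 1024
    sizes = [6 * mb, 6 * mb, 6 * mb, 10 * mb, 4 * mb]  # made-up segment sizes
    print(group_segments(sizes, max_bytes=15 * mb))
```

Keeping the segments in order and cutting only at silences means each uploaded chunk starts and ends between utterances, so the transcripts can simply be concatenated afterwards.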