I’m testing the Whisper API to transcribe audio.
The documentation says the file size limit is 25 MB, but if I try to send files larger than 10 MB I get an error. This is my code snippet; I’m using Node.js.
You might have seen the large-v2 model mentioned on Whisper’s GitHub page. As I understand it, it is equivalent to whisper-1, which is currently the only model the API uses; there are no alternatives yet.
I’m only versed in Python myself, but I’d like to shine some light on the API limits.
The only limit the API has is that it accepts files up to 25 MB. There are no limits on duration. If you lower the bitrate of an audio file, you’ll be able to send in longer and longer conversations. Note, though, that the lower the quality of the file, the less accurate and reliable the transcription is likely to be.
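To put rough numbers on the bitrate trade-off, here is a minimal sketch of the arithmetic. It assumes a constant-bitrate file, ignores container overhead, and treats "25 MB" as 25 * 1024 * 1024 bytes; I don't know exactly how the API counts the limit, so leave some headroom. The function name is just for illustration.

```python
# Assumption: the 25 MB limit is counted as 25 * 1024 * 1024 bytes.
API_LIMIT_BYTES = 25 * 1024 * 1024


def max_duration_seconds(bitrate_kbps: float, limit_bytes: int = API_LIMIT_BYTES) -> float:
    """Approximate longest audio (in seconds) that fits in limit_bytes.

    Assumes a constant bitrate given in kilobits per second and ignores
    container/metadata overhead, so real files will fit slightly less.
    """
    bytes_per_second = bitrate_kbps * 1000 / 8  # kilobits/s -> bytes/s
    return limit_bytes / bytes_per_second


if __name__ == "__main__":
    for kbps in (320, 128, 64, 32):
        minutes = max_duration_seconds(kbps) / 60
        print(f"{kbps:>3} kbps -> roughly {minutes:.0f} minutes")
```

Halving the bitrate doubles the duration that fits, which is why re-encoding at a lower bitrate lets you send longer recordings in a single request.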
Something is wrong with your snippet. A 10 MB file should go through just fine. I have just sent in a 17.9 MB file using Python.
Nah, my code makes sure that the files are under 15MB each. I split them on silent periods using Pydub. I switched to Deepgram yesterday and there’s no limit on file size and it’s done in about 1/20th of the time. I think they’re using TPUs + Whisper Jax. They’re using Whisper’s large model.
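For anyone who wants to do something similar, here is a rough sketch of the grouping step only, not my exact code. It assumes you already have the byte size of each silence-delimited segment (for example after exporting the pieces that Pydub's `split_on_silence` returns) and greedily packs consecutive segments so each upload stays at or under a size cap. `pack_segments` and the sample sizes are made up for illustration.

```python
from typing import List


def pack_segments(segment_sizes: List[int], max_bytes: int) -> List[List[int]]:
    """Greedily group consecutive segment sizes so each group totals <= max_bytes.

    Order is preserved, so each group corresponds to a contiguous stretch of audio.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in segment_sizes:
        if size > max_bytes:
            # A single segment over the cap would need to be split further upstream.
            raise ValueError(f"segment of {size} bytes exceeds the cap of {max_bytes}")
        if current and current_size + size > max_bytes:
            # Adding this segment would overflow the cap: close the current group.
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    MB = 1024 * 1024
    sizes = [6 * MB, 6 * MB, 6 * MB, 10 * MB, 2 * MB]  # hypothetical segment sizes
    for i, group in enumerate(pack_segments(sizes, 15 * MB)):
        print(f"upload {i}: {len(group)} segment(s), {sum(group) / MB:.0f} MB")
```

Packing consecutive segments keeps the number of requests down while still cutting only at silences, so words are not chopped in half at chunk boundaries.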