Whisper API, increase file limit >25 MB

JFixby · December 20, 2023, 6:41pm

I’m currently using the Whisper API for audio transcription, and the default 25 MB file size limit poses challenges, particularly in maintaining sentence continuity when splitting files.

By default, the Whisper API only supports files that are less than 25 MB. If you have an audio file that is larger than that, you will need to break it up into chunks of 25 MB or less or use a compressed audio format. To get the best performance, we suggest that you avoid breaking the audio up mid-sentence as this may cause some context to be lost.

Given that the accurate transcription of lengthy audio files requires splitting them, using an external library isn’t feasible without prior transcription.

As the primary purpose of the service is transcription, I’m seeking information on increasing the file size limit to avoid disrupting the natural flow of sentences.

Could you provide guidance on this or share any plans for future updates addressing this limitation?

Is it possible simply to increase the limit?

Your assistance is greatly appreciated.

_j · December 20, 2023, 8:02pm

It is possible to increase the limit to hours by re-encoding the audio.

As the primary purpose of the service is transcription, you can use a voice codec and a low bitrate.

For example, here is a command that does exactly what you want:

ffmpeg -i audio.mp3 -vn -map_metadata -1 -ac 1 -c:a libopus -b:a 12k -application voip audio.ogg

Opus is one of the highest quality audio encoders at low bitrates, and is supported by Whisper in ogg container.
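
To put a rough number on "hours": a back-of-the-envelope sketch, assuming the 25 MB limit is counted in binary megabytes and that the encoder holds close to its target bitrate. Opus is variable-bitrate by default and the Ogg container adds some overhead, so treat the result as an estimate only. The function name is made up for illustration.

```python
def max_duration_hours(limit_bytes: int, bitrate_bps: int) -> float:
    """Approximate hours of audio that fit under limit_bytes at a constant bitrate."""
    seconds = limit_bytes * 8 / bitrate_bps
    return seconds / 3600


# Assumption: the 25 MB limit is 25 * 1024 * 1024 bytes.
LIMIT_BYTES = 25 * 1024 * 1024

# 12 kbit/s matches the -b:a 12k option in the command above.
print(round(max_duration_hours(LIMIT_BYTES, 12_000), 1))

# 128 kbit/s is a common MP3 bitrate, shown for comparison.
print(round(max_duration_hours(LIMIT_BYTES, 128_000), 2))
```

Under those assumptions, 12 kbit/s fits a little under five hours in one upload, against under half an hour at 128 kbit/s.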

Silence detection: also a useful tool.
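
One way to use silence detection, sketched under assumptions: ffmpeg has a silencedetect audio filter that logs silence_start and silence_end lines to stderr when run as, for example, ffmpeg -i audio.mp3 -af silencedetect=noise=-30dB:d=0.5 -f null - (the -30dB and 0.5 s thresholds are guesses to tune per recording). The helper below is hypothetical. It only parses that log text and proposes the middle of each pause as a split point, so cuts land between sentences rather than inside them. The sample log is illustrative, not captured output.

```python
import re

# Patterns for the lines the silencedetect filter writes to stderr.
_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def silence_midpoints(stderr_text):
    """Return the midpoint (in seconds) of every completed silence in the log."""
    starts = [float(m.group(1)) for m in _START.finditer(stderr_text)]
    ends = [float(m.group(1)) for m in _END.finditer(stderr_text)]
    # zip() drops a trailing silence_start that never got a matching end.
    return [(s + e) / 2 for s, e in zip(starts, ends)]


# Illustrative log text in the filter's format (addresses and times are made up).
SAMPLE_LOG = """\
[silencedetect @ 0x5581] silence_start: 12.5
[silencedetect @ 0x5581] silence_end: 14.0 | silence_duration: 1.5
[silencedetect @ 0x5581] silence_start: 61.25
[silencedetect @ 0x5581] silence_end: 62.75 | silence_duration: 1.5
"""

print(silence_midpoints(SAMPLE_LOG))
```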

JFixby · December 21, 2023, 10:36pm

This is amazing, man. Thank you. Solved my specific problem.

However, it would be nice to have a more general-purpose solution for audio files of unlimited size in the future.

david.lord.butler · December 21, 2023, 10:42pm

I have an idea. You could overlap the audio files and then post-process the transcriptions through gpt-4 to remove redundancy and create a complete whole.
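
A minimal sketch of the overlap part of that idea, assuming fixed-length chunks measured in seconds. The function name and the numbers are made up for illustration. It only computes the time windows; cutting the audio and merging the transcripts afterwards are separate steps it does not attempt.

```python
def overlapping_windows(total_s, chunk_s, overlap_s):
    """Return (start, end) windows covering total_s, each sharing overlap_s with the next."""
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break
        # Step back by the overlap so the next chunk repeats the tail of this one.
        start = end - overlap_s
    return windows


# Example: 100 s of audio, 40 s chunks, 10 s of shared audio between neighbours.
print(overlapping_windows(100, 40, 10))
```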

JFixby · December 21, 2023, 11:23pm

That sounds like a good idea at first.

But building and maintaining a library to split audio files with overlap takes a lot of time and resources. Doing this for every programming language is an even bigger hassle and adds unnecessary complexity. Putting sentences back together after splitting becomes tricky: the flow is disrupted, and it’s not easy to maintain continuity. There is no guarantee that overlapping segments will match in the resulting outputs.

ds.studio · March 2, 2024, 6:37pm

Umm, this was CLUTCH. I was able to turn a roughly 1.2 GB mp4 into a 9 MB audio file.

kaiwalyapatil · March 20, 2024, 5:12pm

Crazy stuff!
Thanks a tonne!

mlheyd · March 27, 2024, 7:12pm

I have never used ffmpeg before. I will use the statement above in a Python file which calls the OpenAI API. Two questions:

Can I use an m4a file?
I don’t understand the statement. Is it a system setting, or do I need to do something like myModifiedAudioFile = ffmpeg -i audio.mp3 -vn -map_metadata -1 -ac 1 -c:a libopus -b:a 12k -application voip audio.ogg? If so which is the input filename? I would really appreciate some direction on this.

kaiwalyapatil · March 27, 2024, 8:00pm

If you are using the command mentioned above, stick to ogg, because .m4a uses a different compression scheme and would need different parameters. If your application doesn’t permit this, use ogg for transcription and keep the .m4a for other analysis.
It’s a command, just like ‘pip install…’ in Python. There are different ways to execute a command from a .py script. The input file name is “audio.mp3” and the output file name is “audio.ogg”; you can change the names, add a directory, whatever you require.
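
One of those ways, as a minimal sketch: Python's standard subprocess module can run the same ffmpeg command. It assumes ffmpeg is on the system PATH (otherwise pass the full path to the executable). The function names are made up for illustration. In the command, the file after -i is the input and the last argument is the output.

```python
import subprocess


def build_opus_command(src, dst, ffmpeg="ffmpeg"):
    """Build the argument list for the Opus re-encode shown earlier in the thread."""
    return [
        ffmpeg,
        "-i", src,              # input file
        "-vn",                  # drop any video stream
        "-map_metadata", "-1",  # drop metadata
        "-ac", "1",             # downmix to mono
        "-c:a", "libopus",      # encode audio with Opus
        "-b:a", "12k",          # target 12 kbit/s
        "-application", "voip", # tune the encoder for speech
        dst,                    # output file; the .ogg extension selects the container
    ]


def compress_for_whisper(src, dst, ffmpeg="ffmpeg"):
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_opus_command(src, dst, ffmpeg), check=True)
    return dst


# Show the command without running it; call compress_for_whisper() to actually encode.
print(" ".join(build_opus_command("audio.mp3", "audio.ogg")))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.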

mlheyd · March 28, 2024, 2:15am

Thanks very much for your reply. I assumed audio.mp3 was the input filename but wasn’t positive. I am recording system audio on Windows 10 and the only Windows option is an m4a file. Do you think it would degrade the audio to convert from m4a to mp3 and then to ogg?

kaiwalyapatil · March 28, 2024, 1:15pm

Not really. The audio itself will change, but from a transcription/translation point of view it won’t matter.
But again, if you have an audio-sensitive application, please keep the original file stored.

_j · March 28, 2024, 1:26pm

Yes, you can send and reprocess m4a, which is just mp4 that Apple renamed.

However, as many have discovered, m4a files generated directly by an Apple device backend can fail outright, due to problems with their encoding and codecs and with how the Whisper API recognizes them.

mp4 is a container that can contain multiple streams, and can be demuxed instead of re-encoded to get just the first audio stream into a new mp4. It will typically have AAC.
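
A sketch of what demuxing could look like, assuming ffmpeg is on the PATH and the first audio stream is the one wanted. The function name and file names are made up for illustration. The -map 0:a:0 option selects the first audio stream of the first input, and -c:a copy copies it without re-encoding, so quality is untouched but the file only shrinks by whatever the other streams took up.

```python
def build_demux_command(src, dst, ffmpeg="ffmpeg"):
    """Build an ffmpeg argument list that copies the first audio stream as-is."""
    return [
        ffmpeg,
        "-i", src,
        "-map", "0:a:0",  # first audio stream of the first input only
        "-c:a", "copy",   # stream copy: no re-encoding, no quality loss
        dst,
    ]


# Print the command; run it with subprocess.run(..., check=True) to extract the audio.
print(" ".join(build_demux_command("recording.mp4", "recording.m4a")))
```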

Making audio streams more efficient means re-encoding though.

FFMPEG adapts to the input file type; it will only fail if you specify very specific parameters that don’t match the input. The options of the command I wrote above discard extra streams and metadata, sum to mono, and then encode with the efficient Opus codec using its voice settings.

mlheyd · March 29, 2024, 1:23am

Thanks for your reply. I just changed the audio file extension from m4a to mp4 and ran your command. It worked like a charm. I got about a 14x reduction in file size.

mlheyd · April 10, 2024, 6:52pm

I just upgraded to python 3.9.1. Now I keep getting the error
“Unable to choose an output format for ‘ffmpeg’; use a standard extension for the filename or specify the format manually.
[out#0 @ 000001adb02c8f00] Error initializing the muxer for ffmpeg: Invalid argument
Error opening output file ffmpeg.
Error opening output files: Invalid argument”

It’s making me crazy(ier)

_j · April 10, 2024, 7:06pm

Depending on the standalone version of ffmpeg that is being used and the OS, you may, like the error states, need to manually specify the output format container as -f ogg if you are writing your own output file extension.

FFMPEG also needs to be new enough and be compiled with OGG and Opus support.
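
A minimal sketch for checking the Opus part, assuming the installed build supports the -encoders listing option and prints one encoder per line with the name in the second column. The function names are made up for illustration, and the sample text is illustrative rather than real output.

```python
import subprocess


def list_encoders(ffmpeg="ffmpeg"):
    """Return the text printed by 'ffmpeg -hide_banner -encoders'."""
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def has_encoder(encoders_listing, name):
    """True if some line of the listing has `name` as its second column."""
    for line in encoders_listing.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == name:
            return True
    return False


# Illustrative listing in the expected layout (not captured from a real build).
SAMPLE_LISTING = """\
 A....D libmp3lame           libmp3lame MP3 (MPEG audio layer 3)
 A....D libopus              libopus Opus
"""

print(has_encoder(SAMPLE_LISTING, "libopus"))
```

On a real system, has_encoder(list_encoders(), "libopus") would answer the question for the installed build.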

mlheyd · April 10, 2024, 7:55pm

I downloaded new ffmpeg here.

ffmpeg-master-latest-win64-gpl-shared.zip.
I am in Windows10 using VSCode.

“D:\VS Code Projects\ffmpegdir\bin\ffmpeg.exe” ffmpeg -i input.mp3 -vn -map_metadata -1 -ac 1 -c:a libopus -b:a 12k -application voip -f ogg audio.ogg

What do you think?
BTW, I really appreciate your help on this. It’s frustrating because I am so clueless about this.

_j · April 10, 2024, 8:29pm

“master” means testing version with bleeding-edge changes.
“gpl” means no contentious proprietary or patented software codecs.

Here’s a site where you’ll want to get a “full-build” “release” version of the ffmpeg exe (a 2022-01-10 build from there is what’s running on my system).

https://www.gyan.dev/ffmpeg/builds/#release-builds

I could ZIP up that 2022 version if you want an exe from well before anyone would have thought to include an OpenAI API key stealer into a random exe. But then you’d have to trust me…

mlheyd · April 10, 2024, 8:56pm

I downloaded FFMpeg from the link you suggested. Thanks for that. When I run the following command, it reads all the metadata just fine. But I still get the same error. As I say, it worked just fine in Python 3.8.3. I’ll try going back to that, but changing the interpreter in VSCode doesn’t do a thing. I have to change the system path and reboot. Any other advice before I do that?
“D:\VS Code Projects\ffmpeg-7.0-essentials_build\bin\ffmpeg.exe” ffmpeg -i myAudio.mp3 -vn -map_metadata -1 -ac 1 -c:a libopus -b:a 12k -application voip -f ogg audio.ogg

Stream #0:0: Audio: mp3 (mp3float), 44100 Hz, mono, fltp, 128 kb/s
[AVFormatContext @ 00000248315d1640] Unable to choose an output format for ‘ffmpeg’; use a standard extension for the filename or specify the format manually.
[out#0 @ 00000248315d1140] Error initializing the muxer for ffmpeg: Invalid argument
Error opening output file ffmpeg.
Error opening output files: Invalid argument

mlheyd · April 10, 2024, 8:57pm

What time zone are you in? Will you be around later this evening or tomorrow?

_j · April 10, 2024, 9:05pm

If you have a system Python installed, you can just run your .py directly, or by opening the file in IDLE 3.9, and picking “run” to execute in its print-shell. You can try that to find out if VSCode doesn’t want to trust your interpreter or external binaries or is not piping what you think it is.

Run the ffmpeg command line in a cmd.exe shell simply to ensure that it alone will encode a file.
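
One more thing worth checking, judging only from the commands and error text quoted earlier in the thread: the word ffmpeg appears a second time right after the full path to ffmpeg.exe. ffmpeg treats a bare argument that is not an option as an output file name, which would match the message "Unable to choose an output format for ‘ffmpeg’". This is an inference from the quoted text, not something confirmed in the thread. A sketch of the argument list with the executable named only once (the helper is hypothetical):

```python
def drop_repeated_program_name(argv):
    """Return argv without a stray 'ffmpeg' token placed right after the executable."""
    if len(argv) > 1 and argv[1].lower() in ("ffmpeg", "ffmpeg.exe"):
        return [argv[0]] + argv[2:]
    return list(argv)


# The command as quoted in the thread, split into arguments.
quoted = [
    r"D:\VS Code Projects\ffmpeg-7.0-essentials_build\bin\ffmpeg.exe",
    "ffmpeg",  # stray token: read by ffmpeg as an output file with no known format
    "-i", "myAudio.mp3", "-vn", "-map_metadata", "-1", "-ac", "1",
    "-c:a", "libopus", "-b:a", "12k", "-application", "voip",
    "-f", "ogg", "audio.ogg",
]

fixed = drop_repeated_program_name(quoted)
print(fixed[1])  # the first argument after the executable is now -i
```

The cleaned list can be passed straight to subprocess.run(fixed, check=True), which also sidesteps quoting issues with the space in the folder name.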
