What minimum bitrate should I use for Whisper?

I’m transcribing files that are around 25 MB, sometimes slightly bigger. They are currently encoded at a bitrate of 128 kbps.

Rather than cutting the files into parts, I figured I might lower the bitrate. Or would that reduce the transcript quality?
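For a rough sense of the trade-off, file size scales linearly with bitrate: size ≈ bitrate × duration / 8. Here is a minimal Python sketch of that arithmetic. The 25 MB ceiling and the function names are assumptions for illustration, and container overhead is ignored.

```python
def audio_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size of a constant-bitrate stream (no container overhead)."""
    return bitrate_kbps * 1000 / 8 * duration_s


def max_bitrate_kbps(duration_s: float, limit_bytes: float) -> float:
    """Highest constant bitrate that keeps a recording under limit_bytes."""
    return limit_bytes * 8 / duration_s / 1000


# Assumed ceiling of 25 MB, taken here as 25,000,000 bytes.
LIMIT_BYTES = 25 * 1000 * 1000

# One minute at 128 kbps is 960,000 bytes, so 25 MB holds roughly 26 minutes.
one_minute_128k = audio_size_bytes(128, 60)

# A one-hour recording would need to stay below about 55.6 kbps.
one_hour_budget = max_bitrate_kbps(3600, LIMIT_BYTES)
```

So halving the bitrate roughly doubles the length of audio that fits under the same size limit.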

On the open-source Whisper project, someone wrote that Whisper internally resamples audio to 16 kHz.

Is that the same for the whisper-1 model? I can’t find that in the docs.

If so, downsampling the input files to 16 kHz couldn’t possibly harm the transcript quality, right?
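One thing worth separating here: sample rate (kHz) and bitrate (kbps) are different quantities. A 16 kHz sample rate keeps frequencies up to 8 kHz, while bitrate is how many bits per second the encoded file spends. As a small sketch (the function name is hypothetical), uncompressed PCM at 16 kHz actually has a higher bitrate than a 128 kbps compressed file:

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


# 16 kHz, 16-bit, mono: 256 kbps before any lossy compression.
mono_16k = pcm_bitrate_kbps(16_000, 16, 1)

# CD-quality stereo for comparison: about 1411 kbps.
cd_stereo = pcm_bitrate_kbps(44_100, 16, 2)
```

So resampling to 16 kHz only shrinks the file if it is also stored with a lossy codec at a lower bitrate, and it is that lossy step, not the resampling itself, that could affect the transcript.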


I came here for this answer as well: how far can we reduce the quality?

If you can still understand it, the AI probably can too. Some of the training data may also have been lossy-compressed audio, matching what you hear.

How much codec compression the model tolerates isn’t clearly documented. For MP4/AAC, the HE-AAC codec can be good for voice down to 16-24 kbps, before it starts to sound swishy.
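If you want to try that, here is a rough sketch of an ffmpeg re-encode to mono HE-AAC, built from Python. It assumes ffmpeg is installed and was compiled with the libfdk_aac encoder (many prebuilt binaries are not); the function name, file names, and the 24 kbps default are placeholders, not a recommendation from the docs.

```python
import subprocess


def he_aac_cmd(src: str, dst: str, bitrate_kbps: int = 24) -> list[str]:
    """Build an ffmpeg command that re-encodes src to mono HE-AAC."""
    return [
        "ffmpeg",
        "-i", src,
        "-vn",                    # drop any video stream
        "-ac", "1",               # downmix to mono, which is enough for speech
        "-c:a", "libfdk_aac",     # assumption: this build includes libfdk_aac
        "-profile:a", "aac_he",   # HE-AAC profile
        "-b:a", f"{bitrate_kbps}k",
        dst,
    ]


def reencode(src: str, dst: str, bitrate_kbps: int = 24) -> None:
    """Run the command and raise if ffmpeg exits with an error."""
    subprocess.run(he_aac_cmd(src, dst, bitrate_kbps), check=True)


# Example (not executed here): reencode("interview.mp3", "interview.m4a")
```

Listen to a short sample of the output before batch converting; if it already sounds swishy to you, step the bitrate up.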