Issue with speech-to-text MP3 size

Hi Everyone,

We have been testing OpenAI speech-to-text. So far, we have had success testing 4-5MB MP3s in several languages (e.g. Arabic, Chinese, Greek, Hindi, Japanese, Korean, and Russian). We have our own MP3 voice recorder that works quite well - it generates 320 kbps files.

However, we are getting error code 400 “Bad request” when submitting larger MP3s that are under the 25MB limit (e.g. 23.5MB).

So, what gives? Is this a bug? Is OpenAI overstating the max size of 25MB?


Hi! I have a WhatsApp bot with 20k+ users that uses Whisper to transcribe voice notes. It’s been public for one year without any major issues.

What has happened to me during tests with large files is that I was incorrectly computing the file size. I thought it was less than 25MB but that was because the byte counting was not correct.

So some questions:

  • Can you replicate that error with a simple curl to Whisper?
  • Can you replicate that error with another file format of the same size?
  • Can you replicate that error with an MP3 of the same size from another source?
  • Are you doing any extra processing before submitting the file?
  • Is it possible for you to share your full input and output?

Thank you for your response.

It is important to note that we are using a Windows desktop application. As mentioned above, we have our own proprietary MP3 voice recorder | playback tool that we have been using for years for projects. As such, we can use Windows APIs to determine the exact byte count and duration of a recorded MP3. See below for a modification used for OpenAI STT:

The above screenshot shows a successful MP3 recording made in English. However, when submitted to OpenAI, it immediately returned error code 400 “Bad request”.

However, when we re-recorded a shorter file (14,059,147 bytes, i.e. 13.4 MB; 05:51 duration; 320 kbps bitrate), the response from OpenAI was a perfect success.
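
As a sanity check on those figures, the duration of a constant-bitrate MP3 can be estimated from its byte count and bitrate. The following is a minimal sketch, assuming constant bitrate and ignoring ID3 tags and frame headers; the function names are illustrative and are not part of the recorder described above.

```python
def cbr_duration_seconds(size_bytes: int, bitrate_kbps: int) -> float:
    """Estimate the duration of a constant-bitrate file.

    Assumes 1 kbps = 1000 bits per second and that metadata is negligible.
    """
    return size_bytes * 8 / (bitrate_kbps * 1000)


def format_mmss(seconds: float) -> str:
    """Format a duration as MM:SS, truncating fractional seconds."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


if __name__ == "__main__":
    # Figures from the successful recording: 14,059,147 bytes at 320 kbps.
    print(format_mmss(cbr_duration_seconds(14_059_147, 320)))
```

If the reported byte count, bitrate, and duration agree under this estimate, the file size is probably being measured correctly.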

This is the same story with our test recordings in other languages. There seems to be no way to get even close to 25MB.


To be clear, OpenAI is doing an incredible job with STT. And we believe that they will scale the MB limit over time. The problem is that, unless a hard MB limit is known, error code 400 will be a big frustration for a lot of people - like us…

We spent a lot of time testing different languages in addition to testing MB limits - our recording progress bar max is set to 26,164,400 bytes (just under 25MB when 25MB is counted as 25 x 1024 x 1024 bytes). We are now testing to find a common hard limit across different languages so we can modify the progress bar max and avoid error code 400.

Thanks for the details. I’ll test files close to 25MB with my own implementation. I’ve done that before without issues. Actually, one of the things my bot does is trim large files into chunks smaller than 25MB, but most are probably ~15MB. I’ll get back to you after I try.
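
For readers who want to try the chunking approach, here is a rough sketch of how chunk boundaries could be planned. It is not the bot's actual code: it assumes constant-bitrate audio, and `plan_chunks` and its parameters are hypothetical names. It only computes time ranges; the actual cutting would be left to an audio tool.

```python
import math


def plan_chunks(total_seconds: float, bitrate_kbps: int, max_chunk_bytes: int):
    """Split a recording into equal time ranges that each fit a byte budget.

    Returns a list of (start_seconds, end_seconds) tuples.
    Assumes constant bitrate, so size is proportional to duration.
    """
    bytes_per_second = bitrate_kbps * 1000 / 8
    max_chunk_seconds = max_chunk_bytes / bytes_per_second
    # At least one chunk, even for an empty recording.
    count = max(1, math.ceil(total_seconds / max_chunk_seconds))
    chunk_seconds = total_seconds / count
    return [
        (i * chunk_seconds, min((i + 1) * chunk_seconds, total_seconds))
        for i in range(count)
    ]


if __name__ == "__main__":
    # A 30-minute recording at 320 kbps, with a 15,000,000-byte budget per chunk.
    for start, end in plan_chunks(1800, 320, 15_000_000):
        print(f"{start:7.1f}s -> {end:7.1f}s")
```

Equal-length chunks are used here so that no single chunk ends up much shorter than the others; cutting at silence would likely give better transcripts but needs audio analysis.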

25MB = 25 x 1000 x 1000 bytes = 25,000,000 bytes → what is documented
25MiB = 25 x 1024 x 1024 bytes = 26,214,400 bytes
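
To see which side of each limit a given file falls on, a small check like the following may help. This is a sketch only: `describe_size` and `describe_file` are hypothetical helper names, and which of the two interpretations the API actually enforces is not established in this thread.

```python
import os

MB = 1000 * 1000    # decimal megabyte
MIB = 1024 * 1024   # binary mebibyte


def describe_size(size_bytes: int, limit: int = 25) -> dict:
    """Report a byte count against both readings of a '25MB' limit."""
    return {
        "bytes": size_bytes,
        "mb": size_bytes / MB,
        "mib": size_bytes / MIB,
        "within_decimal_limit": size_bytes <= limit * MB,
        "within_binary_limit": size_bytes <= limit * MIB,
    }


def describe_file(path: str) -> dict:
    """Same report for a file on disk, using the exact on-disk byte count."""
    return describe_size(os.path.getsize(path))


if __name__ == "__main__":
    # The progress-bar maximum mentioned earlier in the thread.
    print(describe_size(26_164_400))
```

For 26,164,400 bytes, the file is within 25MiB but over 25,000,000 bytes, so a file can look "under 25MB" in one tool and still exceed the documented decimal figure.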

However, a previous user noted that when sending heavily compressed audio, which packs a long duration into a small file, the limit at which an error was received was closer to 16-18MB. That may point to a second restriction on audio length, or it may mean that the input file size limit is not as advertised. I didn’t pay the $0.36 an hour to work backwards from the error and find the threshold at which requests succeed.

Well, we tweaked the MP3 recording config: set bitrate to 160 kbps instead of 320 kbps; set to mono instead of stereo. This effectively cuts the MP3 file size in half for a given recording.

We also determined a common hard limit for our recordings to be 16,276,000 bytes (13:33 duration) - anything much above this results in error code 400.
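
For anyone setting a similar recording cap, the relationship between a byte budget, the bitrate, and the maximum recording time can be sketched as below. This assumes constant-bitrate encoding and ignores metadata; the function names are illustrative, and the 16,276,000-byte figure is only the limit observed in these tests, not a documented value.

```python
def cbr_size_bytes(seconds: float, bitrate_kbps: int) -> int:
    """Approximate size of a constant-bitrate recording of the given length."""
    return int(seconds * bitrate_kbps * 1000 / 8)


def max_seconds_for_budget(budget_bytes: int, bitrate_kbps: int) -> float:
    """Longest recording, in seconds, that fits within a byte budget."""
    return budget_bytes * 8 / (bitrate_kbps * 1000)


if __name__ == "__main__":
    # Halving the bitrate halves the size of a 10-minute recording.
    print(cbr_size_bytes(600, 320), cbr_size_bytes(600, 160))

    # Recording time available under the observed limit at 160 kbps.
    seconds = int(max_seconds_for_budget(16_276_000, 160))
    print(f"{seconds // 60:02d}:{seconds % 60:02d}")
```

A progress bar driven by elapsed time rather than bytes written could use `max_seconds_for_budget` directly as its maximum.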
