Whisper API, increase file limit >25 MB

I ran this in command shell. It confirms that libopus is enabled. Anything missing that you can see?

Microsoft Windows [Version 10.0.19045.4170]
(c) Microsoft Corporation. All rights reserved.

C:\WINDOWS\system32>d:

D:\>cd D:\VS Code Projects\AI Transcripts - Formatting - Summaries-Pdfs

D:\VS Code Projects\AI Transcripts - Formatting - Summaries-Pdfs>"D:\VS Code Projects\ffmpeg-7.0-essentials_build\bin\ffmpeg.exe" ffmpeg -i test_C3.mp3 -vn -map_metadata -1 -ac 1 -c:a libopus -b:a 12k -application voip -f ogg audio.ogg
ffmpeg version 7.0-essentials_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers
built with gcc 13.2.0 (Rev5, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
libavutil 59. 8.100 / 59. 8.100
libavcodec 61. 3.100 / 61. 3.100
libavformat 61. 1.100 / 61. 1.100
libavdevice 61. 1.100 / 61. 1.100
libavfilter 10. 1.100 / 10. 1.100
libswscale 8. 1.100 / 8. 1.100
libswresample 5. 1.100 / 5. 1.100
libpostproc 58. 1.100 / 58. 1.100
[mp3 @ 000001dff39d4bc0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'test_C3.mp3':
Duration: 00:32:49.74, start: 0.000000, bitrate: 128 kb/s
Stream #0:0: Audio: mp3 (mp3float), 44100 Hz, mono, fltp, 128 kb/s
[AVFormatContext @ 000001dff39de2c0] Unable to choose an output format for 'ffmpeg'; use a standard extension for the filename or specify the format manually.
[out#0 @ 000001dff39de1c0] Error initializing the muxer for ffmpeg: Invalid argument
Error opening output file ffmpeg.
Error opening output files: Invalid argument

D:\VS Code Projects\AI Transcripts - Formatting - Summaries-Pdfs>

In that command line, you appear to be invoking ffmpeg.exe and then passing the program name again as a command-line parameter…
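
If you end up calling ffmpeg from Python rather than typing it, here is a minimal sketch of the corrected invocation, with the executable named once and followed directly by its options. The path and file names are copied from your post; the helper names `build_ffmpeg_args` and `convert` are made up for illustration, not part of any library.

```python
import subprocess

# Path copied from the post above; adjust it to wherever ffmpeg.exe lives.
FFMPEG = r"D:\VS Code Projects\ffmpeg-7.0-essentials_build\bin\ffmpeg.exe"


def build_ffmpeg_args(src, dst, ffmpeg=FFMPEG):
    """Return the argument list: the executable once, then its options."""
    return [
        ffmpeg,
        "-i", src,
        "-vn",                   # ignore any video stream
        "-map_metadata", "-1",   # strip metadata
        "-ac", "1",              # mono output
        "-c:a", "libopus",
        "-b:a", "12k",
        "-application", "voip",  # libopus mode tuned for speech
        "-f", "ogg",
        dst,
    ]


def convert(src, dst):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_args(src, dst), check=True)
```

Calling `convert("test_C3.mp3", "audio.ogg")` should then do the same job as the command line. Passing the arguments as a list also sidesteps quoting trouble with the spaces in the path.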

No that can’t be it. It has to be something wrong with ffmpeg, libopus, windows, vscode, or something else other than something dumb I did.

OMG, I looked at that line a hundred times and never noticed I was starting ffmpeg twice. I get 1/2 life demerit point for that.

Great eye. Thanks so very much for your help.

By the way, just for learning purposes, I have to confess that the following line might as well have been written in Hindi. If you have a moment, can you offer a little explanation? If not, I totally understand, as you have been very generous with your time.
“VSCode doesn’t want to trust your interpreter or external binaries or is not piping what you think it is.”

P.S. Didn’t know about IDLE either.

VSCode interrupts your workflow with unexpected warnings…

Here’s a page about code trust: Visual Studio Code Workspace Trust security

Basically, “Workspace Trust provides an extra layer of security when working with unfamiliar code, by preventing automatic code execution of any code in your workspace if the workspace is open in ‘Restricted Mode’.”

or rephrased:

“this could make arbitrary binaries not work right”

Got it. Thanks. Since we are on the “ogg” audio file subject I’ll ask another question that is preventing me from making any progress. I can post a different question if you think that would be better.
I am getting a 429 “exceeded quota” error when submitting an ogg file that is only 1.6kb. There is about $15 in my account, which is enough for testing. I have submitted only a few requests in the last hour, fewer than 5. Any idea what can be done?

Check status - All Okay

The first thing to do is see if your account works.

Go to the playground and submit a chat completion.

https://platform.openai.com/playground/p/wRiau0i6hfCk8wooVW3CJRJc?model=gpt-4-turbo&mode=chat

Grab a few lines from that playground preset user message, and run in your Python environment to see what API key is being used (often forgotten):

import os

# Get the API key from the environment variable
api_key = os.getenv("OPENAI_API_KEY")
headers = {"Authorization": f"Bearer {api_key}"}
print(headers)  # shows which key is actually being used

Match up the key that was printed with your API account organization that has an available credit balance. In API keys, match up the last digits, and ensure the key has “all permissions”.
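
If you would rather not print the whole key to your terminal, a small variation shows only the last few characters, which is all you need for the match-up. This is just a sketch; `mask_key` is a hypothetical helper, not something from the OpenAI library.

```python
import os


def mask_key(key, visible=4):
    """Return only the last few characters of a secret, for comparison."""
    if not key:
        return "<not set>"
    if len(key) <= visible:
        return "*" * len(key)  # too short to reveal any part safely
    return "..." + key[-visible:]


# Shows which key the environment supplies without exposing it in full.
print(mask_key(os.getenv("OPENAI_API_KEY")))
```

If this prints `<not set>`, the environment variable is not reaching your Python process at all, which is worth ruling out first.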

Ensure you haven’t set a premature monthly account limit.

The playground preset also has the full audio transcription code I wrote. You can copy that and run it after updating the input file name, as I just did on an ogg file freshly created with your ffmpeg command line, to see that the API is working. Review the two files it saves, one with a transcript and one with word timestamps, OR the full error message.

One of these should tell you the place things went wrong. Whisper has a limit of 50 requests PER MINUTE at the lowest tier - that’s not going to be the problem.

To err is human, to really mess things up, divine.

API is working; my transcription.txt

Hey, hey, hey! Time for a few fart jokes! Where would a comedy show be without a few fart jokes? Question. Did you ever have to fart on a bus or an airplane or in some public place? But you hadn’t been farting all that day. So you didn’t really know the nature of the beast. You only knew there was lots of it. In a situation like that, what you have to do is to release a test fart. You have to arrange to release, quietly and in a carefully controlled manner, about 10 to 15 percent of the total fart. In order to determine if those around you can handle it. Or, if in fact you may be about to precipitate, a public health emergency. When releasing a test fart, it is often good to engage in an act of subterfuge, such as reaching for a magazine. Say, is that golf digest? That doesn’t smell too horrifying. In fact, in an odd way, it’s rather pleasant. I think they ought to enjoy the rest of this baby. And it turns out to be one of those farts that would strip the varnish off a footlocker. A fart that could end a marriage. And everyone around you heads for the exits. Even the people on the airplane. As you realize, it is time to review your fiber intake. It might not be necessary, after all, each morning to eat an entire wicker swing set. I have no ending for this, so I take a small bow. Thank you. I appreciate that. Thank you. Thank you. OK.

Thanks a bunch. I have to stop for the day but needless to say I’ll be on it in the am.

I created a new API key and all is well. Thanks for your help.

I saw on the forum that this happened to someone else as well. Do you know why this is happening? Just OpenAI growing pains? I definitely don’t want this happening in production.

Hey guys, just wanted to chime in here to check if any of you are currently experiencing the same issues as me when it comes to NodeJS and Whisper. I’ve converted the file to .ogg, which saved me the hassle of splitting the audio into separate chunk files as I did before. But for the last two days I’ve been getting errors on “larger” .ogg files (16MB and 18MB to be precise).

I’ve tried node-fetch, which threw this error:

{
  "error": {
    "message": "The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID xx in your email.)",
    "type": "server_error",
    "param": null,
    "code": null
  }
}

And when using the openai library I get the following:

APIConnectionError: Connection error.
at OpenAI.makeRequest (/usr/src/app/node_modules/openai/core.js:292:19)
at async exports.performTranscription (/usr/src/app/utils/performTranscription.js:13:18)
at async exports.performTranscription (/usr/src/app/transcribe.js:473:26)
status: undefined,
headers: undefined,
error: undefined,
code: undefined,
param: undefined,
type: undefined,
cause: FetchError: request to https://api.openai.com/v1/audio/transcriptions failed, reason: read ECONNRESET
at ClientRequest.<anonymous> (/usr/src/app/node_modules/node-fetch/lib/index.js:1501:11)
at ClientRequest.emit (node:events:518:28)
at TLSSocket.socketErrorListener (node:_http_client:500:9)
at TLSSocket.emit (node:events:530:35)
at emitErrorNT (node:internal/streams/destroy:169:8)
at emitErrorCloseNT (node:internal/streams/destroy:128:3)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
type: 'system',
errno: 'ECONNRESET',
code: 'ECONNRESET'

I could swear this worked a week ago. Anyone else experiencing the same issues as me? For the record, the mp3 file is 1 hour and 40 min long. FYI: over $100 available on the account.

Smaller mp3/mp4/wav files seem to work just fine with the exact same code.

Edit: Created a thread here: Whisper API fails on "large" ogg files (still below 25MB)
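
For anyone checking whether a recording should fit under the 25 MB limit before uploading, the arithmetic is just bitrate times duration. Below is a rough sketch using the 12 kb/s setting and the 32:49.74 test file from earlier in the thread. It assumes the limit means 25 × 1024 × 1024 bytes and ignores Ogg container overhead and bitrate variation, so treat the numbers as estimates. The function names are made up for illustration.

```python
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumed reading of the 25 MB limit


def estimated_bytes(duration_s, bitrate_bps):
    """Rough audio payload size in bytes; ignores container overhead."""
    return round(duration_s * bitrate_bps / 8)


def max_duration_s(bitrate_bps, limit=MAX_UPLOAD_BYTES):
    """Longest duration in seconds that stays under the limit at this bitrate."""
    return limit * 8 / bitrate_bps


# The 32:49.74 test file encoded at 12 kb/s:
size = estimated_bytes(32 * 60 + 49.74, 12_000)
print(f"estimated size: {size / 1e6:.2f} MB")
print(f"fits under limit: {size < MAX_UPLOAD_BYTES}")
print(f"max length at 12 kb/s: {max_duration_s(12_000) / 3600:.1f} hours")
```

By the same arithmetic, a 1 hour 40 minute recording at 12 kb/s comes to about 9 MB, so a file of that length that is much larger was probably encoded at a higher bitrate.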
