GPT-4o Transcribe: MAX 1500 SECONDS?

I was really excited to use GPT-4o Transcribe instead of Whisper, but I just ran into a major roadblock and would appreciate some insight from other people.


I'm getting “400 audio duration 3189.204 seconds is longer than 1500 seconds which is the maximum for this model”, which is really unfortunate because I already spent some time getting the file size down with an FFmpeg command.

The two options as I see them are:

Option 1: Split Long Files

  • Break files into <25-minute chunks before uploading (see the sketch below)

Option 2: Use Whisper-1 Instead

  • Whisper-1 has no duration limit (just a 25 MB file size limit)
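
For reference, here's how I'd do the splitting if I go the chunking route. A rough sketch, assuming ffmpeg is on the PATH and a hypothetical input.mp3:

```python
import subprocess

# Split the source into ~23-minute (1380 s) pieces, safely under the
# 1500-second cap; "-c copy" copies the stream without re-encoding.
subprocess.run(
    [
        "ffmpeg", "-i", "input.mp3",
        "-f", "segment", "-segment_time", "1380",
        "-c", "copy",
        "chunk_%03d.mp3",
    ],
    check=True,
)
```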

If there is no other option I will just choose option 2 because it's less of a headache, unless GPT-4o Transcribe is way, way better. I see people online speeding up their audio files for cost reasons. I don't care about the cost, but if speeding up the audio lets it fit within GPT-4o Transcribe's limit, is it worth it? Will speeding it up defeat the purpose of using GPT-4o Transcribe instead of Whisper for accuracy?

If speeding it up is actually a viable option, should I undo the FFmpeg pass that set all the files to:

  • sampling rate: 16,000 Hz
  • audio bitrate: 32 kbps

Keep in mind my longest audio file is 1 hour 16 minutes, so even 2x speed wouldn't get it under 25 minutes.
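
For context, the speed-up I'm considering would be a single FFmpeg pass on top of those settings. A rough sketch with hypothetical filenames (atempo is limited to 2.0 per filter instance, so you'd chain two instances for anything faster):

```python
import subprocess

# One pass: double the speed and keep the 16 kHz / 32 kbps settings.
# Filenames are hypothetical; chain filters ("atempo=1.5,atempo=2.0")
# if more than 2x is needed.
subprocess.run(
    [
        "ffmpeg", "-i", "input.mp3",
        "-filter:a", "atempo=2.0",
        "-ar", "16000", "-b:a", "32k",
        "fast_input.mp3",
    ],
    check=True,
)
```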

Let me know if anyone sees a workaround for this please!

Sorry for any bad spelling or grammar,

Nick

Hi Nick,

If cost is not an issue, then the recommendation is to use the GPT-4o Transcribe model, as it “offers improvements to word error rate and better language recognition and accuracy compared to original Whisper models.” (Source: Model Card).

Technically you could speed up the audio, but as you highlighted, your longest file still won't fit even at 2x speed. If it's not too much work, it would be best to chunk the file yourself using any audio/video editing software and have the GPT-4o Transcribe model transcribe each chunk. If speed is the need of the hour, you could also make parallel API calls and stitch together the text you get back from each call. In theory this takes about the same wall-clock time as a single API call on a sped-up file, so cost is the only piece you'd have to think about.
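
Here is a minimal sketch of the parallel approach using the official OpenAI Python SDK, assuming the chunks already exist on disk as chunk_*.mp3 (hypothetical names from a prior splitting step) and that OPENAI_API_KEY is set in the environment:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def transcribe(path: Path) -> str:
    # One API call per chunk.
    with path.open("rb") as f:
        return client.audio.transcriptions.create(
            model="gpt-4o-transcribe",
            file=f,
        ).text

# Hypothetical chunk files produced by an earlier splitting step.
chunks = sorted(Path(".").glob("chunk_*.mp3"))

# pool.map preserves input order, so the stitched transcript stays
# in chronological order even though the calls run in parallel.
with ThreadPoolExecutor(max_workers=8) as pool:
    texts = list(pool.map(transcribe, chunks))

transcript = "\n".join(texts)
print(transcript)
```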

To start, you could try different models on, say, three one-minute snippets sampled from different parts of your file and compare the results. There is also a GPT-4o mini Transcribe model you can explore, and it might just fit the bill.
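
A quick way to cut those sample snippets, again with FFmpeg (hypothetical filenames; the start offsets are arbitrary examples):

```python
import subprocess

# Extract three one-minute snippets from different parts of the file;
# "-ss" is the start offset in seconds, "-t 60" the snippet length.
for i, start in enumerate([300, 1500, 2700]):
    subprocess.run(
        [
            "ffmpeg", "-ss", str(start), "-t", "60",
            "-i", "input.mp3", "-c", "copy",
            f"snippet_{i}.mp3",
        ],
        check=True,
    )
```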


Yes, I actually just designed a pre-processor that keeps the file size under 25 MB and also speeds the audio up by 3x, so the maximum source length is 75 minutes. If a file is longer than that, the pre-processor will still process it but won't send it to the API; it will tell me I need to chunk it manually. If it's under 75 minutes, it sends the result to GPT-4o Transcribe.
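
Roughly, the gate looks like this (a simplified sketch, not the full pre-processor; it assumes ffmpeg/ffprobe are installed and uses hypothetical filenames):

```python
import subprocess
from pathlib import Path

MAX_SECONDS = 1500  # gpt-4o-transcribe duration cap
SPEED = 3.0         # 3x speed-up, so up to 75 minutes of source audio

def duration_seconds(path: Path) -> float:
    # ffprobe ships with ffmpeg and prints the duration as a float.
    out = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())

def preprocess(path: Path):
    if duration_seconds(path) / SPEED > MAX_SECONDS:
        print(f"{path}: still over 1500 s at {SPEED}x, chunk manually")
        return None
    sped = path.with_name(f"fast_{path.name}")
    # atempo is capped at 2.0 per instance, so chain two for 3x.
    subprocess.run(
        ["ffmpeg", "-i", str(path),
         "-filter:a", "atempo=1.5,atempo=2.0",
         "-ar", "16000", "-b:a", "32k", str(sped)],
        check=True,
    )
    return sped  # ready to send to the transcription API
```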

Thank you for the response!

There's also Whisper, which simply works: it isn't a multimodal AI repurposed as its own endpoint, so it doesn't have a context-window limit. Its documentation is actually correct, without "gotcha"s.

With Whisper, I've had over three hours of transcript returned from a single call, and you can keep the upload small by transmitting optimized Opus in an Ogg container.
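
For example, a rough encoding sketch assuming ffmpeg was built with libopus (filenames are hypothetical); at roughly 12 kbps, even three hours of speech stays well under Whisper's 25 MB cap:

```python
import subprocess

# Voice-optimized Opus in an Ogg container: ~12 kbps keeps a
# three-hour recording around 16 MB, under the 25 MB upload limit.
subprocess.run(
    [
        "ffmpeg", "-i", "input.mp3",
        "-c:a", "libopus", "-b:a", "12k",
        "-application", "voip", "-ar", "16000",
        "output.ogg",
    ],
    check=True,
)
```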
