gpt-4o-transcribe audio length limits

Hello,

I am excited to use gpt-4o-transcribe. In my existing workflow, I convert my files to Ogg format and have whisper-1 transcribe them. However, I noticed that the same files return an error when I use “gpt-4o-transcribe”.

The error message is:

    message: 'audio duration 1800.024 seconds is longer than 1500 seconds which is the maximum for this model',

Are there any other constraints (file type, file size, etc.)? For example, is the file size limit the same 25 MB?


I think this thread from last year might be a bit outdated, but it might give you some clues.

I’d also check the docs…

https://platform.openai.com/docs/models/gpt-4o-transcribe

Good luck!


I figured out the issue.

When a file is larger than 25 MB, I split it using ffmpeg. However, I did not reset the timestamps on the resulting audio segments. “gpt-4o-transcribe” relies on this metadata to determine the file's duration, while “whisper-1” does not.

For example, if I split a 30-minute file into three 10-minute segments, the duration metadata was carried over from the original, unsplit file. The file info shows it in a line like this (the Duration field is the one that matters):

    Duration: 00:04:59.04, start: 0.000000, bitrate: 129 kb/s

So even though each split segment is 10 minutes long (600 seconds), its metadata may report a duration of 1800 seconds, because that value was carried over.
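If you want to catch this locally before uploading, here is a small Python sketch: it parses the Duration field from an info line in the format above and compares it with the 1500-second limit quoted in the error message. The helper names (`parse_duration`, `within_limit`) are hypothetical, and the sketch only reads the text you pass in; getting that text out of ffmpeg/ffprobe is left to you.

```python
import re

# Matches the "Duration: HH:MM:SS.xx" field in an ffmpeg/ffprobe info line.
DURATION_RE = re.compile(r"Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)")

# Limit taken from the gpt-4o-transcribe error message in this thread.
MAX_SECONDS = 1500


def parse_duration(info_line: str) -> float:
    """Return the Duration field of an info line, in seconds."""
    m = DURATION_RE.search(info_line)
    if m is None:
        # e.g. "Duration: N/A" or a line without a Duration field
        raise ValueError(f"no parsable Duration field in: {info_line!r}")
    hours, minutes, seconds = int(m.group(1)), int(m.group(2)), float(m.group(3))
    return hours * 3600 + minutes * 60 + seconds


def within_limit(seconds: float, limit: float = MAX_SECONDS) -> bool:
    """True if a reported duration is at or under the model's limit."""
    return seconds <= limit
```

For the line above, `parse_duration` gives roughly 299 seconds. A segment whose metadata still says 00:30:00 would fail `within_limit` even though its audio is only 10 minutes long, which is exactly the symptom described here.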

Solution: set the '-reset_timestamps', '1' flag when splitting the audio with ffmpeg so that the original duration metadata is not carried over.
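A sketch of the split step in Python, assuming the segments are cut with ffmpeg's segment muxer (the post does not show the full command). The hypothetical `segment_seconds` picks an equal segment length that stays under both the 1500-second limit and an assumed 25 MB upload cap. It assumes a roughly constant bitrate and keeps 10% headroom, because stream-copy cuts land on packet boundaries and a piece can run slightly long. The hypothetical `split_command` builds the ffmpeg argument list with `-reset_timestamps 1`, so each piece starts near zero instead of inheriting the original file's timeline.

```python
import math

MAX_SECONDS = 1500        # from the gpt-4o-transcribe error message
MAX_BYTES = 25_000_000    # assumed 25 MB upload cap (decimal MB, to stay conservative)


def segment_seconds(duration_s: float, size_bytes: int,
                    max_s: float = MAX_SECONDS, max_bytes: int = MAX_BYTES,
                    headroom: float = 0.9) -> float:
    """Equal segment length keeping each piece under both limits.

    Assumes a roughly constant bitrate, so size scales with duration.
    """
    n = max(1,
            math.ceil(duration_s / (max_s * headroom)),
            math.ceil(size_bytes / (max_bytes * headroom)))
    return duration_s / n


def split_command(src: str, pattern: str, seg_s: float) -> list[str]:
    """ffmpeg argv that cuts src into seg_s-second pieces with the segment muxer.

    -reset_timestamps 1 makes every output segment start at ~0.
    """
    return [
        "ffmpeg", "-i", src,
        "-f", "segment",
        "-segment_time", f"{seg_s:.3f}",
        "-reset_timestamps", "1",
        "-c", "copy",
        pattern,  # e.g. "part%03d.ogg"
    ]
```

To actually run it, pass the list to `subprocess.run(split_command("talk.ogg", "part%03d.ogg", seg), check=True)`. Building an argument list instead of a shell string avoids quoting problems with file names.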


Thanks for coming back to share with the rest of the community!
