Best practice for generating transcriptions from long audio files

mihailpet · May 15, 2024, 5:11am

I need to transcribe audio files of up to three hours in length. Should I wait for audio support to be added to the GPT-4o API, or simply use Whisper speech-to-text and then clean up the result with GPT?

For both solutions, the audio file needs to be split into smaller chunks. The question is: how do I seamlessly stitch the resulting text chunks back together?

Thanks

Blade1024 · October 31, 2025, 12:22pm

How I do it:

Optionally convert everything to WAV first for faster processing.
Identify noise levels and gaps in the recording to find possible split points.
Align the split points to the maximum chunk size.
Split the source audio file into chunks according to that mapping.
Convert the audio to 48K mono OGG for faster processing.
Transcribe each chunk (I do this in parallel to speed things up).
Concatenate the transcripts back together in order.
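
The gap detection and split-point alignment steps above could be sketched roughly as follows. This is only an illustration, not the exact tooling used here: it assumes the gaps come from ffmpeg's silencedetect filter (the post does not say which tool is used), and the helper names silence_midpoints and plan_chunks, the -30dB threshold and the 1200-second chunk limit are all made up for the example.

```python
import re
from typing import List, Tuple

# Assumption: the gaps were logged by something like
#   ffmpeg -i input.wav -af silencedetect=noise=-30dB:d=0.5 -f null -
# which writes "silence_start: X" / "silence_end: Y" lines to stderr.
_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def silence_midpoints(log: str) -> List[float]:
    """Return the midpoint (in seconds) of every silence found in the log."""
    starts = [float(s) for s in _START.findall(log)]
    ends = [float(e) for e in _END.findall(log)]
    return [(s + e) / 2 for s, e in zip(starts, ends)]


def plan_chunks(
    split_points: List[float], total: float, max_len: float
) -> List[Tuple[float, float]]:
    """Greedily cut at the last silence that keeps each chunk <= max_len.

    If no silence falls inside the window, fall back to a hard cut at
    max_len (which may split a word).
    """
    points = sorted(p for p in split_points if 0 < p < total)
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while total - start > max_len:
        limit = start + max_len
        candidates = [p for p in points if start < p <= limit]
        end = candidates[-1] if candidates else limit
        chunks.append((start, end))
        start = end
    chunks.append((start, total))
    return chunks


if __name__ == "__main__":
    # Hypothetical one-hour recording with four detected gaps.
    for begin, finish in plan_chunks([550.0, 1100.0, 1300.0, 2500.0], 3600.0, 1200.0):
        print(f"{begin:8.1f} -> {finish:8.1f}")
```

Cutting in the middle of a silence is what makes the later stitching simple: no word is split across two chunks, so the transcripts can be joined as they are. The greedy choice can produce an occasional short chunk; a smarter planner could balance chunk lengths instead.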

This way I can process one hour of audio in around one minute.

I use the gpt-4o-mini-transcribe model, and it works brilliantly.
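
For the parallel transcription and the final join, a minimal sketch could look like the one below. It assumes the official openai Python package with an API key in the OPENAI_API_KEY environment variable; the function names transcribe_chunk and transcribe_all, the worker count and the chunk file names are hypothetical, and this is not necessarily how the pipeline described above is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def transcribe_chunk(path: str) -> str:
    """Send one audio chunk to the transcription endpoint and return its text."""
    # Imported lazily so the rest of the sketch runs without the package.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="gpt-4o-mini-transcribe",
            file=audio,
        )
    return result.text


def transcribe_all(
    paths: List[str],
    transcribe: Callable[[str], str] = transcribe_chunk,
    workers: int = 4,  # assumed value; tune to your rate limits
) -> str:
    """Transcribe chunks in parallel and join the texts in chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, not completion order,
        # so the transcript stays in sequence even if chunks finish late.
        texts = list(pool.map(transcribe, paths))
    return " ".join(t.strip() for t in texts if t.strip())


if __name__ == "__main__":
    # Hypothetical chunk files produced by the splitting step.
    print(transcribe_all(["chunk_000.ogg", "chunk_001.ogg", "chunk_002.ogg"]))
```

Threads are enough here because the work is network-bound. Passing the transcribe function as a parameter also makes it easy to swap in another model or a stub for testing.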
