Whisper leaves out chunks of speech in longer transcripts

Dear community,

We want to transcribe speech of up to 1 hour. However, when we use the Whisper model large-v2 with the 25 MB upload limit (so we chunk the files), multiple parts of the speech, each 10 to 40 seconds long, are missing from the transcript. We noticed that the length of these missing pieces also depends on the prompt, which is necessarily rather long.
Do you have a solution for that problem?

Best,
Christian

Perhaps you misunderstand “prompt” on the transcription endpoint.

It is not a set of instructions for the AI to follow. It is merely prior text that leads up to where the audio starts, giving the AI stronger hints about how to transcribe the speech.
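For example, a minimal sketch using the current OpenAI Python SDK (the file name and prompt text are placeholders): the prompt is typically just the tail of the previous chunk's transcript.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "speech_chunk_02.mp3" and the prompt text are placeholders: the prompt is
# simply the tail of the previous chunk's transcript, not an instruction.
with open("speech_chunk_02.mp3", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt="...the last sentence or two of the previous chunk's transcript...",
    )

print(result.text)
```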

By re-encoding the audio to mono with a lower-bitrate, voice-oriented codec, you can significantly increase the length of audio you can send and reduce the network transfer time.
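As a rough sketch, assuming ffmpeg is installed (file names and bitrate are placeholders), mono Opus at a low speech bitrate keeps an hour of audio well under 25 MB:

```python
import subprocess

def reencode_for_upload(src: str, dst: str) -> None:
    """Re-encode to mono Ogg/Opus at a low, speech-oriented bitrate.
    At roughly 24 kbps mono, an hour of speech is on the order of 11 MB."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", src,
            "-ac", "1",          # downmix to mono
            "-ar", "16000",      # 16 kHz is enough for speech
            "-c:a", "libopus",   # voice-oriented codec
            "-b:a", "24k",       # low bitrate, still intelligible speech
            dst,
        ],
        check=True,
    )

reencode_for_upload("speech.wav", "speech.ogg")
```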

However, missing sections may persist in places where background music or noise confuses whole internal chunks (the Whisper AI operates on 30-second pieces of audio, and the endpoint itself uses techniques to join the audio).

Silence detection and removal, splitting on silence, and reviewing the word-level timestamps programmatically could let you discover gaps in the transcription, and then re-submit smaller chunks that seem insufficient.
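A minimal sketch of the gap-review step (the word list, tuple layout, and threshold are assumptions; in practice the timestamps would come from a verbose/word-level transcription response for one chunk):

```python
def find_gaps(words, min_gap=5.0):
    """Given word-level timestamps as (word, start, end) tuples in seconds,
    return (gap_start, gap_end) spans longer than min_gap in which nothing
    was transcribed; these are candidates for re-submitting smaller chunks."""
    gaps = []
    for prev, curr in zip(words, words[1:]):
        if curr[1] - prev[2] >= min_gap:  # next word's start minus previous word's end
            gaps.append((prev[2], curr[1]))
    return gaps

# Hypothetical word list for one chunk of audio.
words = [("hello", 0.0, 0.4), ("world", 0.5, 0.9), ("again", 14.2, 14.6)]
print(find_gaps(words))  # -> [(0.9, 14.2)]
```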

Thanks for your help!

You are right about the prompt, but changing the prompt still changes the frequency of the gaps.

Would you have a recommendation for input parameters to obtain a more sensitive transcript with fewer gaps? Unfortunately, the chunks that the model omitted had the best audio quality and were clearly understandable.

By testing an earlier version of Whisper (such as 20230124), I noticed that these chunks are not left out of the transcript. As we depend on the word timestamps of the newer versions of Whisper, is there any way to get equally permissive behavior in the newer versions?
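For reference, this is roughly the kind of call we would like to keep using (a minimal sketch assuming the open-source whisper package; the threshold values are illustrative placeholders, not settings we have validated):

```python
import whisper

model = whisper.load_model("large-v2")

result = model.transcribe(
    "speech_chunk_02.mp3",
    word_timestamps=True,              # not available in the 20230124 release
    no_speech_threshold=0.8,           # default 0.6; a window is only dropped when
                                       # its no-speech probability exceeds this value
    logprob_threshold=-2.0,            # default -1.0; a decode whose average logprob
                                       # is above this value rescues the window
    condition_on_previous_text=False,  # keeps earlier text from steering later windows
)

for segment in result["segments"]:
    print(segment["start"], segment["end"], segment["text"])
```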