I mean… maybe it could lead to insanity? Not sure. But LLMs can also get stuck in this kind of repetition loop, usually as a result of “greedy decoding”.
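To make the failure mode concrete, here is a toy Python sketch (not any real model) in which the argmax always points back at the last token, so greedy decoding loops forever:

```python
import numpy as np

def greedy_decode(next_token_logits, start_token, max_len=12):
    # Greedy decoding: always take the single highest-probability token.
    tokens = [start_token]
    for _ in range(max_len):
        tokens.append(int(np.argmax(next_token_logits(tokens))))
    return tokens

def toy_logits(tokens, vocab_size=5):
    # Toy stand-in for a model that slightly prefers whatever token came last.
    logits = np.zeros(vocab_size)
    logits[tokens[-1]] += 1.0
    return logits

print(greedy_decode(toy_logits, start_token=2))
# [2, 2, 2, ...]: once the argmax points back at itself, greedy decoding
# can never escape; sampling at a higher temperature at least has a chance.
```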
But again, that’s why Whisper uses beam search plus a temperature fallback: decoding starts at temperature 0 and the temperature is only raised when quality checks fail, which is slightly counter-intuitive, since it’s the added randomness that breaks the loop.
Keep in mind that Whisper uses a timestamp-based sliding context window as well.
Whisper relies on accurate prediction of the timestamp tokens to determine the amount to shift the model’s 30-second audio context window by, and inaccurate transcription in one window may negatively impact transcription in the subsequent windows.
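As a rough illustration, here is a minimal sketch of that sliding-window logic, assuming hypothetical helpers `decode_window` and `last_timestamp` in place of the real model calls:

```python
WINDOW_SECONDS = 30.0  # Whisper's fixed audio context length

def transcribe_long_form(audio_seconds, decode_window, last_timestamp):
    # decode_window(start, end) -> transcription of that audio slice.
    # last_timestamp(result) -> last predicted timestamp in seconds,
    # relative to the window start, or None if none was produced.
    seek, segments = 0.0, []
    while seek < audio_seconds:
        result = decode_window(seek, min(seek + WINDOW_SECONDS, audio_seconds))
        segments.append(result)
        shift = last_timestamp(result)
        # Advance by the last predicted timestamp when there is one,
        # otherwise by the full window. A bad timestamp here misaligns
        # every window that follows.
        seek += shift if shift and shift > 0 else WINDOW_SECONDS
    return segments
```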
From the Whisper paper:

“We have developed a set of heuristics that help avoid failure cases of long-form transcription, which is applied in the results reported in sections 3.8 and 3.9. First, we use beam search with 5 beams using the log probability as the score function, to reduce repetition looping which happens more frequently in greedy decoding. We start with temperature 0, i.e. always selecting the tokens with the highest probability, and increase the temperature by 0.2 up to 1.0 when either the average log probability over the generated tokens is lower than −1 or the generated text has a gzip compression rate higher than 2.4. Providing the transcribed text from the preceding window as previous-text conditioning when the applied temperature is below 0.5 further improves the performance.”
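For concreteness, here is a minimal Python sketch of that fallback loop. It is not Whisper’s actual implementation: `decode` is a hypothetical stand-in for a single decoding pass, and the real pipeline also switches decoding strategy as the temperature rises (beam search at 0, sampling above), which this sketch omits.

```python
import gzip

def compression_ratio(text: str) -> float:
    # Highly repetitive text compresses very well, so a high ratio is a
    # cheap signal that the decoder got stuck in a repetition loop.
    data = text.encode("utf-8")
    return len(data) / len(gzip.compress(data))

def decode_with_fallback(decode, previous_text=None):
    # decode(temperature=..., prompt=...) -> (text, avg_logprob) is a
    # hypothetical callable wrapping one decoding pass over a window.
    text = ""
    for temperature in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
        # Per the excerpt, condition on the preceding window's text only
        # when the applied temperature is below 0.5.
        prompt = previous_text if temperature < 0.5 else None
        text, avg_logprob = decode(temperature=temperature, prompt=prompt)
        # Accept the result once both quality checks pass.
        if avg_logprob >= -1.0 and compression_ratio(text) <= 2.4:
            return text
    return text  # every attempt failed the checks; keep the last one
```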