How to avoid Hallucinations in Whisper transcriptions?


I am testing a sample file. The transcription adds a few extra words that are not present in the audio.

This episode is actually a co-production with another podcast called Digital Folklore, which is hosted by Mason Amadeus and Perry Carpenter. We’ve been doing a lot of our research together and our brainstorming sessions have been so thought-provoking, I wanted to bring them on so we could discuss the genre of analog horror together. So, why don’t you guys introduce yourselves so we know who’s who? Yeah, this is Perry Carpenter and I’m one of the hosts of Digital Folklore. And I’m Mason Amadeus and I’m the other host of Digital Folklore. And tell me, what is Digital Folklore? Yeah, so Digital Folklore is the evolution of folklore, you know, the way that we typically think about it. And folklore really is the product of basically anything that humans create that doesn’t have a centralized canon. But when we talk about digital folklore, we’re talking about

The hallucination is emphasized.

How do I avoid it?

Is it a function of the max tokens allowed, or is that not relevant?

I am not 100 percent sure if this parameter is available in the Whisper API as well, but if it is, you could try turning the temperature parameter down.
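If it is available, it would be passed alongside the audio file. A minimal sketch, assuming the current openai Python SDK and the hosted "whisper-1" model name (`transcription_kwargs` is a hypothetical helper of mine, not part of the SDK):

```python
# Sketch: collect the transcription parameters so temperature is explicit.
# Assumes the hosted "whisper-1" model; transcription_kwargs is hypothetical.
def transcription_kwargs(audio_file, temperature=0.0):
    return {
        "model": "whisper-1",
        "file": audio_file,          # an open binary file handle
        "temperature": temperature,  # 0 = greedy decoding, least drift
    }

# Usage (needs an API key):
# from openai import OpenAI
# client = OpenAI()
# with open("clip.mp3", "rb") as f:
#     text = client.audio.transcriptions.create(**transcription_kwargs(f)).text
```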

My temperature setting is the default for the Python SDK, 0.

I have very little experience with Whisper but I have noticed a lot of people complaining about it hallucinating when there is little to no audio.

The only solution I could think of is to also monitor the strength of the audio, and then use it as a filter on the end product. If there’s very little signal, it would be a fair assumption that there’s nothing to transcribe, and that timeframe can be patched out.
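As a sketch of that idea, assuming raw PCM samples normalized to [-1, 1] (the 0.01 threshold is a made-up starting point to tune per source):

```python
import math

def rms(samples):
    """Root-mean-square level of a chunk of PCM samples (floats in [-1, 1])."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

def loud_enough(samples, threshold=0.01):
    """True if the chunk carries enough signal to be worth transcribing.
    The threshold is an assumed default; tune it against your recordings."""
    return rms(samples) >= threshold
```

Run it per chunk, and drop any transcript segment whose source chunk was effectively silent.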

My sample is a clip from a professionally recorded podcast, with clear speaking voices.

There are no doubts regarding the quality of your recording.

It cuts off abruptly, however, and I have noticed that the transcriptions are hallucinated when there is no audio. I imagine you’re trying to automate the transcriptions of these podcasts? Although I have very little experience with Whisper, my experience with GPT is that it will “try to make sense of nonsense”, which in this case is a sentence that is unfinished.

A couple of seconds to verify the text certainly isn’t the end of the world…
You could maybe just take into consideration that it may hallucinate some words if the recording cuts abruptly. Again, going back to monitoring the current strength of the audio.

Maybe it is a safe assumption to say “If audio cuts abruptly → The last sentence may be corrupted”

In your example the last sentence is incomplete, why not just add a filter to check if the last sentence is complete or not?
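A minimal version of that filter, as a heuristic sketch (it treats terminal punctuation, optionally followed by a closing quote, as "complete"):

```python
import re

def last_sentence_complete(text):
    """Heuristic: True if the transcript ends with . ! or ?,
    optionally followed by a closing quote mark."""
    return bool(re.search(r"[.!?][\"'”’]?\s*$", text))
```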

In my experience with Whisper, it has the lowest transcription error rate, but isn’t perfect. If you use alternatives like AWS Transcribe, you get a higher error rate, but it will at least separate out different speakers for you.

AFAIK, the only way to “prevent hallucinations” is to coach Whisper with the prompt parameter. Otherwise, expect it, and just about everything else, to not be 100% perfect.
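For what it’s worth, the prompt is treated as preceding context rather than an instruction the model must obey, so the usual coaching trick is to list expected vocabulary and style. A sketch (`coaching_prompt` is a hypothetical helper of mine, not an API feature):

```python
def coaching_prompt(names=(), style_hint=""):
    """Compose a prompt that biases Whisper toward known vocabulary.
    The API reads `prompt` as prior context, so spelling out proper nouns
    and the desired style nudges the decoder; it is not guaranteed."""
    parts = []
    if names:
        parts.append("Speakers: " + ", ".join(names) + ".")
    if style_hint:
        parts.append(style_hint)
    return " ".join(parts)

# e.g. prompt=coaching_prompt(["Mason Amadeus", "Perry Carpenter"],
#                             "Podcast interview; may end mid-sentence.")
```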

But in my business, we switched to Whisper API on OpenAI (from Whisper on Huggingface and originally from AWS Transcribe), and aren’t looking back!


I am making a user-facing app, so my goal is to have reasonable accuracy.

I do agree with @curt.kennedy. Whisper is very accurate, so I’ll just add a warning on my front end.



If you are planning on commercializing Whisper, this seems like a perfect opportunity to put yourself in a better position than your competitors. Rather than place a warning, I truly believe you can prevent this issue from occurring with just a little bit of elbow grease.

Usually, these kinds of features can be expanded as well. If you are monitoring the strength of the audio, you can display it like WhatsApp and other messaging apps do when you create a voice note.

It would be very easy to anticipate a hallucinated ending based on the audio sample you have shown.

No, there will always be transcription errors! I think OpenAI says they expect a 95% accuracy rate in English, so 5% bad! Still better than the 70% you get everywhere else. AI is 95% perfect, not 100% perfect :slight_smile:

The only thing you can do, is detect short files and send an error back to the user if the file is too short.
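A sketch of that check, assuming raw 16-bit mono PCM at 16 kHz (the one-second floor and the error wording are assumptions to tune for your app):

```python
def pcm_duration_seconds(num_bytes, sample_rate=16000, sample_width=2, channels=1):
    """Duration of raw PCM audio computed from its byte length."""
    return num_bytes / (sample_rate * sample_width * channels)

def reject_if_too_short(num_bytes, min_seconds=1.0, **fmt):
    """Return an error message for clips too short to transcribe reliably,
    else None. The one-second floor is an assumed cutoff."""
    if pcm_duration_seconds(num_bytes, **fmt) < min_seconds:
        return "Audio too short: please record at least a second of speech."
    return None
```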


Completely. I was actually blown away when I saw that it’s more accurate with Spanish. Although after some conversing it made complete sense.

What I’m referring to is the hallucinations that occur from either cut audio or moments of silence (which I’ve seen cause Whisper to hallucinate random sentences).


If we are talking mid-word cutoff hallucinations, then use pydub to segment the audio into <25 MB chunks without cutting it mid-word (it can do this, though I’m not sure of the setting) before sending to the API.
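pydub’s silence utilities (e.g. `split_on_silence` in `pydub.silence`) handle this; the underlying idea, sketched in plain Python over per-frame levels, is to nudge each cut point into the quietest nearby frame:

```python
def best_split_index(levels, target, window=50):
    """Pick a cut point near `target` where the per-frame level is lowest,
    so a chunk boundary lands in a pause rather than mid-word.
    `levels` is a list of per-frame amplitudes (e.g. RMS per 20 ms frame);
    `window` bounds how far the cut may drift from the size target."""
    lo = max(0, target - window)
    hi = min(len(levels), target + window)
    return min(range(lo, hi), key=lambda i: levels[i])
```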

Otherwise, get the elbow grease out, and create this yourself, sure.


This is a very nice library. Thanks for the suggestion.


I am using ffmpeg to split files, but I don’t think it can recognize pauses for that. pydub sounds good, I will check it out.

How much are you using prompt to give instructions to it, and how much does it obey? I have just recently started sending the language of the audio file with it; not sure it helps. Also, it could create a problem if there were some sentences in other languages mixed in; not sure how it would work.

I am now getting fantastic results using prompts like the following:
prompt = (None, "You are a British speaker, please transcribe this into English for me. "
                "This will never be in Welsh. "
                "Do not remove punctuation words like 'dash' or 'new paragraph'.")

My issue was Whisper removing punctuation words, which I process separately using Python code and also using the GPT-4 chat API.

Whisper itself did a crazily good thing last week. My user recorded a letter and finished with “Best wishes”. She then said, “oh sorry, add before best wishes: thank you for coming to see me”. Whisper transcribed this correctly without transcribing the “oh sorry” part. I couldn’t believe it!


I was having very similar issues with cut off sentences. As @curt.kennedy mentioned, you can prompt it to your use case. This prompt worked perfectly for me:

“The sentence may be cut off, do not make up words to fill in the rest of the sentence.”


“The sentence may be cut off, do not make up words to fill in the rest of the sentence.”

You actually pass this in as the initial_prompt via the API?

Does that actually work? I thought the prompt was only to inform/influence the formatting of the text, or inclusion of stop-words and specific words/names.

My issue is currently that the end of the transcription is a massive repetition of a single character which completely floods my token limit. It’s infuriating.

The resulting transcription is so large that it seems to mess with the system message to GPT-4 when translating. The system message tells it to omit any needless repetition of words, but that instruction is always ignored.
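One cheap guard, as a sketch, is to squash long character runs in the transcript before it ever reaches the chat model (the `max_run` of 3 is an arbitrary choice):

```python
import re

def squash_repeats(text, max_run=3):
    """Collapse any character repeated more than `max_run` times in a row.
    A cheap pre-filter for Whisper's occasional 'aaaaaa...' flood; apply it
    to the transcript before building the GPT-4 request."""
    return re.sub(r"(.)\1{%d,}" % max_run,
                  lambda m: m.group(1) * max_run, text)
```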

It shouldn’t work, considering that the model is not trained for instructions. But if it works, it works :man_shrugging:

What’s your temperature set at?