Whisper API hallucinating on empty sections

uanandaraja · March 11, 2023, 3:00am

Has anyone experienced Whisper hallucinating on empty sections? In my case, I’m dealing with audio/video in Indonesian, and usually when there’s an empty section at the beginning or the end, Whisper will fill in something like “thanks for watching” or “sub by x”. Is there any way to prevent this? Maybe with a VAD filter?

linus · March 11, 2023, 6:35pm

Hi @uanandaraja,

I had the same issue, but with the regular text model. It tends to do that when it gets the feeling that the provided content is not finished. What I did was append a string to the API request stating that the content is complete. Maybe you can do something similar with Whisper?

I took a glance at the documentation, and it states that you can use an optional prompt to guide the model’s output. You can describe the behavior you want to see there.

Refer to the documentation (Documentation → Audio).
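For anyone trying this through the API, here is a minimal Python sketch using the official openai package (v1-style client). It assumes an OPENAI_API_KEY in the environment and the whisper-1 model; the helper names are hypothetical. As far as I understand the docs, Whisper treats the prompt more like preceding transcript text than like an instruction, so a short sample in the audio's language and style tends to work better than a command such as "don't add closing phrases". The language parameter takes an ISO-639-1 code ("id" for Indonesian).

```python
from typing import Any, Optional


def build_transcription_kwargs(
    prompt: Optional[str] = None,
    language: Optional[str] = None,
) -> dict[str, Any]:
    """Collect the optional arguments for a Whisper transcription request."""
    # verbose_json also returns per-segment scores, useful for later filtering.
    kwargs: dict[str, Any] = {"model": "whisper-1", "response_format": "verbose_json"}
    if prompt:
        # Used as preceding context: match the audio's language and style.
        kwargs["prompt"] = prompt
    if language:
        # ISO-639-1 code, e.g. "id" for Indonesian.
        kwargs["language"] = language
    return kwargs


def transcribe(path: str, prompt: Optional[str] = None, language: Optional[str] = None):
    """Send one audio file to the API (needs `pip install openai` and OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so the helper above works without it

    client = OpenAI()
    with open(path, "rb") as audio_file:
        return client.audio.transcriptions.create(
            file=audio_file, **build_transcription_kwargs(prompt, language)
        )
```

Usage would look like transcribe("episode.mp3", prompt="Halo semuanya, selamat datang kembali.", language="id"), where the file name and prompt text are placeholders.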

anon10827405 · March 11, 2023, 6:37pm

I think any sort of threshold would be a great way to filter out any noise caused by … no noise …
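One concrete form of such a threshold, sketched under assumptions: with response_format="verbose_json", each segment in the API response carries no_speech_prob and avg_logprob. The open-source whisper package skips a segment as silence when no_speech_prob is above 0.6 and avg_logprob is below -1.0, and the sketch below applies the same rule as a post-filter. It expects segments as plain dicts (as in the raw JSON; convert the SDK's segment objects first if you use the openai package). The function name and the sample scores are made up for illustration.

```python
from typing import Any

# Defaults mirror the open-source whisper package's transcribe() options
# (no_speech_threshold=0.6, logprob_threshold=-1.0); treat them as a starting point.
NO_SPEECH_THRESHOLD = 0.6
LOGPROB_THRESHOLD = -1.0


def drop_silent_segments(
    segments: list[dict[str, Any]],
    no_speech_threshold: float = NO_SPEECH_THRESHOLD,
    logprob_threshold: float = LOGPROB_THRESHOLD,
) -> list[dict[str, Any]]:
    """Return the segments that do not look like text decoded from silence."""
    kept = []
    for seg in segments:
        # Silence = the model thinks there is probably no speech AND it was
        # not confident in the text it produced anyway.
        likely_silence = (
            seg["no_speech_prob"] > no_speech_threshold
            and seg["avg_logprob"] < logprob_threshold
        )
        if not likely_silence:
            kept.append(seg)
    return kept


# Illustrative, made-up scores.
segments = [
    {"text": " Halo semuanya.", "no_speech_prob": 0.05, "avg_logprob": -0.3},
    {"text": " Thanks for watching!", "no_speech_prob": 0.85, "avg_logprob": -1.4},
]
print([s["text"] for s in drop_silent_segments(segments)])  # prints [' Halo semuanya.']
```

Requiring both conditions keeps segments where the model is confident in its text even if it also reports a high no-speech probability, which reduces the chance of dropping quiet but real speech.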

id.luchkin · April 29, 2023, 9:56pm

I’ve had some success fighting this issue by processing the file through ffmpeg with the silenceremove filter before sending it to Whisper. Something like this: ffmpeg -fflags +discardcorrupt -y -i <file_name> -ar 8000 -af silenceremove=start_periods=1:stop_periods=-1:start_threshold=-30dB:stop_threshold=-30dB:start_silence=2:stop_silence=2 <output_file>. You would probably change -ar (the sample rate) and some of the silenceremove options depending on your audio; for that, refer to the ffmpeg documentation for the silenceremove filter.
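If you want to run that step from a script, here is a minimal Python sketch that builds the same ffmpeg argument list and runs it with subprocess. It assumes ffmpeg is on your PATH; the function names are hypothetical. Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.

```python
import subprocess


def build_silenceremove_cmd(
    src: str,
    dst: str,
    sample_rate: int = 8000,
    threshold_db: int = -30,
    silence_s: int = 2,
) -> list[str]:
    """Build id.luchkin's ffmpeg command as an argv list, with an explicit output file."""
    silence_filter = (
        "silenceremove="
        "start_periods=1:stop_periods=-1:"
        f"start_threshold={threshold_db}dB:stop_threshold={threshold_db}dB:"
        f"start_silence={silence_s}:stop_silence={silence_s}"
    )
    return [
        "ffmpeg",
        "-fflags", "+discardcorrupt",  # input option: drop corrupt packets
        "-y",                          # overwrite dst without prompting
        "-i", src,
        "-ar", str(sample_rate),       # output sample rate; tune for your audio
        "-af", silence_filter,
        dst,
    ]


def strip_silence(src: str, dst: str) -> None:
    """Run ffmpeg; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_silenceremove_cmd(src, dst), check=True)
```

One design note: the open-source Whisper model works on 16 kHz audio, so a sample rate of 16000 may preserve more detail than 8000; which works better for your recordings is worth testing.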

thomaszero2882 · August 8, 2024, 11:59pm

I have been searching for someone else experiencing this issue for the longest time!! At least I feel better knowing it’s just a bug with the Whisper integration…

Something about it thinking it heard someone else saying a bunch of stuff I didn’t say was giving me the creeps… (“someone’s listening…” lol jk… mostly)

Adding ‘whisper’ to my searches yielded tons more results, and I have to say, I’m surprised at how many people are just sharing workarounds and don’t seem fazed or perplexed whatsoever by this (IMO) very, very odd behavior…

I feel like I can generally speculate about the technical causes of the bugs I notice in the wild (I’m a back-end dev), but I can’t even begin to guess what would cause a bug this strange…

Here’s the one I got today:

“Click now for my last video about ChatGPT-3. Click now for my last video about ChatGPT-3. Click now for my last video about ChatGPT-3. Click now for my last video about ChatGPT-3.”
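That looped line is a classic degenerate output. The open-source whisper package guards against it by computing a compression ratio of the decoded text (raw bytes divided by zlib-compressed bytes) and retrying when it exceeds 2.4; the API’s verbose_json segments also report a compression_ratio field. Here is a minimal sketch of the same check used as a post-filter; the function names are hypothetical and the threshold is borrowed from the open-source default.

```python
import zlib

# Cut-off borrowed from the open-source whisper package
# (compression_ratio_threshold=2.4); higher ratio = more repetitive text.
COMPRESSION_RATIO_THRESHOLD = 2.4


def compression_ratio(text: str) -> float:
    """Raw UTF-8 byte length divided by zlib-compressed length."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_like_repetition_loop(
    text: str, threshold: float = COMPRESSION_RATIO_THRESHOLD
) -> bool:
    """True when the text compresses suspiciously well, i.e. it repeats itself."""
    return compression_ratio(text) > threshold


looped = " ".join(["Click now for my last video about ChatGPT-3."] * 4)
normal = "Halo semuanya, hari ini kita akan membahas cara memasak nasi goreng."
print(looks_like_repetition_loop(looped), looks_like_repetition_loop(normal))
```

Repetitive text compresses far better than normal speech, which is why this cheap check catches loops like the one above without any model access.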

I’ve gotten this one a few times:
“Thank you for watching!”

…I’ve gotten them in other languages

But they all share a common trait: they sound like the transcript of a video…

I am beyond perplexed, so if anyone knows why this happens, deff @ me lol

turbolucius · August 9, 2024, 11:40am

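It’s pretty much known at this point that OpenAI trained Whisper on YouTube videos, among other things (the legality of which is still up for debate). Regardless, this is why the model sometimes hallucinates these lines during silence, presumably because captions in that kind of data often put closing lines like these over silent or music-only outros.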

In my uses of Whisper, I got the model to incorrectly output things like “Thanks for watching.” or even “Subtitles made by the community of Amara.org” when giving it noisy input. As far as I know, there’s no easy way to reliably get rid of these during generation; all you can do is remove them afterwards, either manually or with a script that filters out known phrases.
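A minimal sketch of that keyword approach in Python, assuming you already have the transcript split into segments (for example the text of each verbose_json segment). The blocklist is hypothetical, seeded with phrases reported in this thread, and the function names are illustrative.

```python
import re

# Hypothetical blocklist seeded with phrases reported in this thread;
# extend it with whatever shows up in your own transcripts.
KNOWN_HALLUCINATIONS = (
    "Thanks for watching",
    "Thank you for watching",
    "Subtitles made by the community of Amara.org",
)


def normalize(text: str) -> str:
    """Lower-case, strip punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def remove_known_hallucinations(
    segments: list[str], phrases: tuple[str, ...] = KNOWN_HALLUCINATIONS
) -> list[str]:
    """Drop segments whose whole text is a known hallucinated phrase."""
    blocked = {normalize(p) for p in phrases}
    return [s for s in segments if normalize(s) not in blocked]


segments = [" Halo semuanya.", " Thanks for watching!", "Thanks for watching, now let's cook."]
print(remove_known_hallucinations(segments))
```

Matching whole segments rather than substrings means a real “thanks for watching” inside a longer sentence survives; the trade-off is that a speaker who genuinely says only that phrase in its own segment gets filtered too.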

jsbinette · August 15, 2024, 12:45am

I had the issue, and it disappeared (so far) after I specified the language, even with a couple of seconds of no speech at the beginning of my file.

VanHamill · August 15, 2024, 9:01am

I’ve had the same ongoing experience with “thanks for watching”. It has occurred in the Android app (in speech-to-speech mode and in manual chat mode). “Thank you for watching” has even appeared in my chat box without my input, obviously. I have also experienced it on the PC web version. Following these ‘whisper’ events, ChatGPT has responded in several different ways:

Sorry, I didn’t catch that.
You must be concluding a broadcast.
Sorry, I didn’t mean to interrupt.

I’ve often asked for clarification of the event, and no recollection can be made. When I have described the event as a GPT glitch, I was quickly corrected with a defensive response stating that the glitch was solely on my end (the user’s).

white.moon0806 · March 15, 2025, 4:07am

The phrases I get on repeat are “Thanks for watching”, “Thank you very much”, and “I don’t know, I don’t know, I don’t know”, after which it cuts off the rest of the voice dictation. I tried everything to resolve this: changed apps, cleared caches, changed keyboards, phones, and settings. I finally got the answer that it’s related to the YouTube training videos as well, and that it’s something I’m stuck with for now. At least I have a solid answer now, because it can be pretty off-putting to see it create its own unrelated voice input. I’m still experiencing it daily.

white.moon0806 · March 15, 2025, 5:37am

On top of these injected phrases, 100% of the web searches ChatGPT executes return with the statement “I hear your frustration”.
