Has anyone experienced Whisper hallucinating on empty sections? In my case, I’m dealing with audio/video in Indonesian, and usually when there’s an empty section at the beginning or the end, Whisper will fill in something like “thanks for watching” or “sub by x”. Is there any way to prevent this? Maybe with a VAD filter?
Hi @uanandaraja,
I had the same issue but with the regular text model. It tends to do that when it gets the feeling that the provided content is not finished. What I did was state this explicitly by appending a string variable to the API request. Maybe you can do something similar with Whisper?
I took a glance at the documentation and there it is stated that you can use an optional prompt to guide the way the model replies. There you can state the behavior you want to see.
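For what it's worth, here is a minimal sketch of passing that optional prompt. It assumes `client` is an `openai.OpenAI()` instance from the `openai` Python package (version 1.x) and that the model name is `whisper-1`; the function name and the prompt wording are made up for illustration. Whether a prompt actually suppresses the filler text is not guaranteed, so treat it as something to experiment with.

```python
from pathlib import Path


def transcribe_with_prompt(client, audio_path, prompt):
    """Transcribe one file and pass a guiding prompt along with it.

    Assumption: `client` is an `openai.OpenAI()` instance (openai>=1.0).
    """
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",  # assumed model name
            file=audio_file,
            prompt=prompt,      # the optional prompt mentioned in the docs
        )
    return result.text


# Usage (needs an API key, so it is left as a comment):
# from openai import OpenAI
# text = transcribe_with_prompt(OpenAI(), "talk.mp3",
#                               "The recording may start or end with silence.")
```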
Refer to the Link below (Documentation → Audio)
I think any sort of threshold would be a great way to filter out any noise caused by … no noise …
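One way to picture such a threshold, as a rough sketch: if you request per-segment details (for example the `verbose_json` response format), each segment carries a `no_speech_prob` and an `avg_logprob`, and you can drop segments that look like silence. The function name, the dict-shaped segments, and the cutoff values below are assumptions for illustration; the defaults simply mirror the ones used in the open-source Whisper code, and you would want to tune them on your own audio.

```python
def drop_probable_silence(segments, max_no_speech_prob=0.6, min_avg_logprob=-1.0):
    """Keep only segments that do not look like silence.

    Assumption: each segment is a dict with `text`, `no_speech_prob`
    and `avg_logprob` keys. A segment is dropped only when the model
    thinks it is probably silence AND it was not confident in the text.
    """
    kept = []
    for seg in segments:
        looks_silent = seg["no_speech_prob"] > max_no_speech_prob
        low_confidence = seg["avg_logprob"] < min_avg_logprob
        if looks_silent and low_confidence:
            continue
        kept.append(seg)
    return kept


# Made-up example data, not real model output.
example = [
    {"text": "Halo semuanya.", "no_speech_prob": 0.02, "avg_logprob": -0.21},
    {"text": "Thanks for watching.", "no_speech_prob": 0.93, "avg_logprob": -1.40},
]
print([seg["text"] for seg in drop_probable_silence(example)])
```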
I have had some success fighting this issue by processing the file through ffmpeg with a silenceremove filter before sending it to Whisper. Something like this:
ffmpeg -fflags +discardcorrupt -y -i <file_name> -ar 8000 -af silenceremove=start_periods=1:stop_periods=-1:start_threshold=-30dB:stop_threshold=-30dB:start_silence=2:stop_silence=2 <output_file>
You would probably change -ar (the sample rate) and some of the silenceremove options depending on your audio; for that you can refer to this page.
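If you want to run that preprocessing step from a script, a small wrapper along these lines might do. It is only a sketch: the function names and default values are mine, it assumes `ffmpeg` is on your PATH, and it uses the single-dash `-fflags` spelling plus an explicit output path, which ffmpeg needs.

```python
import subprocess


def build_silenceremove_cmd(src, dst, sample_rate=8000, threshold_db=-30, min_silence_s=2):
    """Build the ffmpeg argument list for stripping silence from `src`."""
    audio_filter = (
        "silenceremove="
        "start_periods=1:stop_periods=-1:"
        f"start_threshold={threshold_db}dB:stop_threshold={threshold_db}dB:"
        f"start_silence={min_silence_s}:stop_silence={min_silence_s}"
    )
    return [
        "ffmpeg", "-fflags", "+discardcorrupt", "-y",
        "-i", str(src),
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-af", audio_filter,
        str(dst),
    ]


def strip_silence(src, dst, **options):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_silenceremove_cmd(src, dst, **options), check=True)


# Usage (hypothetical file names):
# strip_silence("talk.mp4", "talk_trimmed.wav", sample_rate=16000)
```

Passing the arguments as a list (instead of one shell string) avoids quoting problems with file names that contain spaces.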
I have been trying to search for someone else experiencing this issue for the longest!! I at least feel better knowing it’s just a bug with the whisper integration…
Something about it thinking it heard someone else saying a bunch of stuff I didn’t say was giving me the creeps… (“someone’s listening…” …lol jk… mostly)
Adding ‘whisper’ to my searches yielded tonsss more results and I gotta say, I’m surprised at how many ppl are just sharing workarounds and don’t seem fazed/perplexed whatsoever by this (IMO) very very odd behavior…
I feel like I can generally speculate at the technical causes of various bugs I notice in the wild (back end dev) but I can’t even begin to speculate at what would cause such a strange bug like this…
Here’s the one I got today:
"""
Click now for my last video about ChatGPT-3. Click now for my last video about ChatGPT-3. Click now for my last video about ChatGPT-3. Click now for my last video about ChatGPT-3.
"""
I’ve gotten this one a few times:
“Thank you for watching!”
…I’ve gotten them in other languages
But they all share a common trait which is they sound like they are the transcript to a video…
If anyone knows the reason this happens, I am beyond perplexed so deff @ me lol
It’s pretty much known at this point that OpenAI trained Whisper on YouTube videos, among other things (the legality of which is still up for debate). Regardless, this is why the model sometimes hallucinates these lines during silence.
In my uses of Whisper, I got the model to incorrectly output stuff like “Thanks for watching.” or even “Subtitles made by the community of Amara.org” when giving it a noisy input. As far as I know, there’s no easy way to reliably get rid of these during generation; all you can do is manually remove them afterwards or script them out using a keyword system.
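A keyword system like that could be as simple as the sketch below. The phrase list only contains the examples reported in this thread, and the function names are made up; you would extend the list with whatever your own transcripts keep producing. It works on a list of transcript lines (or segment texts) and drops a line only when the whole line matches a known phrase, so real speech that merely contains "watching" is left alone.

```python
KNOWN_HALLUCINATIONS = (
    "Thanks for watching",
    "Thank you for watching",
    "Subtitles made by the community of Amara.org",
)


def _normalize(text):
    """Lowercase and trim whitespace, quotes and end punctuation."""
    return text.strip().strip(".!?\"'“”‘’").strip().lower()


def remove_known_hallucinations(lines, phrases=KNOWN_HALLUCINATIONS):
    """Drop lines that consist entirely of a known hallucinated phrase."""
    blocked = {_normalize(phrase) for phrase in phrases}
    return [line for line in lines if _normalize(line) not in blocked]


transcript = ["Halo semuanya.", "Thanks for watching.", "“Thank you for watching!”"]
print(remove_known_hallucinations(transcript))
```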
I had the issue and it disappeared (so far) after I specified the language, even with a couple of seconds of no speech at the beginning of my file.
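For reference, setting the language could look roughly like this. Same caveats as any sketch: it assumes `client` is an `openai.OpenAI()` instance (openai 1.x) and the `whisper-1` model name, and the function name is mine. The API takes the language as an ISO-639-1 code, so Indonesian would be `"id"`.

```python
def transcribe_with_language(client, audio_path, language="id"):
    """Transcribe one file with the spoken language stated explicitly.

    Assumption: `client` is an `openai.OpenAI()` instance (openai>=1.0).
    `language` is an ISO-639-1 code; "id" is Indonesian.
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",  # assumed model name
            file=audio_file,
            language=language,
        )
    return result.text
```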
I’ve had the same ongoing experience with “thanks for watching”. It has occurred in the Android app (in speech-to-speech mode and in manual chat mode). “Thank you for watching” has even appeared in my chat box without my input. I have also experienced it in the web version on PC. Following these ‘whisper’ events, ChatGPT has responded in several different ways:
- Sorry, I didn’t catch that,
- You must be concluding a broadcast.
- Sorry, I didn’t mean to interrupt.
I’ve often asked for clarification of the event, and no recollection can be made. When I reported the event as a GPT glitch, I was quickly corrected with a defensive response stating that the glitch was solely on my end (the user’s).