Has anyone experienced Whisper hallucinating on empty sections? In my case, I'm dealing with audio/video in Indonesian, and usually when there's an empty section at the beginning or the end, Whisper fills it in with something like "thanks for watching" or "sub by x". Is there any way to prevent this? Maybe with a VAD filter?
I had the same issue, but with the regular text model. It tends to do that when it gets the feeling that the provided content is not finished. What I did was state this explicitly by appending a string variable to the API request. Maybe you can do something similar with Whisper?
I took a glance at the documentation, and it states that you can use an optional prompt to guide the way the model replies. There you can describe the behavior you want to see.
Refer to the Link below (Documentation → Audio)
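For what it's worth, here is a rough sketch of how passing that optional prompt could look with the official `openai` Python package. The helper names, the example prompt text, and the file name are made up for illustration, and I have not confirmed that a prompt actually stops the "thanks for watching" filler, so treat it as something to try rather than a fix.

```python
from typing import Any, Dict, Optional


def build_transcription_kwargs(prompt: str, language: Optional[str] = None) -> Dict[str, Any]:
    """Collect the keyword arguments for a transcription request.

    Hypothetical helper: it only bundles the model name, the optional
    prompt, and (if given) an ISO-639-1 language code such as "id".
    """
    kwargs: Dict[str, Any] = {"model": "whisper-1", "prompt": prompt}
    if language:
        kwargs["language"] = language
    return kwargs


def transcribe(path: str, prompt: str, language: Optional[str] = None) -> str:
    """Send one audio file to the transcription endpoint and return the text."""
    # Imported here so the helper above can be used without the package installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            file=audio_file,
            **build_transcription_kwargs(prompt, language),
        )
    return result.text


# Example call (file name and prompt wording are assumptions):
# text = transcribe("interview.mp3", "Transkrip percakapan dalam bahasa Indonesia.", language="id")
```

Setting the language explicitly is a separate knob from the prompt, but it may be worth trying both together for Indonesian audio.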
I think any sort of threshold would be a great way to filter out any noise caused by … no noise …
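One way the threshold idea could be sketched: Whisper's segment-level output (from the open-source package, or the API with the verbose JSON response format) includes a `no_speech_prob` value per segment, so you can drop segments the model itself thinks are silence. The function name, the 0.6 cutoff, and the sample segments below are assumptions for illustration; you would need to tune the cutoff on your own audio.

```python
from typing import Any, Dict, List


def drop_silent_segments(
    segments: List[Dict[str, Any]],
    max_no_speech_prob: float = 0.6,  # assumed cutoff, tune for your audio
) -> List[Dict[str, Any]]:
    """Keep only segments whose no_speech_prob is at or below the cutoff."""
    kept = []
    for seg in segments:
        # Segments without the field are kept rather than silently dropped.
        if seg.get("no_speech_prob", 0.0) <= max_no_speech_prob:
            kept.append(seg)
    return kept


# Made-up segments shaped like Whisper's output, for illustration only.
sample_segments = [
    {"start": 0.0, "end": 4.0, "text": " Thanks for watching.", "no_speech_prob": 0.92},
    {"start": 4.0, "end": 9.5, "text": " Selamat pagi semuanya.", "no_speech_prob": 0.03},
]

clean_text = "".join(seg["text"] for seg in drop_silent_segments(sample_segments)).strip()
print(clean_text)
```

This only filters after the fact, so it complements a VAD filter (which removes the silence before transcription) rather than replacing it.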