Whisper silent audio hallucination

Hi all,

We’re using the OpenAI audio transcription API with whisper-1 as the backend model. In general it has been working as expected, but recently we have run into an issue with audio that contains no speech, only background noise.

The files are well-known sample videos used for testing (for example, the Big Buck Bunny video). The results are very consistent: most of the noisy audio files return:

Spanish → Subtítulos realizados por la comunidad de Amara org
French → Sous-titres réalisés para la communauté d’Amara org
English → Support me on PATREON!

We tested with sample videos and with some recordings of our own in which we stay silent (so there is only ambient noise) over a video of random images. In Spanish we always reproduce the same result. The English result depends on the audio: sometimes it is empty, sometimes it is the Amara org text, and sometimes it is “Support me on PATREON!”.
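
In the meantime, one stopgap would be to post-filter the transcripts. This is only a rough sketch and we have not validated it: the phrase list is just the outputs quoted above, and the helper name `looks_like_hallucination` and the normalization rules are our own assumptions, not part of the OpenAI API.

```python
import re
import unicodedata

# Phrases observed in this thread for noisy, speech-free audio.
KNOWN_HALLUCINATIONS = [
    "Subtítulos realizados por la comunidad de Amara org",
    "Sous-titres réalisés para la communauté d’Amara org",
    "Support me on PATREON!",
]


def _normalize(text: str) -> str:
    """Lowercase, strip accents, and turn punctuation runs into single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


_NORMALIZED_PHRASES = {_normalize(p) for p in KNOWN_HALLUCINATIONS}


def looks_like_hallucination(transcript: str) -> bool:
    """Hypothetical helper: True if the whole transcript is a known phrase."""
    return _normalize(transcript) in _NORMALIZED_PHRASES
```

It only catches transcripts that consist entirely of one of these phrases, so it would not help when the text is appended to real speech, and the list would have to grow as new phrases show up.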

How can we avoid this? Is it a known issue?

Regards!