Whisper API produces a Korean transcription for audio with no speech

I have an audio recording that contains no human speech; it's the audio track of a video in which a woman is cleaning her kitchen. Surprisingly, the OpenAI audio transcription API returns a hallucinated transcription in Korean.

I expected the Whisper API to return an empty transcription for audio like this, because the application I'm developing has to handle recordings both with and without speech.

Any suggestions for working around this problem?

Hi and welcome to the Developer Forum,

You might look into some of the work Nvidia has done with its RTX series cards on detecting and isolating speech; it's actually a non-trivial problem. A model will always try to find the best match for the input it is given, so unless that input is pure silence, there is always some probability of a false detection.
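One cheap first step is to check a recording yourself before sending it to the API. The sketch below is a simple energy-based gate, not Nvidia's method or anything the OpenAI API provides: it assumes mono 16-bit PCM WAV input, and the thresholds (`rms_threshold`, `min_active_ratio`) and the `transcribe` callable are hypothetical placeholders you would replace and tune yourself. It only catches silence or very quiet recordings. Background noise such as someone cleaning a kitchen will usually still pass, which is why a trained speech detector is normally needed for cases like yours.

```python
import array
import math
import sys
import wave


def frame_rms(samples, frame_len):
    """Yield the RMS level of each full, non-overlapping frame of samples."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        yield math.sqrt(sum(s * s for s in frame) / frame_len)


def has_audible_content(samples, sample_rate, frame_ms=30,
                        rms_threshold=500.0, min_active_ratio=0.1):
    """Return True if enough frames are louder than rms_threshold.

    The default thresholds are illustrative assumptions, not tuned values.
    This measures loudness only; it cannot tell speech from other noise.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    levels = list(frame_rms(samples, frame_len))
    if not levels:
        return False  # shorter than one frame: treat as empty
    active = sum(1 for level in levels if level > rms_threshold)
    return active / len(levels) >= min_active_ratio


def load_mono_pcm16(path):
    """Read a mono 16-bit PCM WAV file into an array of signed ints."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        rate = wf.getframerate()
        data = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV samples are stored little-endian
    return data, rate


def transcribe_or_skip(path, transcribe):
    """Call the (hypothetical) transcribe(path) only if the file is not near-silent."""
    samples, rate = load_mono_pcm16(path)
    if not has_audible_content(samples, rate):
        return ""
    return transcribe(path)
```

Here `transcribe` would be your own wrapper around the transcription API call. Keeping the gate separate from the API call makes it easy to swap in a proper voice activity detector later without touching the rest of the pipeline.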