I have an audio recording that contains no human speech; it’s the audio track from a video of a woman cleaning her kitchen. Surprisingly, the OpenAI audio transcription API returns a hallucinated transcription in Korean.
I expected the Whisper API to return an empty transcription for audio like this, because the application I’m developing has to handle audio both with and without speech.
Any suggestions for overcoming this problem?
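One possible mitigation is a minimal sketch that post-filters the response instead of trusting the raw text. It assumes the request uses `response_format="verbose_json"` with `whisper-1`, whose segments carry `no_speech_prob` and `avg_logprob`. It mirrors the skip heuristic and default thresholds of the open-source `whisper` package, which the hosted API is not assumed to apply for you. The thresholds, the helper names `is_probably_silent` / `filter_transcript`, and the sample values are all hypothetical, and `no_speech_prob` may not be reliable for non-speech noise such as kitchen sounds, so tune against your own recordings.

```python
from typing import Any

# Hypothetical thresholds, borrowed from the open-source `whisper` package defaults;
# a starting point to tune, not values the hosted API is known to use.
NO_SPEECH_THRESHOLD = 0.6
LOGPROB_THRESHOLD = -1.0


def is_probably_silent(segment: dict[str, Any]) -> bool:
    """True if the model thinks there is no speech AND its text is low-confidence."""
    return (
        segment["no_speech_prob"] > NO_SPEECH_THRESHOLD
        and segment["avg_logprob"] <= LOGPROB_THRESHOLD
    )


def filter_transcript(verbose: dict[str, Any]) -> str:
    """Join the text of segments that survive the no-speech heuristic.

    `verbose` is assumed to be the parsed verbose_json response, e.g. obtained via
    client.audio.transcriptions.create(..., response_format="verbose_json").model_dump()
    """
    segments = verbose.get("segments") or []
    kept = [s["text"].strip() for s in segments if not is_probably_silent(s)]
    return " ".join(kept)


if __name__ == "__main__":
    # Hypothetical response containing a single hallucinated segment.
    sample = {
        "text": "(hallucinated text)",
        "segments": [
            {"text": " (hallucinated text)", "no_speech_prob": 0.92, "avg_logprob": -1.4},
        ],
    }
    print(repr(filter_transcript(sample)))  # prints ''
```

Requiring both conditions (rather than `no_speech_prob` alone) keeps segments where the model is confident in the decoded text, which is the same trade-off the open-source package makes.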