I use the API and have never run into issues either, even with Spanish. I have been running it on my phone with Silero VAD and only get hallucinations when a stray word or two is accidentally captured.
Strange. Whisper actively tries to prevent this exact issue by using beam search and a dynamic temperature setting (if you have set the temperature to 0). Whisper has a ~13% error rate with Croatian.
So, three questions:
- Are you using a prompt to prime the transcription process?
- What is your temperature setting?
- How are you capturing the audio? You said it’s live, so how does the recording start?
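
In case it helps with the temperature question, here is a rough sketch of what I mean by the dynamic temperature setting. It is not Whisper's actual code: `Decoded`, `decode_with_fallback`, and the `decode` callable are hypothetical names I made up for illustration. The default temperatures and thresholds are the ones I remember from the open-source openai/whisper defaults, so treat them as assumptions. The idea is to decode at temperature 0 first and only retry at a higher temperature when the output looks too repetitive or too uncertain.

```python
from typing import Callable, NamedTuple, Sequence


class Decoded(NamedTuple):
    """Hypothetical container for one decoding attempt."""
    text: str
    avg_logprob: float        # mean token log probability (higher is better)
    compression_ratio: float  # high values suggest looping / repeated text


def decode_with_fallback(
    decode: Callable[[float], Decoded],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    logprob_threshold: float = -1.0,
    compression_ratio_threshold: float = 2.4,
) -> Decoded:
    """Try each temperature in order and return the first acceptable result.

    `decode` is an assumed stand-in for whatever runs the model at a given
    temperature. If every attempt fails the checks, the last one is returned.
    """
    if not temperatures:
        raise ValueError("at least one temperature is required")

    result = None
    for temperature in temperatures:
        result = decode(temperature)
        too_repetitive = result.compression_ratio > compression_ratio_threshold
        too_uncertain = result.avg_logprob < logprob_threshold
        if not (too_repetitive or too_uncertain):
            break  # good enough, stop retrying
    return result
```

If you pin the temperature to a single non-zero value, there is no list to fall back through, which is why I am asking what yours is set to.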