I’m using whisper-large-v3-turbo to transcribe voice inputs in both English and Arabic. However, I’m encountering an issue where the Arabic word “نعم” (which means “yes”) is consistently being transcribed incorrectly as “Naah” or “Naahe”.
Has anyone else experienced this behavior with Whisper? If so, what strategies or configurations have you found effective in improving transcription accuracy for short Arabic words like this?
Any insights or suggestions would be greatly appreciated.
You can try a different model.
While Whisper has fewer truncation problems, other models can perform better for specific languages.
https://openai.com/index/introducing-our-next-generation-audio-models/
Also, all models perform significantly worse than usual when the audio is very short, such as a single word.
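If you want to try one of those newer API models, here is a minimal sketch using the official openai Python SDK; the model choice, file path, and language hint are just assumptions for illustration, not specific recommendations.

```python
# Minimal sketch: transcription with one of the newer API audio models.
# Assumptions: the openai Python SDK is installed, OPENAI_API_KEY is set,
# and "voice_note.m4a" is a placeholder for your own audio file.
from openai import OpenAI

client = OpenAI()

with open("voice_note.m4a", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",  # or "gpt-4o-transcribe"
        file=audio_file,
        language="ar",  # hinting the language can help with short utterances
    )

print(transcript.text)
```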
My priority is to use a free model for transcription. Can you suggest any?
Nothing comes to mind at the moment, but I’ve heard some people fine-tune Whisper to improve performance for specific languages.
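If you go that route, a very rough sketch of what that usually looks like with the Hugging Face stack is below. I haven’t verified this exact recipe, and the dataset, column names, and hyperparameters are placeholders you’d replace with your own Arabic data.

```python
# Very rough sketch of Whisper fine-tuning with Hugging Face transformers.
# Everything here is an assumption about your setup: the dataset, the
# "audio"/"sentence" columns, and the hyperparameters are placeholders.
from dataclasses import dataclass

import torch
from datasets import Audio, load_dataset
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

model_id = "openai/whisper-large-v3-turbo"
processor = WhisperProcessor.from_pretrained(model_id, language="arabic", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Placeholder dataset with "audio" and "sentence" columns (e.g. Common Voice Arabic).
ds = load_dataset("mozilla-foundation/common_voice_17_0", "ar", split="train[:1%]")
ds = ds.cast_column("audio", Audio(sampling_rate=16000))

def prepare(batch):
    audio = batch["audio"]
    batch["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)

@dataclass
class Collator:
    processor: WhisperProcessor
    decoder_start_token_id: int

    def __call__(self, features):
        # Batch the log-mel features and pad the token labels, masking pads with -100.
        batch = self.processor.feature_extractor.pad(
            [{"input_features": f["input_features"]} for f in features], return_tensors="pt"
        )
        labels = self.processor.tokenizer.pad(
            [{"input_ids": f["labels"]} for f in features], return_tensors="pt"
        )
        batch["labels"] = labels["input_ids"].masked_fill(labels["attention_mask"].ne(1), -100)
        # The model prepends the start token itself, so drop it from the labels if present.
        if (batch["labels"][:, 0] == self.decoder_start_token_id).all():
            batch["labels"] = batch["labels"][:, 1:]
        return batch

args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-v3-turbo-ar",
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    warmup_steps=50,
    max_steps=1000,
    fp16=torch.cuda.is_available(),
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=Collator(processor, model.config.decoder_start_token_id),
)
trainer.train()
```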
Thanks for flagging this. Could you please generate a HAR file while reproducing the error and email it to support@openai.com?
I had the same issue, but it went away once I tuned the parameters of my voice activity detection (VAD) algorithm and the size of the transcribed chunks. Temperature also matters: in my experience, lowering it to around 0.1 makes this problem occur less often.
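In case it helps, here is a rough sketch of the kind of knobs I mean, using faster-whisper’s built-in VAD filter rather than my exact setup. The model name, file path, and the threshold/chunking values are just placeholders to show where the parameters live; check them against your installed version.

```python
# Rough sketch (not my exact pipeline): faster-whisper exposes a VAD filter,
# language, and temperature in one call. All values below are placeholders
# to experiment with, not recommended settings.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3-turbo", device="auto", compute_type="default")

segments, info = model.transcribe(
    "voice_note.wav",               # placeholder path
    language="ar",                  # pin the language instead of auto-detecting
    temperature=0.1,                # lower temperature reduces odd short-word guesses
    vad_filter=True,                # drop silence before decoding
    vad_parameters={
        "threshold": 0.5,           # speech probability threshold
        "min_silence_duration_ms": 500,
        "speech_pad_ms": 200,       # keep a little context around each chunk
    },
)

for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```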
Good Luck!