Whisper language recognition

I speak German with an accent, and in my tests, Whisper recognizes my audio as German only 2 out of 20 times.
I need a speech transcription tool that understands both English and German speech, and ideally can handle accents as well. Can Whisper handle this task, or can someone provide advice on this topic?


The Whisper API has two parameters that improve its adherence to the intended language:
language: (two-letter ISO-639-1 code, e.g. de or en)
prompt: (text in the target language that leads up to where the audio to be transcribed begins)
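As a minimal sketch of how those two parameters could be passed, assuming the official `openai` Python package and the `whisper-1` model: the helper names `build_transcription_kwargs` and `transcribe` are made up for illustration, and the prompt text is only an example.

```python
def build_transcription_kwargs(language: str, prompt: str) -> dict:
    """Collect the keyword arguments for a Whisper transcription request."""
    if len(language) != 2 or not language.isalpha():
        raise ValueError("language should be a two-letter code such as 'de' or 'en'")
    return {
        "model": "whisper-1",          # assumption: the hosted Whisper model name
        "language": language.lower(),  # forces the language instead of auto-detecting it
        "prompt": prompt,              # text in the same language as the audio
    }


def transcribe(path: str, language: str, prompt: str) -> str:
    """Send one audio file to the API and return the transcribed text."""
    # Imported here so the helper above works without the openai package installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            file=audio_file,
            **build_transcription_kwargs(language, prompt),
        )
    return result.text
```

A call such as `transcribe("interview.mp3", "de", "Das Interview wird auf Deutsch geführt.")` would then pin the transcription to German. Since the question is about both English and German, the language code would have to be chosen per recording.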

If the syllables coming out of your mouth are rooted in and colored by another language, Whisper has an uncanny ability to tune in on that and decide on the language early.

You can even prepend a few seconds of a native speaker to the audio to align the transcription; this is reliable enough that you can strip it out afterwards by discarding the first sentence. If there is any hint of intelligence to be found in the AI, even a prompt that says “our interview with a non-native speaker of the German language conducted in German now continues.” might help (“Unser Interview mit einer Person, die Deutsch nicht als Muttersprache spricht und das ausschließlich auf Deutsch geführt wird, geht nun weiter.”)
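The prepend-and-strip idea can be sketched with Python's standard `wave` module, assuming both clips are uncompressed WAV files with the same channel count, sample width and sample rate. The function names are hypothetical, and the sentence splitting is a naive punctuation-based guess that assumes the lead-in clip is exactly one sentence.

```python
import re
import wave


def prepend_wav(lead_in, speech, out):
    """Write lead_in followed by speech to out (paths or binary file objects)."""
    with wave.open(lead_in, "rb") as a, wave.open(speech, "rb") as b:
        fmt_a = (a.getnchannels(), a.getsampwidth(), a.getframerate())
        fmt_b = (b.getnchannels(), b.getsampwidth(), b.getframerate())
        if fmt_a != fmt_b:
            raise ValueError("both clips must share channels, sample width and rate")
        frames = a.readframes(a.getnframes()) + b.readframes(b.getnframes())
    with wave.open(out, "wb") as w:
        w.setnchannels(fmt_a[0])
        w.setsampwidth(fmt_a[1])
        w.setframerate(fmt_a[2])
        w.writeframes(frames)


def drop_first_sentence(transcript: str) -> str:
    """Discard everything up to the first sentence-ending punctuation mark."""
    parts = re.split(r"(?<=[.!?])\s+", transcript.strip(), maxsplit=1)
    # If only one sentence came back, it was the lead-in, so nothing remains.
    return parts[1] if len(parts) == 2 else ""
```

The combined file is what gets sent for transcription, and `drop_first_sentence` is applied to the returned text. If the native-speaker clip is transcribed as more than one sentence, the split would need adjusting.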


Hey @iliuha1993, try out my WiseTalk App, especially the Voice Translator role.
[wisetalkapp dot com]

Basically, it provides a voice interface to the OpenAI API. But, I use the embedded speech recognition engine of the iPhone/Android, which is still slightly better than Whisper, especially in recognizing accents.

Iz it like Zwarzenegger zpeaking Inglisch? :grinning:
I wonder if he would have the same problem…

Anyway, we used it with Dutch, which is a bit similar to German, with accents, and it worked perfectly. Note that we are native speakers but do have regional accents.
