Whisper - What would be the approach to transcribing multi-language audio?

I am using the Whisper API fairly successfully to transcribe audio to text and to create subtitles for videos.

However, I am facing a problem handling multi-language audio. It is understandable why it does not work well, but I wanted to ask if anyone has managed to make it work, at least to some degree? What strategies could we use to make it better?


If you know the language being spoken beforehand, you can pass that to the model and it will perform well. I have not built anything that is multilingual without being told the language beforehand.

Well, this could be good. Are you passing it in the prompt or as a parameter? I guess the former.