Whisper - How should I approach transcribing multi-language audio?

I have been using the Whisper API fairly successfully to transcribe audio to text and to create subtitles for videos.

However, I am having trouble with multi-language audio. I understand why it does not work well, but has anyone managed to make it work, at least to some degree? What strategies could we use to improve it?
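One strategy sometimes suggested (not confirmed in this thread, so treat it as an assumption) is to split the audio into chunks where the language is roughly uniform. You could split on known speaker turns, or on silence with a tool such as ffmpeg. Then transcribe each chunk in its own request with `response_format="srt"`, passing that chunk's `language` if you know it, and stitch the per-chunk subtitles back together. The motivation is that Whisper generally settles on one language near the start of a request instead of switching mid-sentence. The open-source `whisper` package, for example, detects it from the first 30 seconds when none is given. The sketch below covers only the stitching step. `stitch_srt` is a hypothetical helper, and it assumes each chunk's SRT timestamps start at zero.

```python
import re

# Matches one SRT timestamp such as "00:01:02,345" (hours:minutes:seconds,millis).
_TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _shift(match: re.Match, offset_ms: int) -> str:
    """Return the matched timestamp moved forward by offset_ms milliseconds."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def stitch_srt(chunks: list[tuple[float, str]]) -> str:
    """Merge per-chunk SRT text into a single subtitle file (hypothetical helper).

    `chunks` is a list of (chunk_start_seconds, srt_text) pairs, where each
    srt_text was produced for that chunk alone, so its timestamps start at 0.
    """
    cues = []
    for offset_s, srt in chunks:
        offset_ms = round(offset_s * 1000)
        for block in srt.strip().split("\n\n"):
            lines = block.strip().splitlines()
            if len(lines) < 2:
                continue  # skip empty or malformed cues
            # lines[0] is the chunk-local cue number (discarded);
            # lines[1] is the "start --> end" timing line, shifted to file time.
            timing = _TS.sub(lambda m: _shift(m, offset_ms), lines[1])
            cues.append([timing, *lines[2:]])
    # Renumber cues sequentially across all chunks.
    return "\n\n".join(
        "\n".join([str(i), *cue]) for i, cue in enumerate(cues, start=1)
    ) + "\n"
```

For example, `stitch_srt([(0.0, english_srt), (30.0, spanish_srt)])` places the Spanish chunk's cues 30 seconds into the combined file and renumbers them after the English ones. How well this works depends mostly on how cleanly the chunks separate the languages.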

If you know the language being spoken beforehand, you can pass it to the model and it will perform well. I have not built anything multilingual that works without being told the language in advance.
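For reference, the transcription endpoint accepts the language as its own `language` parameter (an ISO-639-1 code such as `"de"`), separate from the free-text `prompt`. The sketch below assumes the v1 OpenAI Python SDK (`client.audio.transcriptions.create`) and the `"whisper-1"` model. `build_transcription_kwargs` and `transcribe` are hypothetical helpers, not part of the SDK.

```python
def build_transcription_kwargs(language=None, prompt=None, response_format="srt"):
    """Build keyword arguments for client.audio.transcriptions.create (hypothetical helper)."""
    kwargs = {"model": "whisper-1", "response_format": response_format}
    if language is not None:
        # ISO-639-1 code such as "en", "de" or "es": a dedicated parameter, not prompt text.
        kwargs["language"] = language
    if prompt is not None:
        # Free-text hint for vocabulary or spelling; only a hint, not a substitute for `language`.
        kwargs["prompt"] = prompt
    return kwargs


def transcribe(path, language=None, prompt=None):
    """Send one audio file to the API (hypothetical wrapper; needs the `openai` package)."""
    # Imported here so build_transcription_kwargs works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio:
        return client.audio.transcriptions.create(
            file=audio, **build_transcription_kwargs(language, prompt)
        )
```

A call such as `transcribe("clip.mp3", language="de")` then pins the request to German instead of relying on automatic detection.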

That could work. Are you passing the language in the prompt or as a parameter? I would guess the former.