Troubleshooting OpenAI's Whisper Model: Resolving Incorrect Language Outputs for Maithili with Multilanguage Tokenizer

Issues with Maithili Language Output Using Whisper Multilanguage Tokenizer

Hello everyone,

I am currently working on a project involving speech recognition for the Maithili language, which is not natively supported by existing models. I am using OpenAI’s Whisper multilanguage tokenizer for this purpose. However, I am encountering an issue where the output generated by the model is not in Maithili but rather in other languages.

Details:

Model: Whisper (multilanguage tokenizer)

Language: Maithili (a language the model was not trained on)

Issue: The output printed by the model is not in Maithili, but in other languages.

Could anyone suggest possible reasons for this issue and potential solutions to ensure that the model generates accurate output in Maithili? Any insights or recommendations on how to address this problem would be greatly appreciated.

Thank you!

#whisper #openai #speechtotext #deeplearning #debugging

The tag you want to add is fine-tuning, because that is the only way this will work for an unsupported language, one with fewer native speakers than the population of California. There is no hidden parameter that makes it start working, and the model will not accept ISO language codes outside its pretrained set.
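To make the language-code point concrete, here is a minimal sketch of checking a code against a table of supported languages. The helper name `check_language` and the four-entry table are illustrative assumptions, not the real list; in practice you would pass the full table, which the open-source `openai-whisper` package exposes as `whisper.tokenizer.LANGUAGES`.

```python
def check_language(code, supported):
    """Return the language name if `code` is in `supported`, else None."""
    return supported.get(code.lower())


# Illustrative subset only (an assumption for this sketch). The full
# code -> name table lives in whisper.tokenizer.LANGUAGES.
SUPPORTED_SUBSET = {
    "en": "english",
    "hi": "hindi",
    "ne": "nepali",
    "bn": "bengali",
}

for code in ("hi", "mai"):
    name = check_language(code, SUPPORTED_SUBSET)
    if name is None:
        print(f"{code}: not in the pretrained set, fine-tuning needed")
    else:
        print(f"{code}: supported as {name}")
```

If `mai` is missing from the full table as well, no decoding option will make the model emit Maithili by language code alone.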

If your input produces a coherent translation into another language rather than hallucinated text, that would be a thought-provoking hint about how much Maithili data may already be present in OpenAI’s Whisper (large-v2). There is little way to activate it, though, except perhaps by sending a 30-second clip whose first 10 seconds are clear, eloquent, unmistakably native speech, along with plenty of prompt text written out in that language.

Fine-tuning is a problem of a scale where you would want to enlist well-funded partners with an interest in the overall project. The knowledge work required to produce datasets where none exist to be mined may be extensive. Consider that a new language needs hundreds or thousands of hours of labeled audio, cut into 30-second snippets, for the tuning to be of good quality.
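To put rough numbers on that, here is a small sketch that counts how many 30-second snippets a given number of hours amounts to and computes cut points for one recording. The function names are hypothetical, and cutting at fixed boundaries is a simplification: a real pipeline would more likely split at pauses and keep each transcript aligned with its snippet.

```python
import math

WINDOW_S = 30  # Whisper processes audio in 30-second windows


def snippets_for_hours(hours, window_s=WINDOW_S):
    """Number of window-sized snippets needed to cover `hours` of audio."""
    return math.ceil(hours * 3600 / window_s)


def segment_bounds(duration_s, window_s=WINDOW_S):
    """(start, end) times in seconds covering one recording; the last may be short."""
    duration_s = float(duration_s)
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + window_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


for hours in (100, 1000):
    print(f"{hours} hours -> {snippets_for_hours(hours)} labeled snippets")

print(segment_bounds(75))
```

Every one of those snippets needs an accurate Maithili transcript, which is where most of the effort goes.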

The OpenAI API doesn’t support fine-tuning Whisper, but the model itself is open-source, so you can fine-tune it yourself.