Whisper transcription translates to random language (Malay)

We’re encountering a very odd problem: a Whisper transcription of English speech comes back translated (accurately) into Malay. This happens sporadically and is very hard to reproduce, but multiple users have flagged the problem (none of them actually speak Malay).

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const transcription = await openai.audio.transcriptions.create({
	file: fs.createReadStream(path), // English input file
	model: "whisper-1",
});

const transcript = transcription.text; // Malay transcript

The prompt parameter for Whisper doesn’t really accept instructions but is more for style (e.g. punctuation).

Has anyone experienced something similar with transcripts and/or any idea how we can force the transcript to be the same language as the input file?

Is there anything similar across the files that get translated to Malay? I’m thinking maybe there’s a word or phrase near the beginning that makes it think the audio is Malay?

Or could be a bug. :wink:

You might try adding it to the system prompt…

3 Likes

Thanks Paul. Yes, some of the input files include food items that could potentially be attributed to Malay (e.g. chapati, which is of course Indian but also very common/popular in Malaysia). However, that’s just one word in an entire sentence, and it isn’t uniquely tied to Malay/Malaysia.

You might try adding it to the system prompt

Can you please elaborate on this? The input language is unknown - users can speak their mother tongue - and the transcript must always return in the same language as the input file. The documentation states

The current prompting system is much more limited than our other language models and only provides limited control over the generated audio.

1 Like

Yeah, the tech is “bleeding edge,” so it breaks often currently. It’ll get better as time goes on.

Maybe something as simple as “Be sure to transcribe in the dominant language…”? Or something similar.

Ah, okay. I’ve not done a lot of transcribing personally, just trying to offer ideas. Could just be a weird bug where those food items tip the scale too much. If you can reproduce it, OpenAI would likely want to know about it if they don’t already (help.openai.com)…

Following to see if anyone else chimes in…

1 Like

You’ve already mentioned the limitations of prompting for Whisper, but it can still be used to your advantage. One important thing to note is that a whisper prompt is not an instruction comparable to a system prompt for a chat model. In case you haven’t tried it, here is my first recommendation:

Instead of prompting

Please transcribe the following text into English

we would use

The following is an audio transcript in English:
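A minimal sketch of what that looks like in code, building on the snippet from the original post. Whisper treats the `prompt` parameter as preceding transcript text rather than an instruction, so the prompt below is phrased as a statement in the target language. The helper function name is mine, for illustration:

```javascript
// Sketch: biasing Whisper toward English via the `prompt` parameter.
// Note: the prompt is phrased as transcript-style text, not a command,
// because Whisper conditions on it as if it preceded the audio.
function buildEnglishPromptParams(fileStream) {
  return {
    file: fileStream,
    model: "whisper-1",
    prompt: "The following is an audio transcript in English:",
  };
}

// Usage (assuming an initialized `openai` client and a file stream):
// const transcription = await openai.audio.transcriptions.create(
//   buildEnglishPromptParams(fs.createReadStream(path))
// );
```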

Apart from this, we do receive similar reports from time to time. For example, English-speaking users reporting transcriptions in Welsh.

A more sophisticated method could be to have another tool evaluate whether the output is in the desired target language and retry if necessary. Note that an LLM is not needed for this deterministic check; existing language-detection libraries for your programming language will do.
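A rough sketch of that detect-and-retry loop. The `looksEnglish` check here is a toy stopword heuristic purely for illustration; in practice you would substitute a proper language-detection library, and you might vary the prompt or temperature between attempts:

```javascript
// Toy language check: fraction of common English stopwords in the text.
// Replace with a real language-detection library in production.
const ENGLISH_STOPWORDS = new Set(["the", "and", "is", "of", "to", "in", "a"]);

function looksEnglish(text) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const hits = words.filter((w) => ENGLISH_STOPWORDS.has(w)).length;
  return hits / Math.max(words.length, 1) > 0.1; // crude threshold
}

// Retry wrapper: `transcribe` is any async function returning transcript text.
async function transcribeWithRetry(transcribe, maxAttempts = 3) {
  let last = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await transcribe();
    if (looksEnglish(last)) return last;
  }
  throw new Error(`Transcript not in expected language after ${maxAttempts} attempts: ${last}`);
}
```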

3 Likes

Hi @georg-san

If the language is known, use language param, which is an optional parameter that can be used to increase accuracy when requesting a transcription. It should be in the ISO-639-1 format.
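For completeness, a sketch of what passing the `language` parameter looks like (the helper name is mine; `"en"` is just an example ISO-639-1 code):

```javascript
// Sketch: pinning the transcription language via the optional
// `language` parameter, assuming the input language is known.
function buildLanguageParams(fileStream, isoCode) {
  return {
    file: fileStream,
    model: "whisper-1",
    language: isoCode, // ISO-639-1, e.g. "en" for English
  };
}

// Usage (assuming an initialized `openai` client):
// const transcription = await openai.audio.transcriptions.create(
//   buildLanguageParams(fs.createReadStream(path), "en")
// );
```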

4 Likes

+1 to this.

If you don’t select a language, a separate language-classification module guesses the language for you.

3 Likes

@sps The language is not known unfortunately.

I guess the only way to guarantee that the transcript comes back in the correct language is to detect the language first and then either add it to the prompt (@vb ) or use the language param.

1 Like

Might be a case where a smaller/faster model is useful? With the lower costs, you might give it a few examples (your most common languages?) and ask it to classify the first X words of the transcript. You’d only have to do it once per file, which should keep latency down (or at least not too crazy…).
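One way that could look, as a sketch: build a chat-completion request that asks a small model for just an ISO-639-1 code. The model name and prompt wording here are assumptions, not tested recommendations, and the request-builder function is mine:

```javascript
// Sketch: classify the language of a short transcript snippet with a
// small chat model, returning the request payload to send.
function buildLanguageClassifierRequest(snippet) {
  return {
    model: "gpt-4o-mini", // any small, cheap model would do
    messages: [
      {
        role: "system",
        content:
          "Reply with only the ISO-639-1 code of the language of the user's text.",
      },
      { role: "user", content: snippet },
    ],
  };
}

// Usage (assuming an initialized `openai` client):
// const res = await openai.chat.completions.create(
//   buildLanguageClassifierRequest(firstWordsOfTranscript)
// );
// const isoCode = res.choices[0].message.content.trim();
```

The returned code could then feed the `language` parameter mentioned above.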

1 Like