We’re encountering a very odd problem: a Whisper transcription of English speech is translated (accurately) into Malay. This happens sporadically and is very hard to reproduce; however, we’ve had multiple users flag the problem (none of the users actually speaks Malay).
Anything you can see similar in the files of the ones that translate to Malay? I’m thinking maybe it’s a word or phrase in the beginning that makes it think it’s Malay?
Thanks, Paul. Yes, some of the input files include food items that could potentially be attributed to Malay (e.g. chapati, which is of course Indian but also very common/popular in Malaysia). However, that’s just one word in an entire sentence, and it isn’t a perfect match for Malay/Malaysia anyway.
You might try adding it to the system prompt
Can you please elaborate on this? The input language is unknown - users can speak their mother tongue - and the transcript must always return in the same language as the input file. The documentation states
The current prompting system is much more limited than our other language models and only provides limited control over the generated audio.
Yeah, the tech is “bleeding edge,” so it breaks often currently. It’ll get better as time goes on.
Maybe something as simple as “Be sure to transcribe in the dominant language…” ??? … or something similar …
Ah, okay. I’ve not done a lot of transcribing personally, just trying to offer ideas. Could just be a weird bug where those food items tip the scale too much. If you can reproduce it, OpenAI would likely want to know about it if they don’t already (help.openai.com)…
You’ve already mentioned the limitations of prompting for Whisper, but it can still be used to your advantage. One important thing to note is that a Whisper prompt is not an instruction comparable to a system prompt for a chat model; it is treated as preceding text in the style of the expected transcript. In case you haven’t tried it, here is my first recommendation:
Instead of prompting
Please transcribe the following text into English
we would use
The following is an audio transcript in English:
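In code, that prefix goes into the `prompt` parameter of the transcription call. A minimal sketch with the OpenAI Python SDK (the function name and default prefix are just illustrations):

```python
# Whisper treats the prompt as preceding text written in the style (and
# language) of the expected transcript, not as an instruction to obey.
PREFIX_PROMPT = "The following is an audio transcript in English:"

def transcribe(path: str, prompt: str = PREFIX_PROMPT) -> str:
    """Transcribe an audio file, steering Whisper with a same-language prefix."""
    from openai import OpenAI  # deferred so the sketch loads without the SDK installed
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            prompt=prompt,  # a text prefix, not a system-style instruction
        )
    return result.text
```

You’d call it like `transcribe("recording.mp3")` (filename illustrative); the point is that the prompt reads as a lead-in sentence, not as a command.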
Apart from this, we do receive similar reports from time to time. For example, English-speaking users reporting transcriptions in Welsh.
A more sophisticated method could be to have another tool evaluate if the output is in the desired target language and retry if necessary. Note that for this deterministic task, an LLM would not be needed; you can use existing libraries for your programming language.
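A sketch of that check-and-retry loop, with the detector left pluggable (for example the langdetect or langid packages); `transcribe_fn` and the retry count are illustrative choices:

```python
from typing import Callable

def transcribe_with_language_check(
    transcribe_fn: Callable[[], str],        # e.g. a wrapper around the Whisper API
    detect_language: Callable[[str], str],   # returns an ISO-639-1 code, e.g. "en"
    expected_language: str = "en",
    max_retries: int = 2,
) -> str:
    """Retry transcription until the detected output language matches the target."""
    text = transcribe_fn()
    for _ in range(max_retries):
        if detect_language(text) == expected_language:
            break
        text = transcribe_fn()  # retry; you could also vary the prompt here
    return text
```

The detector runs on the transcription output, so no extra LLM call is needed for the check itself.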
If the language is known, use the language parameter, an optional parameter that can be used to increase accuracy when requesting a transcription. It should be an ISO-639-1 code (e.g. “en”).
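Passing it looks like this (a sketch; the validation helper is mine, not part of the SDK):

```python
import re

def is_iso_639_1(code: str) -> bool:
    """Cheap sanity check: ISO-639-1 codes are two lowercase letters."""
    return re.fullmatch(r"[a-z]{2}", code) is not None

def transcribe_known_language(path: str, language: str) -> str:
    """Transcribe an audio file, pinning the input language up front."""
    if not is_iso_639_1(language):
        raise ValueError(f"expected an ISO-639-1 code like 'en', got {language!r}")
    from openai import OpenAI  # deferred so the sketch loads without the SDK installed
    client = OpenAI()
    with open(path, "rb") as f:
        return client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            language=language,  # e.g. "en" for English, "ms" for Malay
        ).text
```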
I guess the only way to guarantee that the correct language is transcribed is to determine the language first and then either add it to the prompt (@vb ) or use the language param.
Might be a case of a smaller / faster model being useful? With the lower costs, you might give it a few examples (your most common languages?) then ask it to classify the first X words of the transcript? Would only have to do it once and could likely keep latency down (or not too crazy…)
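Something like this, maybe — a rough sketch only: the model name, the example pairs, and both helper names are my own assumptions, not anything from the docs:

```python
def build_classifier_messages(snippet: str, examples=None) -> list[dict]:
    """Build a few-shot prompt asking a chat model to name a snippet's language.
    The default example pairs are illustrative placeholders."""
    messages = [{
        "role": "system",
        "content": "Reply with only the ISO-639-1 code of the text's language.",
    }]
    for text, code in (examples or [("Good morning everyone", "en"),
                                    ("Selamat pagi semua", "ms")]):
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": code})
    messages.append({"role": "user", "content": snippet})
    return messages

def classify_language(transcript: str, first_n_words: int = 20) -> str:
    """Ask a small chat model which language the first N words are in."""
    from openai import OpenAI  # deferred so the sketch loads without the SDK installed
    client = OpenAI()
    snippet = " ".join(transcript.split()[:first_n_words])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any small, cheap chat model would do
        messages=build_classifier_messages(snippet),
    )
    return resp.choices[0].message.content.strip()
```

You’d run the classifier once per file and then feed the result into the prompt or the language param, as suggested above.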