It is giving a translation of the transcription instead of just the transcription. (It is doing extra work that was not asked for; not sure "failing" is the right word.)
What setting would make the following code always return the transcription, and never attempt a translation based on the speaker's accent:
const response = await openai.audio.transcriptions.create({
model: "whisper-1",
file: audioFile,
});
const transcription = response.text;
You can use the language parameter:
language (string, Optional)
The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
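For example, pinning the language in the original call might look like the sketch below. The helper name and the "en" code are illustrative assumptions; use the ISO-639-1 code that matches your audio.

```javascript
// Sketch: build the transcription request with the output language pinned,
// so Whisper transcribes in that language instead of auto-detecting.
function buildTranscriptionParams(audioFile, languageCode) {
  return {
    model: "whisper-1",
    file: audioFile,
    language: languageCode, // ISO-639-1, e.g. "en", "hi", "fr"
  };
}

// const response = await openai.audio.transcriptions.create(
//   buildTranscriptionParams(audioFile, "en")
// );
// const transcription = response.text;
```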
You can also influence the beginning of the transcription with a prompt in the same language as the audio:
prompt (string, Optional)
An optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language.
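Combining both parameters, a minimal sketch might look like this. The concrete values here ("en" and the prompt text) are assumptions for an English-language caller; substitute values that match your audio.

```javascript
// Sketch: request params combining a pinned language with a same-language
// prompt to steer the decoder toward transcription in that language.
function buildGuidedParams(audioFile) {
  return {
    model: "whisper-1",
    file: audioFile,
    language: "en", // ISO-639-1 code of the input audio (assumption)
    prompt: "The following is an English-language voice note.", // same language as the audio (assumption)
  };
}

// const response = await openai.audio.transcriptions.create(
//   buildGuidedParams(audioFile)
// );
```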
That's the extent of the control available.
I do not know in advance what languages my users will speak.
The main issue remains: the system understands the speech, but instead of returning the transcription, it decides to return a translation of that transcription based on the accent of the voice.
Did you test this? If you want to, think of Apu from The Simpsons speaking English with an accent: he would get text back in Hindi instead of English, and the Hindi text would be a translation of what he said in English.
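Since the languages are not known in advance, one workaround is a two-pass flow: probe with `response_format: "verbose_json"` (which includes a detected `language` field, reported as a full name such as "english"), map that name to an ISO-639-1 code, then re-run with `language` pinned. The lookup table below is a partial, illustrative mapping; extend it for your user base.

```javascript
// Sketch: map Whisper's detected-language name (from verbose_json) to the
// ISO-639-1 code expected by the `language` request parameter.
// Partial table -- an assumption, extend as needed.
const NAME_TO_ISO = {
  english: "en",
  hindi: "hi",
  spanish: "es",
  french: "fr",
};

function toIso639(languageName) {
  return NAME_TO_ISO[languageName.toLowerCase()] ?? null;
}

// Two-pass flow (illustrative):
// 1) const probe = await openai.audio.transcriptions.create({
//      model: "whisper-1", file: audioFile, response_format: "verbose_json" });
// 2) const iso = toIso639(probe.language);
// 3) if (iso) re-run the request with { language: iso } added, forcing the
//    transcription to stay in the detected language.
```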