Transcribing multilingual audio

Good morning.
I’m using the API to transcribe audio with two options: whisper-1 (model) and response_format set to srt or verbose_json (same problem with both).
Sometimes the audio contains a question in English and an answer in Italian.
The returned text is always in Italian.
Is there a way to keep the transcript multilingual?

Hi and welcome to the Developer Forum!

You might try prompting the model with an example where the question is in English and the answer is in Italian. I’ve not tried it, but the model does follow examples, at least stylistically; I’m not sure whether that extends to the language.

Thanks for your answer.
What do you mean by “prompting the model”?

You can give a prompt to Whisper like it was a base GPT model. Or, you can postprocess the result via GPT-4 or GPT-3.5.
Whisper Docs

The Whisper API has a prompt parameter that lets you steer the output it produces. The documentation is here: Whisper Transcription Prompt

An optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language.

I’ve tried using “prompt = it” (or en) and “prompt = Italian”.
Same results.
Maybe it’s an openai-php client problem…?

Actually, the idea is to set the prompt to an example of a text in Italian. For example: “Ciao buon giorno. Come va oggi?”

By the way, could you post the code you’re using, so we can check whether the issue is there?

Of course

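    /* Transcribe the file. */
    try {
        $response = $this->client->audio()->transcribe([
            'model' => 'whisper-1',
            'file' => fopen($file, 'r'),
            'response_format' => 'verbose_json',
            'prompt' => 'en', // or it? or italian?
        ]);
    } catch (\Exception $e) {
        // A try block needs a catch; here the failure is only logged.
        error_log('Transcription failed: ' . $e->getMessage());
    }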

The code seems ok. Did you try setting the prompt to an example of a text in Italian?

I’ve created an MP3 with some audio in Italian and some in English, and I’ve run some tests using “prompt = it” and “prompt = en”.
The result is always text in Italian, maybe because the first part is in Italian?

I’m saying that just “en” or just “it” is not enough. Show it an example sentence that starts with a question in English and ends with an answer in Italian.

But how? Using curl?
Something like this?

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/multilang.mp3" \
  -F model="whisper-1" \
  -F prompt="en"

No, the problem is not the client. We mean that you should set the prompt to a phrase in both English and Italian.

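    try {
        $response = $this->client->audio()->transcribe([
            'model' => 'whisper-1',
            'file' => fopen($file, 'r'),
            'response_format' => 'verbose_json',
            // Example text mixing both languages: English question, Italian answer.
            'prompt' => 'Hello, how are you? Ciao, bene, grazie per avermelo chiesto.',
        ]);
    } catch (\Exception $e) {
        error_log('Transcription failed: ' . $e->getMessage());
    }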

I don’t understand how this API request can solve the problem when I need to transcribe 21,000 hours of audio in batch mode.

Maybe there is some confusion about what I’m saying (my English is terrible, sorry).
Maybe an example will show you what happens.

From #1 to #221, the speakers talk in Italian.

00:17:07,800 → 00:17:15,800
vengono usate in una delle
scopole che ha esempio di

00:17:15,800 → 00:17:21,800
scambio una parola drink
su una parola work e dice

At #222, she quotes something in English.

00:17:21,800 → 00:17:30,800
drink work is the curse of
which you start so working

Then she continues in Italian.

00:17:30,800 → 00:17:34,800
is the reduction of the
parts that drink instead of

00:17:34,800 → 00:17:38,800
drinking the drink is
the reduction of the

But the transcription is translated into English.
This is the problem.
The AI decided to switch language for some minutes.

00:24:35,640 → 00:24:40,640
developed far this notion

Then it switches to Italian again.

00:25:11,640 → 00:25:17,640
alla alla sua crescita
anche psicologica

This is a real disaster.

This is the result of a request made with

    try {
        $response = $this->client->audio()->transcribe([
            'model' => 'whisper-1',
            'file' => fopen($file, 'r'),
            'response_format' => 'verbose_json',
        ]);
    } catch (\Exception $e) {
        error_log('Transcription failed: ' . $e->getMessage());
    }

If I try to force the Italian language using

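    try {
        $response = $this->client->audio()->transcribe([
            'model' => 'whisper-1',
            'file' => fopen($file, 'r'),
            'response_format' => 'verbose_json',
            'language' => 'it', // ISO 639-1 code of the expected language
        ]);
    } catch (\Exception $e) {
        error_log('Transcription failed: ' . $e->getMessage());
    }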

The API returns this error

The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at if you keep seeing this error. (Please include the request ID 7408da853eded56aaeef56a5909689d7 in your email.)

I’ve tried multiple times and it always returns this error.
I asked OpenAI for more information about these errors (I sent all the IDs), but I always received the same kind of message: “send us a screenshot” or “we are soo soooo sad for this”, bla bla.

I waited a few minutes and tried again and it no longer gives me that error, either on that file or on all the other files that were skipped with that kind of error.
Maybe there were problems on the server.
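
Since the server error above went away after a few minutes, a batch job could retry a failed request instead of skipping the file. This is only a minimal sketch, assuming that any exception is worth retrying; `retryWithBackoff` and its parameters are hypothetical names for illustration and are not part of openai-php/client.

```php
/**
 * Call $operation until it succeeds or $maxAttempts is reached.
 * Hypothetical helper, not part of any library.
 *
 * @param callable      $operation   the request to run
 * @param int           $maxAttempts total number of tries (>= 1)
 * @param int           $baseDelayMs wait before the second try; doubles after each failure
 * @param callable|null $sleep       receives a delay in milliseconds; defaults to usleep()
 */
function retryWithBackoff(callable $operation, int $maxAttempts = 5, int $baseDelayMs = 1000, ?callable $sleep = null)
{
    $sleep = $sleep ?? static function (int $ms): void {
        usleep($ms * 1000);
    };
    $delayMs = $baseDelayMs;

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (\Throwable $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // out of attempts: surface the last error
            }
            $sleep($delayMs);
            $delayMs *= 2; // exponential backoff
        }
    }
}
```

It could wrap the same call used above, for example `retryWithBackoff(function () use ($params) { return $this->client->audio()->transcribe($params); })`, where `$params` is the request array. Delays of a few seconds will not cover an outage of several minutes, so files that still fail should be queued for a later run.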

The transcription is in Italian, so (maybe) specifying “language = ‘it’” solves the problem.
I’ll try to re-transcribe all the SRT files with language problems and see.
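
To find which SRT files need re-transcribing, one rough option is to flag cues that look English in a transcript expected to be Italian. The sketch below is a crude stopword-counting heuristic, not real language detection; the function names and the two word lists are assumptions made up for this example, and it expects standard SRT files (numbered cues, `-->` in the timing line).

```php
// Tiny stopword lists; an assumption for illustration, far from exhaustive.
const ENGLISH_STOPWORDS = ['the', 'is', 'of', 'and', 'which', 'you', 'that', 'this', 'to', 'instead'];
const ITALIAN_STOPWORDS = ['che', 'di', 'una', 'delle', 'su', 'dice', 'alla', 'sua', 'anche', 'il', 'la', 'non', 'sono'];

/** Split SRT text into cues: [['index' => int, 'time' => string, 'text' => string], ...]. */
function parseSrtCues(string $srt): array
{
    $cues = [];
    foreach (preg_split('/\R{2,}/', trim($srt)) as $block) {
        $lines = preg_split('/\R/', trim($block));
        if (count($lines) < 3 || !ctype_digit(trim($lines[0]))) {
            continue; // not a well-formed cue (index, timing, text)
        }
        $cues[] = [
            'index' => (int) $lines[0],
            'time' => trim($lines[1]),
            'text' => implode(' ', array_slice($lines, 2)),
        ];
    }
    return $cues;
}

/** Return 'en', 'it', or 'unknown' by counting stopword hits. */
function guessLanguage(string $text): string
{
    $words = preg_split('/[^\p{L}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $en = count(array_intersect($words, ENGLISH_STOPWORDS));
    $it = count(array_intersect($words, ITALIAN_STOPWORDS));
    if ($en === $it) {
        return 'unknown';
    }
    return $en > $it ? 'en' : 'it';
}

/** Indices of cues whose guessed language differs from the expected one. */
function findSuspectCues(string $srt, string $expected = 'it'): array
{
    $suspects = [];
    foreach (parseSrtCues($srt) as $cue) {
        $guess = guessLanguage($cue['text']);
        if ($guess !== 'unknown' && $guess !== $expected) {
            $suspects[] = $cue['index'];
        }
    }
    return $suspects;
}
```

A single flagged cue may be a genuine quote in English, like #222 above; a long run of consecutive flagged cues is a better hint that the model switched to translating.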

From my experience, transcribing multilingual content is still a problem. Sometimes it will just skip the other language, other times it will translate the first one, and the next time I get repeated sentences/words, not even related to the language.

Then again, sometimes it works surprisingly well. I’m not sure how to interpret that, but I have given up on trying to make it work better. We will see if whisper-3 handles it better.

50% of transcripts have this problem