Transcribing multilingual audio

sphawk · October 27, 2023, 4:05pm

Good morning.
I’m using the API to transcribe audio with two options: the whisper-1 model and response_format set to srt or verbose_json (same problem with both).
Sometimes the audio contains a question in English and an answer in Italian.
The text returned is always in Italian.
Is there a way to keep the transcription multilingual?

Foxalabs · October 27, 2023, 5:36pm

Hi and welcome to the Developer Forum!

You might try prompting the model with an example where the question is in English and the answer is in Italian. I’ve not tried it, but the model does follow examples, at least in style; I’m not sure whether that carries over to the choice of language.

sphawk · October 28, 2023, 7:21am

Thanks for your answer.
What do you mean by “prompting the model”?

NotFenixio · October 28, 2023, 8:51am

You can give Whisper a prompt as if it were a base GPT model. Or you can post-process the result with GPT-4 or GPT-3.5.
Whisper Docs

Foxalabs · October 28, 2023, 8:55am

The Whisper API has a prompt parameter that lets you steer the output it produces. The documentation is here: Whisper Transcription Prompt

sphawk · October 28, 2023, 9:16am

An optional text to guide the model’s style or continue a previous audio segment. The prompt should match the audio language.

I’ve tried using “prompt = it” (or en) and “prompt = Italian”.
Same results.
Maybe it’s an openai-php client problem…?

NotFenixio · October 28, 2023, 9:23am

Actually, the idea is to set the prompt to an example of a text in Italian. For example: “Ciao buon giorno. Come va oggi?”

By the way, could you share the code you’re using so we can check whether the issue is there?

sphawk · October 28, 2023, 9:41am

Of course

    /* transcribe file*/
    try { $response = $this->client->audio()->transcribe([
      'model' => 'whisper-1',
      'file' => fopen($file, 'r'),
      'response_format' => 'verbose_json',
      'prompt' => 'en', // or it? or italian?
    ]); } 
    ....

NotFenixio · October 28, 2023, 9:43am

The code seems ok. Did you try setting the prompt to an example of a text in Italian?

sphawk · October 28, 2023, 10:03am

I’ve created an MP3 with some audio in Italian and some in English, and I’ve run some tests using “prompt = it” and “prompt = en”.
The result is always text in Italian, maybe because the first part is in Italian?

Foxalabs · October 28, 2023, 10:04am

I’m saying that just “en” or just “it” is not enough. Show it an example sentence that starts with a question in English and ends with an answer in Italian.

sphawk · October 28, 2023, 10:08am

but how? using curl?
something like this?

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/multilang.mp3" \
  -F model="whisper-1" \
  -F prompt="en"

NotFenixio · November 5, 2023, 8:46pm

No, the problem is not the client. We mean that you should set the prompt to a phrase that contains both English and Italian.

    try { $response = $this->client->audio()->transcribe([
      'model' => 'whisper-1',
      'file' => fopen($file, 'r'),
      'response_format' => 'verbose_json',
      'prompt' => 'Hello, how are you? Ciao, bene, grazie per avermelo chiesto.',
    ]); }

sphawk · November 6, 2023, 11:16am

I don’t understand how this API request can solve the problem when I need to transcribe 21,000 hours of audio in batch mode.

Maybe there is some confusion about what I’m saying (my English is terrible, sorry).
Maybe an example will show you what happens.

From #1 to #221, people speak in Italian.

…

220
00:17:07,800 --> 00:17:15,800
vengono usate in una delle
scopole che ha esempio di

221
00:17:15,800 --> 00:17:21,800
scambio una parola drink
su una parola work e dice

At #222, she quotes something in English.

222
00:17:21,800 --> 00:17:30,800
drink work is the curse of
which you start so working

Then she continues in Italian.

223
00:17:30,800 --> 00:17:34,800
is the reduction of the
parts that drink instead of

224
00:17:34,800 --> 00:17:38,800
drinking the drink is
the reduction of the

But the transcription is translated into English.
This is the problem.
The AI decided to switch languages for several minutes.

274
00:24:35,640 --> 00:24:40,640
developed far this notion

Then it switches back to Italian.

275
00:25:11,640 --> 00:25:17,640
alla alla sua crescita
anche psicologica

This is a real disaster.

This is the result of a request made with:

    try { $response = $this->client->audio()->transcribe([
      'model' => 'whisper-1',
      'file' => fopen($file, 'r'),
      'response_format' => 'verbose_json',
    ]); }

If I try to force Italian using:

    try { $response = $this->client->audio()->transcribe([
      'model' => 'whisper-1',
      'file' => fopen($file, 'r'),
      'response_format' => 'verbose_json',
      'language' => 'it',
    ]); }

the API returns this error:

The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID 7408da853eded56aaeef56a5909689d7 in your email.)

I’ve tried multiple times and it always returns this error.
I asked OpenAI for more information about these errors (I sent all the request IDs), but I only received the same messages: “send us a screenshot” or “we are soo soooo sad for this”, blah blah.

sphawk · November 6, 2023, 11:38am

I waited a few minutes and tried again and it no longer gives me that error, either on that file or on all the other files that were skipped with that kind of error.
Maybe there were problems on the server.

The transcription is in Italian, so (maybe) specifying “language = ‘it’” solves the problem.
I’ll try re-transcribing all the SRT files with language problems and see.
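
The error above went away after waiting a few minutes, which suggests a transient server-side failure. For a large batch, one option is to retry failed requests automatically with increasing delays. The following is a minimal sketch under assumptions, not part of openai-php: `retryWithBackoff` and its parameters are hypothetical names, and it retries on any exception for simplicity.

```php
<?php
declare(strict_types=1);

/**
 * Run $operation and retry it with exponential backoff if it throws.
 *
 * Hypothetical helper, not part of openai-php. For simplicity it retries on
 * any exception; a real batch job would probably retry only server errors.
 *
 * @param callable $operation   Zero-argument callable performing one request.
 * @param int      $maxAttempts Total number of attempts before giving up.
 * @param float    $baseDelay   Seconds to wait before the first retry.
 * @return mixed Whatever $operation returns on its first successful call.
 */
function retryWithBackoff(callable $operation, int $maxAttempts = 5, float $baseDelay = 2.0)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (\Throwable $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // Out of attempts: surface the last error.
            }
            // Wait baseDelay, then 2x, 4x, ... before the next attempt.
            $delaySeconds = $baseDelay * (2 ** ($attempt - 1));
            usleep((int) round($delaySeconds * 1000000));
        }
    }
}
```

To use it with the client from this thread, the `transcribe` call (including its `fopen`) would go inside the closure passed as `$operation`, so that the file is reopened on every attempt.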

nikola1jankovic · November 6, 2023, 8:37pm

In my experience, transcribing multilingual content is still a problem. Sometimes it just skips the other language; other times it translates the first one; and the next time I get repeated sentences or words that are not even related to the language.

Then again, sometimes it works surprisingly well. I’m not sure how to interpret that, but I have given up on trying to make it work better. We will see if whisper-3 handles it better.

sphawk · November 7, 2023, 7:21am

50% of transcripts have this problem
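
With this many affected files, it may help to find the suspicious transcripts automatically before re-transcribing them. The sketch below is a rough heuristic, not a real language detector: it counts common English and Italian function words in each cue's text and flags cues that look English. The word lists and the names `guessLanguage` and `flagCues` are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

// Small hand-picked function-word lists (an assumption; extend as needed).
const ENGLISH_WORDS = ['the', 'is', 'of', 'and', 'that', 'which', 'you', 'this', 'so', 'instead', 'with', 'for'];
const ITALIAN_WORDS = ['il', 'la', 'che', 'di', 'una', 'un', 'e', 'su', 'anche', 'sua', 'alla', 'ha', 'dice', 'per', 'non'];

/** Guess 'en', 'it', or 'unknown' for one cue's text by counting function words. */
function guessLanguage(string $text): string
{
    // strtolower only touches ASCII letters, which is enough for these ASCII word lists.
    $words = preg_split('/[^\p{L}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $english = count(array_intersect($words, ENGLISH_WORDS));
    $italian = count(array_intersect($words, ITALIAN_WORDS));
    if ($english === $italian) {
        return 'unknown';
    }
    return $english > $italian ? 'en' : 'it';
}

/**
 * Return the indexes of the cues whose guessed language is $unexpected.
 *
 * @param string[] $cues Text of each SRT cue, in order.
 * @return int[]
 */
function flagCues(array $cues, string $unexpected = 'en'): array
{
    $flagged = [];
    foreach ($cues as $index => $text) {
        if (guessLanguage($text) === $unexpected) {
            $flagged[] = $index;
        }
    }
    return $flagged;
}
```

A genuine English quote (like cue #222 above) is flagged too, so a long run of consecutive flagged cues is a stronger sign of unwanted translation than a single one.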
