[Whisper] Is there a way to tell the language before recognition?

Hi,
Is it possible to send the model a short sound file and have it return the language spoken most of the time?
From my testing, if I know which language is spoken and pass its value as the language option for the whole conversation (say, between two people), then the result is better.
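Concretely, this is what I do today when I already know the language (a minimal sketch; "conversation.mp3" and "en" are placeholders):

import whisper

model = whisper.load_model("medium")

# Fixing the language skips Whisper's own per-window language
# detection and, in my testing, gives a better transcript
result = model.transcribe("conversation.mp3", language="en")
print(result["text"])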

Hi @content

Which model are you referring to?

Hi, we are using the medium model of Whisper. This is the example code from the GitHub page:

import whisper

model = whisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)

I haven’t used Whisper yet, but looking at the code above, it should be possible to run a short but sufficient snippet of the audio through it first. max(probs, key=probs.get) returns the most likely language; you can then run the entire audio file through recognition with that language fixed, using:
whisper audio.flac audio.mp3 audio.wav --model medium
and specifying the detected language with the --language option.
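Putting both steps together in Python instead of the CLI would look roughly like this (a sketch, untested; "audio.mp3" is a placeholder):

import whisper

model = whisper.load_model("medium")

# Step 1: detect the language on the first 30 seconds only
audio = whisper.load_audio("audio.mp3")
mel = whisper.log_mel_spectrogram(whisper.pad_or_trim(audio)).to(model.device)
_, probs = model.detect_language(mel)
lang = max(probs, key=probs.get)
print(f"Detected language: {lang}")

# Step 2: transcribe the whole file with the detected language fixed
result = model.transcribe("audio.mp3", language=lang)
print(result["text"])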

The only caveat: if the audio contains multiple languages, a single snippet will not be enough to determine the most prominent one, since the language detected is only the dominant language of that snippet, not necessarily of the whole recording.
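If that turns out to matter in practice, one workaround (a sketch, untested, assuming that averaging the per-window probabilities is a good enough proxy for prominence) is to run detection on every 30-second window and sum the probabilities before picking a winner:

import whisper

model = whisper.load_model("medium")
audio = whisper.load_audio("audio.mp3")  # resampled to 16 kHz mono

SAMPLE_RATE = 16000
CHUNK = 30 * SAMPLE_RATE  # detect_language works on 30-second windows
n_chunks = max(1, len(audio) // CHUNK)

# Accumulate the language probabilities over all windows
totals = {}
for i in range(n_chunks):
    chunk = whisper.pad_or_trim(audio[i * CHUNK:(i + 1) * CHUNK])
    mel = whisper.log_mel_spectrogram(chunk).to(model.device)
    _, probs = model.detect_language(mel)
    for lang, p in probs.items():
        totals[lang] = totals.get(lang, 0.0) + p

print("Most prominent language:", max(totals, key=totals.get))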

I’ll run this experiment and share my findings later.

Thanks @sps, it does help.