Whisper Translation failure

I did a test where I uploaded audio I recorded in Spanish and asked for it to be translated into English, but it came back merely transcribed. Here’s the verbose_json response:

{
  "task": "translate",
  "language": "english",
  "duration": 6.57,
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 7.0,
      "text": " Hola! ¿Cómo estás? Eso es algo en español.",
      "tokens": [50364, 22637, 0, 3841, 28342, 24389, 30, 27795, 785, 8655, 465, 31177, 13, 50714],
      "temperature": 0.5,
      "avg_logprob": -0.6253014246622721,
      "compression_ratio": 0.84,
      "no_speech_prob": 0.07229530811309814,
      "transient": false
    }
  ],
  "text": "Hola! ¿Cómo estás? Eso es algo en español."
}

As you can see, it correctly transcribed the audio in Spanish, but no translation occurred.

I attempted to set the language field to "es" to hint that the source is Spanish, but it returned an error saying the only legal value is "en". This seems counter to how language is used in transcriptions, where it describes the source audio.

Any thoughts?

I saw the same thing. Just let it transcribe into words (in Spanish), and then take the words and translate them into English.

Maybe there’s a better route, but with Whisper’s built-in language detection, this was the easiest.
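A minimal sketch of that two-step fallback with the `openai` Python client. The model names (`whisper-1` for the audio step, `gpt-4o-mini` for the text-translation step) are assumptions; swap in whatever you use:

```python
# Two-step fallback: transcribe in the native language, then translate the text.
# Assumes the `openai` package (>=1.0) is installed and OPENAI_API_KEY is set.

def build_translation_messages(text: str, target: str = "English") -> list[dict]:
    """Chat messages asking a model to translate `text` into `target`."""
    return [
        {"role": "system", "content": f"Translate the user's text into {target}."},
        {"role": "user", "content": text},
    ]

def transcribe_then_translate(audio_path: str) -> str:
    from openai import OpenAI  # imported lazily so the sketch stays self-contained
    client = OpenAI()

    # Step 1: plain transcription -- Whisper auto-detects the source language.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # Step 2: translate the native-language transcript with a chat model.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model works here
        messages=build_translation_messages(transcript.text),
    )
    return resp.choices[0].message.content
```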

Hah, right. I added “From Spanish” as the prompt text, and now it translates.
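For reference, passing that hint through the translations endpoint’s prompt parameter might look like this (a sketch; `whisper-1` is assumed, and the “From <language>” wording is just the trick from this thread, not documented behavior):

```python
def translation_request_kwargs(audio_file, source_language: str) -> dict:
    """Arguments for the /audio/translations call, with a 'From <language>'
    hint (the trick from this thread) passed as the prompt."""
    return {
        "model": "whisper-1",
        "file": audio_file,
        "prompt": f"From {source_language}",
    }

def translate_with_hint(path: str, source_language: str) -> str:
    from openai import OpenAI  # lazy import keeps the sketch self-contained
    client = OpenAI()
    with open(path, "rb") as f:
        result = client.audio.translations.create(
            **translation_request_kwargs(f, source_language)
        )
    return result.text
```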

Right, but if you don’t know the source language, then your pipeline would look like “translate” → “native text” → “autodetect language” → “English text if not English”.

Yeah, it would have to be something like that, I imagine.
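That conditional pipeline might be sketched like this, using the detected language that verbose_json reports to decide whether a second translation pass is needed (model names are assumptions):

```python
# Autodetect pipeline: transcribe, check the detected language, and only
# run a text translation pass when the source isn't already English.

def needs_translation(detected_language: str) -> bool:
    """True when the detected source language isn't already English."""
    return detected_language.strip().lower() not in {"english", "en"}

def autodetect_pipeline(audio_path: str) -> str:
    from openai import OpenAI  # lazy import keeps the sketch self-contained
    client = OpenAI()

    with open(audio_path, "rb") as f:
        # verbose_json includes the detected language alongside the text
        result = client.audio.transcriptions.create(
            model="whisper-1", file=f, response_format="verbose_json"
        )

    if not needs_translation(result.language):
        return result.text  # already English, nothing more to do

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model works here
        messages=[
            {"role": "system", "content": "Translate the user's text into English."},
            {"role": "user", "content": result.text},
        ],
    )
    return resp.choices[0].message.content
```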
