When I use Whisper AI, it doesn’t transcribe roughly the first 10 minutes of the audio file I provide as input (Italian language).
I think a little more information is needed for someone to be able to understand and help with the issue you are facing.
I work with the Whisper API a lot. If you are able to share the audio file and show the parameters you are using in the API request, I can check whether I see the same issue on my end.
Hi, these are the parameters I use:
model: "large-v3",
translate: false,
temperature: 0,
transcription: "plain text",
suppress_tokens: "-1",
logprob_threshold: -1,
no_speech_threshold: 0.6,
condition_on_previous_text: true,
compression_ratio_threshold: 2.4,
temperature_increment_on_fallback: 0.2
As for the audio file (mp3) I use as input, it’s quite long (around an hour). The voice isn’t very clear in the first few minutes, but the transcription only starts after about 10 minutes.
Is there a parameter setting that can solve this issue?
Thanks
Sorry, I totally missed notifications from this forum. I’m sure you have probably long moved on from this issue.
But for what it’s worth, I would have first tried to cut the first 10 minutes (the part that didn’t transcribe) and tried to send that to the API by itself, just to see what would happen.
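For anyone who wants to try that isolation test, here is a minimal sketch of cutting out the first 10 minutes with ffmpeg from Python. It assumes ffmpeg is installed and on your PATH; the function names and the file names in the usage note are made up for illustration.

```python
import subprocess


def build_cut_command(src, dst, start_s=0, duration_s=600):
    """Return an ffmpeg argument list that copies one time range of src into dst."""
    return [
        "ffmpeg", "-y",          # -y: overwrite dst if it already exists
        "-ss", str(start_s),     # seek to the start of the range (seconds)
        "-t", str(duration_s),   # read only this many seconds of input
        "-i", src,
        "-c", "copy",            # copy the stream without re-encoding
        dst,
    ]


def cut_audio(src, dst, start_s=0, duration_s=600):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_cut_command(src, dst, start_s, duration_s), check=True)
```

Calling cut_audio("talk.mp3", "first10.mp3") should write the first 600 seconds to first10.mp3, which you can then send to the API on its own. Because the stream is copied rather than re-encoded, the cut may land slightly off the exact second, which doesn't matter for this kind of test.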
I’ve had issues with parts of the audio not being transcribed. This happens whenever the audio starts with an “aah” or “ooh” sound, or any other vocalization/exclamation that is not a word. The transcription then may skip a long portion of the audio after this.
The same also happens if such a non-word sound follows a long pause, though not as often.
In any case, when dealing with an hour of audio, I would probably add some code to automatically cut it up and send it as segments, and then resend any segment where there was an error.
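A rough sketch of what that segment-and-resend logic could look like is below. The transcribe argument is a hypothetical callable that you would write yourself (cut the range out of the file, send it to the API, return the text). The 5-minute segment length, the 3 attempts, and treating an empty result as an error are all my assumptions, not anything the API requires.

```python
def plan_segments(total_s, segment_s=300.0):
    """Split [0, total_s) into consecutive (start, duration) pairs in seconds."""
    segments = []
    start = 0.0
    while start < total_s:
        # The last segment is shortened so it ends exactly at total_s.
        segments.append((start, min(segment_s, total_s - start)))
        start += segment_s
    return segments


def transcribe_with_retry(segments, transcribe, max_attempts=3):
    """Call transcribe(start, duration) for each segment, resending on failure.

    A raised exception or an empty/whitespace-only result counts as a failure.
    Returns a list of (start, text); text is "" if every attempt failed.
    """
    results = []
    for start, duration in segments:
        text = ""
        for _ in range(max_attempts):
            try:
                text = transcribe(start, duration)
            except Exception:
                # Broad catch for the sketch; narrow it to your client's errors.
                text = ""
            if text.strip():
                break
        results.append((start, text))
    return results
```

Any segment that still comes back empty after all attempts is easy to spot in the results, so you can listen to just that part instead of the whole hour. Cutting at fixed times can split a word in half, so overlapping the segments by a second or two may be worth trying.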
Something else you can do (though I haven’t tried it myself) is slow the audio down before you send it to the API. You could do this automatically with ffmpeg or a similar tool. I’ve seen some anecdotes stating that this increases accuracy.
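If you want to experiment with the slow-down idea, a minimal sketch using ffmpeg's atempo audio filter follows. The function name and the 0.8 default are just illustrative choices, and I can't say whether it actually helps; the 0.5 to 2.0 check is a conservative range that ffmpeg builds generally accept for atempo.

```python
import subprocess


def build_slowdown_command(src, dst, tempo=0.8):
    """Return an ffmpeg argument list that changes playback speed without changing pitch.

    tempo < 1.0 slows the audio down; 0.8 means 80% of the original speed.
    """
    if not 0.5 <= tempo <= 2.0:
        raise ValueError("tempo must be between 0.5 and 2.0")
    return ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={tempo}", dst]


def slow_down(src, dst, tempo=0.8):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_slowdown_command(src, dst, tempo), check=True)
```

Keep in mind that any timestamps you get back will refer to the slowed file; multiplying them by the tempo value maps them back to the original audio.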