I have had some success fighting this issue by running the file through ffmpeg with a silenceremove filter before sending it to Whisper. Something like this:

ffmpeg -fflags +discardcorrupt -y -i <file_name> -ar 8000 -af silenceremove=start_periods=1:stop_periods=-1:start_threshold=-30dB:stop_threshold=-30dB:start_silence=2:stop_silence=2 <output_file>

You would probably want to change -ar (the sample rate) and some of the silenceremove parameters depending on your audio; for that you can refer to this page.
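If you call Whisper from Python, the same preprocessing step can be wrapped in a small helper. This is only a rough sketch of the ffmpeg command above, not a tested pipeline: the function names, the default values, and the explicit output path are my own assumptions, and it uses the single-dash -fflags spelling from the ffmpeg documentation. It needs ffmpeg to be available on your PATH.

```python
import subprocess
from pathlib import Path


def build_silenceremove_cmd(src, dst, sample_rate=8000, threshold_db=-30, min_silence_s=2):
    """Build the ffmpeg argument list (hypothetical helper, defaults taken from the post)."""
    # One filter string: trim leading silence (start_periods=1) and
    # every later silent stretch (stop_periods=-1).
    silence_filter = (
        "silenceremove="
        "start_periods=1:stop_periods=-1:"
        f"start_threshold={threshold_db}dB:stop_threshold={threshold_db}dB:"
        f"start_silence={min_silence_s}:stop_silence={min_silence_s}"
    )
    return [
        "ffmpeg",
        "-fflags", "+discardcorrupt",  # drop corrupt packets while reading
        "-y",                          # overwrite the output file if it exists
        "-i", str(src),
        "-ar", str(sample_rate),       # output sample rate in Hz
        "-af", silence_filter,
        str(dst),                      # output path (assumed; ffmpeg requires one)
    ]


def strip_silence(src, dst, **kwargs):
    """Run ffmpeg and return the path of the cleaned file; raises if ffmpeg fails."""
    subprocess.run(build_silenceremove_cmd(src, dst, **kwargs), check=True)
    return Path(dst)
```

You would then call something like strip_silence("call.wav", "call_clean.wav") and pass the cleaned file to Whisper instead of the original. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.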