Disable timestamps for Whisper?

Hi,

I have been searching all over the internet, including the official Whisper documentation, but I can’t find a way to disable timestamps in Whisper transcripts. I’m using a colab.land project with the following line:

!whisper {input_path} --model large-v2 --language English --output_dir {output_folder} --output_format vtt

Can you help me with this? I’m not a developer myself, so I might have missed something. I have seen other Hugging Face projects where you can actually choose to activate or deactivate timestamps for the output.

Thanks, Kind regards

Hello Rrrila,

I came to this forum seeking a solution to the same issue. I have found nothing yet, unfortunately. As far as I know, the Whisper library does not have a built-in option to disable timestamps for the transcripts. Well… you can always try manually removing the timestamps from the output file after running the command you mentioned. You can do this by opening the VTT file in a text editor and deleting the lines that contain the timestamp information.
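If editing by hand gets tedious, the same clean-up can be scripted. Below is a minimal Python sketch, not part of Whisper itself: strip_vtt_timestamps is a hypothetical helper name, and it assumes the file looks like what the whisper command writes with --output_format vtt (a WEBVTT header, then cue timing lines containing "-->", each followed by text). It only post-processes the file, so it does not change how the model decodes.

```python
import re

# Matches a WebVTT cue timing line such as "00:00.000 --> 00:04.000"
# or "01:02:03.456 --> 01:02:07.890" (the hours field is optional).
_CUE_TIMING = re.compile(
    r"^(?:\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}\.\d{3}"
)


def strip_vtt_timestamps(vtt_text: str) -> str:
    """Return only the spoken text of a WebVTT transcript, one cue per line.

    Hypothetical helper: assumes a simple file with no cue identifiers,
    NOTE blocks, or STYLE blocks.
    """
    kept = []
    for line in vtt_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank separator between cues
        if stripped == "WEBVTT" or stripped.startswith("WEBVTT "):
            continue  # file header
        if _CUE_TIMING.match(stripped):
            continue  # timing line, e.g. "00:00.000 --> 00:04.000"
        kept.append(stripped)
    return "\n".join(kept)


sample = """WEBVTT

00:00.000 --> 00:04.000
Hello and welcome.

00:04.000 --> 00:07.500
This is a test.
"""

print(strip_vtt_timestamps(sample))
```

To use it on a real file, read the VTT into a string (for example with pathlib.Path(...).read_text(encoding="utf-8")) and write the result back out as a .txt file.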

Also, it would be more reasonable to reach out directly to the developers of the Hugging Face library or the Whisper library for a solution (I’m waiting for their response right now).

If I’m lucky enough to find a solution, I’ll make sure to post it here.

Hey!
Thanks for your answer. I know I can amend the file manually (or with a script), but that is not an ideal solution, especially for languages such as Arabic. The output repeats words many times across different timestamps, as translating to Arabic has too many complications, so removing timestamps should help avoid the repeated words.

Hi!
My motive was also to disable timestamps, but in the hope of getting fewer hallucinations. But OAI Whisper does not provide a way to disable them during inference. HF Whisper allows disabling them through the return_timestamps param of the generate method, though only for short-form (<30 sec) clips.
I believe the reason they cannot be disabled for >30 sec clips is that each segment’s decoding depends on the timestamps predicted for the previous segment. See section 4.5 of the Whisper paper:
[image: excerpt from section 4.5 of the Whisper paper]
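For the short-form case, here is a rough sketch of what I mean with the HF transformers classes. Treat it as an illustration under assumptions rather than a tested recipe: is_short_form and transcribe_without_timestamps are names I made up, the openai/whisper-large-v2 checkpoint is just the one from the original command, and audio is assumed to be a 1-D float waveform already resampled to 16 kHz. Only return_timestamps=False on generate is the actual switch.

```python
SAMPLE_RATE = 16_000         # Whisper expects 16 kHz mono audio
MAX_SHORT_FORM_SECONDS = 30  # length of one Whisper input window


def is_short_form(num_samples: int, sample_rate: int = SAMPLE_RATE) -> bool:
    """True if the clip fits in a single 30-second Whisper window."""
    return num_samples / sample_rate <= MAX_SHORT_FORM_SECONDS


def transcribe_without_timestamps(
    audio, model_name: str = "openai/whisper-large-v2"
) -> str:
    """Transcribe a short 16 kHz waveform with timestamp tokens disabled.

    Hypothetical helper; `audio` is assumed to be a 1-D float array.
    """
    # Imported lazily so is_short_form works without transformers installed.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    if not is_short_form(len(audio)):
        raise ValueError("clip is longer than 30 s; split it into shorter clips first")

    processor = WhisperProcessor.from_pretrained(model_name)
    model = WhisperForConditionalGeneration.from_pretrained(model_name)

    # Convert the waveform to log-mel input features (padded to 30 s).
    inputs = processor(audio, sampling_rate=SAMPLE_RATE, return_tensors="pt")

    # return_timestamps=False asks generate to decode text tokens only.
    generated_ids = model.generate(inputs.input_features, return_timestamps=False)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Longer recordings would have to be cut into clips of at most 30 seconds and transcribed one by one, which gives up the cross-segment context that the timestamp-based long-form decoding relies on.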

Regards,
Jay