Can Whisper give timestamps for every single word instead of every 5-10 words?

I use Whisper to generate subtitles. When it transcribes audio, it gives me the variables "start", "end" and "text" (the text spoken between start and end) for every 5-10 words. Is it possible to get these values for every single word? Do I have to use a different Whisper model or something similar? I would use that data to generate faster-changing subtitles.

Would be grateful for any help!



Hello, I’m trying to get timestamps for every 5-10 words, but I’m not able to. Could you please let me know how you got timestamps for every 5-10 words from OpenAI?


You can use the following library to accomplish this:

from pypi:
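
Separately from the library linked above, here is a minimal sketch of how per-word subtitle cues could be built, assuming the open-source `whisper` Python package is used and was called with `word_timestamps=True`. Under that assumption each entry in `result["segments"]` carries a `"words"` list whose items have `"word"`, `"start"` and `"end"` keys. The helper name `words_from_result`, the file name `audio.mp3` and the sample data are made up for illustration; the sample stands in for real output so the snippet runs without Whisper installed.

```python
from typing import Any

# Assumed way to obtain `result` with the open-source package (not run here):
#   import whisper
#   model = whisper.load_model("base")
#   result = model.transcribe("audio.mp3", word_timestamps=True)  # hypothetical file


def words_from_result(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a Whisper-style result into one start/end/text cue per word."""
    cues = []
    for segment in result.get("segments", []):
        # "words" is only present when word-level timestamps were requested,
        # so fall back to an empty list for plain segment-level results.
        for word in segment.get("words", []):
            cues.append({
                "start": word["start"],
                "end": word["end"],
                "text": word["word"].strip(),  # words usually carry a leading space
            })
    return cues


# Hand-written sample in the assumed shape (illustrative values only).
sample_result = {
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "text": " Hello there world",
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.4},
                {"word": " there", "start": 0.4, "end": 0.8},
                {"word": " world", "start": 0.8, "end": 1.2},
            ],
        }
    ]
}

for cue in words_from_result(sample_result):
    print(f'{cue["start"]:.2f} -> {cue["end"]:.2f}: {cue["text"]}')
```

The segment-level "start", "end" and "text" values from the original question sit on each item of `result["segments"]`, so the same loop without the inner `"words"` level gives the 5-10 word chunks.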

