I use whisper to generate subtitles, so to transcribe audio and it gives me the variables „start“, „end“ and „text“ (inbetween start and end) for every 5-10 words. Is it possible to get these values for every single word? Do I have to like, use a different whisper model or similair? I would use that data to generate faster changing subititles.
Would be grateful for any help!
Sincereley