How to extract per-token logprobs + timestamps from Whisper?

youssef.avx · September 27, 2022, 8:43am

Hi! I noticed that Whisper’s output gives you the tokens as well as an ‘avg_logprob’ for each segment’s sequence of tokens.

I’m currently struggling to get code working that extracts per-token logprobs as well as per-token timestamps.

I’m curious if this is even possible (I think it might be) but I also don’t want to do it in a hacky way that might be incorrect. Would love any help whatsoever!

Would also be curious if it’s possible to do per-token timestamps.

I think one potential use case is speech intelligibility prediction, à la: [2204.04288] Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction
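
To make the ask concrete, here is a minimal sketch of what I mean by per-token logprobs. It assumes I can somehow get the raw logits at each decoding step; `step_logits`, `token_ids` and both helper functions are hypothetical stand-ins for illustration, not something Whisper hands back directly, and this is not Whisper's own decoding code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one decoding step's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def per_token_logprobs(
    step_logits: Sequence[Sequence[float]], token_ids: Sequence[int]
) -> List[float]:
    """Log-probability of the token that was actually chosen at each step."""
    return [log_softmax(logits)[tok] for logits, tok in zip(step_logits, token_ids)]


# Toy data (made up): 3 decoding steps over a 3-token vocabulary.
step_logits = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2], [0.0, 0.0, 0.0]]
token_ids = [0, 1, 2]

logprobs = per_token_logprobs(step_logits, token_ids)
# Plain mean over tokens; Whisper's own avg_logprob may be normalized differently.
mean_logprob = sum(logprobs) / len(logprobs)
print(logprobs, mean_logprob)
```

The per-sequence average is then just a summary of these values, which is why I suspect the per-token numbers exist somewhere inside the decoder.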

cliocjs · July 6, 2023, 10:16am

It would also be useful for subtitle and karaoke generation.

Words that are spoken need to leave the screen after they are spoken.

Currently, Whisper’s timestamps butt directly up against each other, with no pauses in between.

It’s super awkward for both karaoke and subtitling.
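
As a rough sketch of the post-processing I have in mind: if word-level timings were available, each cue could be trimmed so it leaves the screen shortly after its last word. The input shape below (segments carrying a `words` list with `start`/`end` in seconds) is an assumption modeled on what I believe the open-source whisper package returns with word_timestamps=True; `trim_cues` and the `linger` parameter are hypothetical names of my own.

```python
from typing import Dict, List


def trim_cues(segments: List[Dict], linger: float = 0.3) -> List[Dict]:
    """Shrink each cue so it ends shortly after its last word is spoken."""
    cues = []
    for i, seg in enumerate(segments):
        words = seg["words"]
        if not words:
            continue  # nothing spoken, so nothing to display
        start = words[0]["start"]
        end = words[-1]["end"] + linger
        # Never let a cue run into the first word of the next segment.
        if i + 1 < len(segments) and segments[i + 1]["words"]:
            end = min(end, segments[i + 1]["words"][0]["start"])
        cues.append({"text": seg["text"].strip(), "start": start, "end": end})
    return cues


# Toy input (made up): two segments whose boundaries touch at 4.0 s.
segments = [
    {"text": " Hello there", "start": 0.0, "end": 4.0,
     "words": [{"word": " Hello", "start": 0.2, "end": 0.6},
               {"word": " there", "start": 0.7, "end": 1.1}]},
    {"text": " General", "start": 4.0, "end": 6.0,
     "words": [{"word": " General", "start": 4.5, "end": 5.0}]},
]

cues = trim_cues(segments)
for cue in cues:
    print(f'{cue["start"]:.2f} -> {cue["end"]:.2f}  {cue["text"]}')
```

With the toy input, the first cue clears the screen well before the second one starts, instead of hanging around until 4.0 s.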
