How to extract per-token logprobs + timestamps from Whisper?

Hi! I noticed that Whisper's output gives you the tokens for each segment as well as an ‘avg_logprob’ for that sequence of tokens.

I’m currently struggling to get code working that extracts per-token logprobs as well as per-token timestamps.

I’m curious whether either of these is even possible (I think it might be), but I don’t want to do it in a hacky way that might give incorrect values. Would love any help whatsoever!
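To make the logprob part of the question concrete, here is a minimal sketch of what I mean by per-token logprobs. It assumes I could somehow get the decoder's raw logits at each step; the `step_logits` and `token_ids` values below are made-up toy data, and `per_token_logprobs` is a hypothetical helper, not anything from Whisper's API. The averaging at the end is only my guess at how a sequence-level score relates to the per-token ones; Whisper's own normalization may differ (for example in how the end-of-text token is counted).

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one decoding step's logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def per_token_logprobs(step_logits, token_ids):
    # Hypothetical helper: for each step, pick the log-probability
    # of the token that was actually emitted at that step.
    return [log_softmax(logits)[tok]
            for logits, tok in zip(step_logits, token_ids)]

# Toy data (assumption): 3 decoding steps over a 4-token vocabulary.
step_logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 3.0, 0.2, 0.1],
    [1.0, 1.0, 1.0, 1.0],
]
token_ids = [0, 1, 2]

token_logprobs = per_token_logprobs(step_logits, token_ids)

# Plain mean over the emitted tokens; not necessarily Whisper's exact formula.
avg_logprob = sum(token_logprobs) / len(token_logprobs)

for tok, lp in zip(token_ids, token_logprobs):
    print(f"token {tok}: logprob {lp:.4f}")
print(f"mean logprob: {avg_logprob:.4f}")
```

What I can't figure out is the clean way to get those per-step values out of Whisper itself, rather than only the averaged number.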

I think a potential use case for this is speech intelligibility prediction, à la arXiv:2204.04288, "Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction".
