Hi! I noticed that in the output of Whisper, it gives you tokens as well as an ‘avg_logprobs’ for that sequence of tokens.
I’m struggling currently to get some code working that’ll extract per-token logprobs as well as per-token timestamps.
I’m curious if this is even possible (I think it might be) but I also don’t want to do it in a hacky way that might be incorrect. Would love any help whatsoever!
Would also be curious if it’s possible to do per-token timestamps.
I think a potential use case of this is measuring speech intelligibility prediction, a la: [2204.04288] Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction