I’m exploring the use of ASR (automatic speech recognition).
Mainly, I want to find out whether Whisper can be used to measure or recognise things like correct pronunciation, intonation, and articulation, which are often lost in other speech-to-text services. From the outset, having read the documentation, it seems unlikely, but I wanted to ask here in case anyone has thought of or tried something similar.
Thanks!