Mainly I want to find out whether Whisper can be used to measure or recognise things like correct pronunciation, intonation and articulation, which are often lost in other speech-to-text services. From the outset, and from reading the documentation, it seems unlikely, but I wanted to ask here in case anyone has thought of or tried something similar.
I also hope that "mixed" language gets picked up. In many places people mix words from different languages, and those mixed words have their own pronunciation. I also realize that the same language pronounces words differently from one culture to another, so it's complex… It would need a lot of training data to cover that (although few-shot examples could help; they don't have to be extensive, even for phoneme structures)…