Whisper API for pronunciation, intonation, etc.

I’m exploring the use of ASR.

Mainly, I want to find out whether Whisper can be used to measure or recognise things like correct pronunciation, intonation, and articulation, which are often lost in other speech-to-text services. From the outset, and from reading the documentation, it seems unlikely, but I wanted to ask here in case anyone has thought of or tried something similar.



Whisper 2 does not have any kind of accent or pronunciation detection; it simply tries to guess which word the speaker was attempting to say.

Whisper 3 may have some additional abilities but I have not seen any details on that as yet.
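
Because the model only returns its best guess at the words, one rough workaround is to have the speaker read a known prompt and compare the transcript against it. Below is a minimal sketch of that idea, not a real pronunciation scorer. The function name `word_error_rate` and the example strings are hypothetical, and the transcript is assumed to come from whatever ASR call you already make. It can only flag mispronunciations severe enough to change the recognised word, and it says nothing about intonation or articulation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    reference:  the text the speaker was asked to read (assumed known).
    hypothesis: the transcript returned by the ASR system.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word missing
                            curr[j - 1] + 1,      # extra word in transcript
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up strings for illustration only.
    prompt = "the cat sat on the mat"
    transcript = "the cap sat on the mat"
    print(word_error_rate(prompt, transcript))
```

A score of 0 only means the recogniser mapped the audio to the expected words. Since the model tends to "correct" unclear speech toward the likeliest word, a heavily accented or imprecise reading can still score perfectly.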


Thanks so much for the info! @Foxalabs :slight_smile:


I also hope that “mixed” language is picked up. In many places people mix words from different languages, and those words take on their own local pronunciation. I also realize that the same language is pronounced differently from culture to culture, so it’s complex. That would need a lot of training data (few-shot examples can help, and they don’t have to be extensive, even on phoneme structures).