Whisper API for pronunciation, intonation, etc

I’m exploring the use of ASR (automatic speech recognition).

Mainly, I want to find out whether Whisper can be used to measure or recognise things like correct pronunciation, intonation, and articulation, which are often lost in other speech-to-text services. From the outset, and after reading the documentation, it seems unlikely, but I wanted to ask here in case anyone has thought of or tried something similar.


Whisper 2 does not have any kind of accent or pronunciation detection; it simply tries to guess which word the speaker was attempting to say.

Whisper 3 may have some additional abilities, but I have not seen any details on that yet.
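
Since Whisper only returns the words it guessed, one crude workaround is to have the speaker read a known sentence and compare the transcript against it. The snippet below is only a sketch of that idea, not anything Whisper provides: `mismatched_words` and `words` are hypothetical helpers, and the transcript string is assumed to come from Whisper (for example the `"text"` field returned by `transcribe` in the open-source `openai-whisper` package). Keep in mind that Whisper tends to "correct" a mispronounced word into the intended one, so an empty result does not prove good pronunciation, and none of this says anything about intonation.

```python
import re
from difflib import SequenceMatcher


def words(text):
    # Lowercase and drop punctuation so "Hello," and "hello" compare equal.
    return re.findall(r"[a-z']+", text.lower())


def mismatched_words(reference, transcript):
    """Return (expected, heard) pairs where the transcript departs from the reference.

    Hypothetical helper: `reference` is the sentence the speaker was asked to read,
    `transcript` is assumed to be the text Whisper returned for the recording.
    """
    ref, hyp = words(reference), words(transcript)
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record what was expected and what was transcribed instead.
            diffs.append((" ".join(ref[i1:i2]), " ".join(hyp[j1:j2])))
    return diffs
```

For example, calling it with the reference "She sells sea shells by the sea shore" and the transcript "She sells see shells by the seashore." flags the two spots where the words differ. At best this is a rough intelligibility signal, since a mismatch can come from the speaker, background noise, or the model itself.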

Thanks so much for the info! @Foxabilo :slight_smile:

