Thoughts on Whisper-3 announcement

OK, the Whisper-3 announcement was one of the biggest things for me, and a surprising one as well. However, they were very brief about it, which suggests it is not one of their focus products.

That said, it is open source and already released on GitHub - and I understand that API access will follow in the coming weeks (months)?

However, besides improvements in language capabilities and more precise transcription, have any of the other issues been solved? For example: occasional hallucinations on music/silence, more precise timecodes for subtitles, speaker recognition, word-level timecodes, etc. From what I can see on GitHub, no - so I am guessing OpenAI will either depend on others to integrate alignment and diarization, or is waiting for the API release to offer these tools?
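For the word-level timecodes question: third-party alignment tools (stable-ts, WhisperX, etc.) attach per-word start/end times on top of Whisper's segment-level output, and turning those into subtitle cues is straightforward. Here is a minimal sketch - the input word-list shape is an assumption for illustration, not Whisper's own output format:

```python
# Sketch: grouping word-level timestamps (as produced by alignment tools
# layered on Whisper) into SRT subtitle cues. Vanilla Whisper only emits
# segment-level timestamps; the dict shape below is a hypothetical example.

def fmt_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words, max_words_per_cue=7):
    """Group word-level timestamps into short SRT cues."""
    cues = []
    for i in range(0, len(words), max_words_per_cue):
        chunk = words[i:i + max_words_per_cue]
        start, end = chunk[0]["start"], chunk[-1]["end"]
        text = " ".join(w["word"] for w in chunk)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{fmt_srt_time(start)} --> {fmt_srt_time(end)}\n"
            f"{text}\n"
        )
    return "\n".join(cues)

# Made-up timestamps just to show the cue output:
words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.45, "end": 0.9},
]
print(words_to_srt(words, max_words_per_cue=2))
```

With real alignment output in that shape, the same grouping gives you finer-grained subtitles than Whisper's default segment timestamps.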

If anyone has tried it, I would be interested to hear your thoughts.


For real-world call transcription, diarization is a must IMHO. I hope this is on the near-term roadmap.


There are a number of Python packages that use open-source Whisper with different vocal-isolation techniques to reduce hallucinations and add those timecodes. We have one hosted on Replicate that I can share if it helps.


Please do share - I am looking forward to it. I appreciate your help.

Please share some potential solutions.

This is the best one:
Look up “stable-ts whisper”
