Ok, the Whisper v3 announcement was one of the biggest things for me, and a surprising one as well. That said, they were very brief about it, which suggests it is not one of their focus products.
Still, it is open source and already released on GitHub, and I understand API access will follow in the coming weeks (or months)?
However, besides improved language coverage and more accurate transcription, have any of the other issues been addressed? For example: occasional hallucinations on music/silence, more precise timecodes for subtitles, speaker recognition, word-level timecodes, etc. From what I can see on GitHub they have not, so I am guessing OpenAI will either depend on others to integrate alignment and diarization, or is waiting for the API release to offer these tools?
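For what it's worth, the reference whisper package on GitHub already exposes experimental word-level timestamps through a word_timestamps flag, so that part may work out of the box; diarization still needs an external tool. A rough, untested sketch, assuming large-v3 loads the same way as the earlier checkpoints ("audio.mp3" is just a placeholder):

    # Sketch only, not verified against v3 myself.
    # word_timestamps is an experimental flag in the openai-whisper package;
    # speaker labels would still need something external like pyannote.
    import whisper

    model = whisper.load_model("large-v3")  # assumes the pip package ships the v3 checkpoint
    result = model.transcribe("audio.mp3", word_timestamps=True)

    for segment in result["segments"]:
        for word in segment.get("words", []):
            # each word carries its own start/end time, handy for subtitle alignment
            print(f"{word['start']:7.2f} {word['end']:7.2f}  {word['word']}")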
If anyone has tried it, I would be interested to hear your thoughts.