We are currently using OpenAI’s Whisper API for our transcription tool for media companies, and it is working quite well.
However, I would like some advanced features that are not available with Whisper at the moment: speaker diarization and word-level timestamps. Azure has been offering the Whisper model since 15 September, but the price is almost three times higher! One hour of transcription costs 0.36 USD with OpenAI, while Microsoft will charge 1.00 USD for the same.
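For what it's worth, here is a quick back-of-the-envelope comparison using the two hourly prices quoted above. The monthly volume of 500 hours and the helper name `monthly_cost` are made up purely for illustration; they are not our real usage figures or part of any API.

```python
# Prices per hour of transcribed audio in USD, as quoted in this post.
OPENAI_USD_PER_HOUR = 0.36
AZURE_USD_PER_HOUR = 1.00


def monthly_cost(hours: float, usd_per_hour: float) -> float:
    """Return the cost in USD of transcribing `hours` of audio, rounded to cents."""
    return round(hours * usd_per_hour, 2)


# How many times more expensive Azure is per hour.
ratio = AZURE_USD_PER_HOUR / OPENAI_USD_PER_HOUR

# Hypothetical monthly volume, chosen only for illustration.
hours = 500
openai_cost = monthly_cost(hours, OPENAI_USD_PER_HOUR)
azure_cost = monthly_cost(hours, AZURE_USD_PER_HOUR)

print(f"Azure / OpenAI price ratio: {ratio:.2f}x")
print(f"{hours} h on OpenAI: {openai_cost:.2f} USD")
print(f"{hours} h on Azure:  {azure_cost:.2f} USD")
print(f"Difference: {azure_cost - openai_cost:.2f} USD per month")
```

So the gap is roughly 2.8x per hour, and it grows linearly with volume.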
I guess I am staying with OpenAI’s API, so the question here is: can we hope for speaker diarization and word-level timestamps soon?