I'm wondering what the state of the art is for speaker diarization with Whisper, and whether OpenAI has revealed any plans for a native implementation. I’ve found some implementations that run locally, but ideally I’d still be able to use the API for speed and convenience.
Google Cloud Speech-to-Text has built-in diarization, but I’d rather keep my tech stack all OpenAI if I can, and I believe Whisper is the better transcriber regardless.
Yes, this is a shortcoming. I have tried using Whisper in combination with Pyannote, but the combination is a bit complicated to implement, and the results are not even close to ideal.
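For anyone trying the Whisper + Pyannote route, the fiddly part is usually merging the two outputs. Here is a minimal sketch of one common approach: give each transcript segment the speaker whose diarization turns overlap it the most. It assumes you have already pulled both outputs into plain start/end records. The `Segment` and `Turn` classes and the `assign_speakers` helper are hypothetical names for illustration, not part of either library.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Segment:
    """One transcript segment (hypothetical shape: times in seconds plus text)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """One diarization turn (hypothetical shape: times in seconds plus a speaker label)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals, 0.0 if they are disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn]
) -> List[Tuple[Optional[str], str]]:
    """Label each segment with the speaker that overlaps it the longest.

    Segments that overlap no turn at all are labeled None.
    """
    labeled = []
    for seg in segments:
        # Sum overlap per speaker, since one speaker may have several turns.
        totals: Dict[str, float] = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else None
        labeled.append((speaker, seg.text))
    return labeled


if __name__ == "__main__":
    # Made-up timings purely for illustration.
    segments = [Segment(0.0, 2.0, "Hi"), Segment(2.0, 5.0, "Hello there")]
    turns = [Turn(0.0, 2.2, "A"), Turn(2.2, 6.0, "B")]
    for speaker, text in assign_speakers(segments, turns):
        print(speaker, text)
```

This also shows one reason the results can disappoint: a transcript segment that straddles a speaker change still gets a single label, so any mismatch between segment boundaries and turn boundaries shows up as misattributed words.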
This library has picked up a lot of steam. I haven’t used it yet, but everything I’ve read and seen looks pretty amazing. It runs locally and doesn’t use the API, but it seems to be especially fast.