Speech to text with diarization

Hello everyone,

I want to do speech to text with derealization using the Whisper API. So far I have managed to transcribe an audio file containing two speakers, but the resulting text is not separated by speaker.
The goal is to split the transcript into agent and customer.
Thanks for your help!

Yup, it’s quite a thing where the model doesn’t understand reality the way we do and then…

But then I decided to at least fix the autocorrect typo in the title. I also suggest you search the forum: there are several solutions to this problem, and the open-source community has contributed a lot, especially for the V2 model.

Hope this helps.

Whisper doesn’t do speaker diarization natively; you will have to use a separate model built specifically for that purpose. Generally speaking, you start by splitting the input into chunks based on who is speaking, then send each chunk to Whisper for transcription.
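A minimal sketch of that chunk-then-transcribe flow, under some assumptions. The diarization output is represented as a list of `Segment(speaker, start, end)` records, which is roughly the shape a diarization model produces (the `SPEAKER_00` style labels here are illustrative). The `transcribe(start, end)` callable is a hypothetical stand-in for "cut this span out of the audio and send it to Whisper", for example by slicing the file and passing the slice to `client.audio.transcriptions.create(model="whisper-1", file=...)` in the OpenAI Python SDK. The rule that the first speaker is the agent is also a hypothetical heuristic, not something Whisper or any diarization model provides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Segment:
    """One diarized span: which speaker talked, from start to end (seconds)."""
    speaker: str
    start: float
    end: float


def merge_turns(segments: List[Segment], max_gap: float = 0.5) -> List[Segment]:
    """Merge consecutive segments from the same speaker that are at most
    max_gap seconds apart, so each speaker turn becomes a single chunk."""
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        if prev and prev.speaker == seg.speaker and seg.start - prev.end <= max_gap:
            prev.end = max(prev.end, seg.end)  # extend the current turn
        else:
            # Copy so the caller's segments are never mutated.
            merged.append(Segment(seg.speaker, seg.start, seg.end))
    return merged


def label_roles(turns: List[Segment]) -> Dict[str, str]:
    """Hypothetical heuristic: the first speaker heard is the agent,
    every other speaker is labeled as the customer."""
    roles: Dict[str, str] = {}
    for turn in turns:
        if turn.speaker not in roles:
            roles[turn.speaker] = "agent" if not roles else "customer"
    return roles


def transcribe_by_speaker(
    segments: List[Segment],
    transcribe: Callable[[float, float], str],
) -> List[Tuple[str, str]]:
    """Chunk the audio by speaker turn, transcribe each chunk separately,
    and return (role, text) pairs in conversation order."""
    turns = merge_turns(segments)
    roles = label_roles(turns)
    return [(roles[t.speaker], transcribe(t.start, t.end).strip()) for t in turns]


if __name__ == "__main__":
    diarized = [
        Segment("SPEAKER_00", 0.0, 2.1),
        Segment("SPEAKER_00", 2.3, 4.0),
        Segment("SPEAKER_01", 4.2, 7.5),
        Segment("SPEAKER_00", 7.8, 9.0),
    ]
    # Fake transcriber keyed by (start, end); a real one would call Whisper.
    fake_text = {
        (0.0, 4.0): "Hello, how can I help?",
        (4.2, 7.5): "My order never arrived.",
        (7.8, 9.0): "Let me check.",
    }
    for role, text in transcribe_by_speaker(diarized, lambda s, e: fake_text[(s, e)]):
        print(f"{role}: {text}")
```

Merging same-speaker segments before transcription keeps the number of Whisper calls down and gives each call a full sentence or two of context, which tends to help transcription quality; very short chunks are more likely to be mis-transcribed.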