Hello everyone,
I want to do speech-to-text with diarization using the Whisper API. So far I have managed to transcribe a two-party audio file to text, but without separating the speakers.
The goal is to split the transcript into agent and customer.
Thanks for your help!
Yup, “derealization” would be quite a thing — the model not understanding reality the way we do…
Joking aside, I fixed the autocorrect typo in the title (“diarization”), and I suggest you search the forum. There are some existing solutions for this problem, and the open-source community has also contributed a lot, especially for the V2 model.
Hope this helps.
Whisper doesn’t do speaker diarization natively; you will have to use a separate model specifically for this purpose. Generally speaking, you start by chunking the input audio based on who’s speaking, then send those chunks to Whisper for transcription.
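To make the pipeline above concrete: a diarization model (pyannote.audio is a common open-source choice) gives you timestamped segments labeled by speaker; you merge consecutive segments from the same speaker into turns, transcribe each turn with Whisper, and tag the result with the speaker label. Here is a minimal sketch of the glue logic only — `transcribe_clip` is a placeholder for your actual Whisper call (API or local model), and the `Segment` shape is an assumption about what your diarizer returns:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Segment:
    """One diarized span of audio: who spoke, and when (in seconds)."""
    speaker: str   # e.g. "SPEAKER_00"; map to "agent"/"customer" yourself
    start: float
    end: float

def merge_turns(segments: List[Segment], max_gap: float = 0.5) -> List[Segment]:
    """Merge consecutive segments from the same speaker into single turns,
    so each chunk sent to Whisper is a full utterance, not tiny slivers."""
    merged: List[Segment] = []
    for seg in segments:
        if merged and merged[-1].speaker == seg.speaker \
                and seg.start - merged[-1].end <= max_gap:
            merged[-1] = Segment(seg.speaker, merged[-1].start, seg.end)
        else:
            merged.append(seg)
    return merged

def transcribe_by_speaker(
    segments: List[Segment],
    transcribe_clip: Callable[[float, float], str],  # placeholder: cuts the
                                                     # audio and calls Whisper
) -> List[Tuple[str, str]]:
    """Transcribe each merged turn and label it with its speaker."""
    return [(s.speaker, transcribe_clip(s.start, s.end))
            for s in merge_turns(segments)]
```

In practice `transcribe_clip` would slice the audio file at those timestamps (e.g. with pydub or ffmpeg) and pass each slice to Whisper; the speaker labels (`SPEAKER_00`, `SPEAKER_01`) then just need a one-time mapping to “agent” and “customer”.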