Hello everyone,
I want to do speech-to-text with diarization using the Whisper API. So far I've managed to transcribe a two-sided audio file to text, but without separating the speakers.
The goal is to separate the agent from the customer.
Thanks for your help!
Yup, "derealization" would be quite a thing — a model that doesn't understand reality the way we do…
Joking aside, I've fixed the autocorrect typo in the title (it should be "diarization"), and I suggest you search the forum. There are existing solutions for this problem, and the open-source community has also contributed a lot, especially for the V2 model.
Hope this helps.
Whisper doesn't do speaker diarization natively; you will have to use a separate model specifically for that purpose. Generally speaking, you start by chunking the input based on who's speaking, then send those chunks to Whisper for transcription.
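A minimal sketch of that flow, assuming the diarization model has already produced `(speaker, start, end)` segments (e.g. from pyannote.audio; hard-coded here for illustration). The `transcribe()` helper is hypothetical, standing in for a Whisper API call on each audio slice:

```python
def merge_turns(segments):
    """Merge consecutive segments from the same speaker into one turn,
    so each Whisper call covers a full speaker turn."""
    turns = []
    for speaker, start, end in segments:
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous segment: extend that turn.
            prev_speaker, prev_start, _ = turns[-1]
            turns[-1] = (prev_speaker, prev_start, end)
        else:
            turns.append((speaker, start, end))
    return turns

# Example diarization output: (speaker_label, start_sec, end_sec).
segments = [
    ("agent", 0.0, 2.1),
    ("agent", 2.1, 4.5),
    ("customer", 4.6, 7.0),
    ("agent", 7.2, 9.0),
]

turns = merge_turns(segments)
print(turns)
# Each merged turn would then be cut from the audio file and sent
# to Whisper individually, e.g. (hypothetical helper):
# for speaker, start, end in turns:
#     text = transcribe(audio_file, start, end)
#     print(f"{speaker}: {text}")
```

Merging adjacent same-speaker segments before cutting keeps the chunks longer, which tends to give Whisper more context per request than sending every raw diarization segment separately.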
Any news on this topic? It has been more than a year. People have shared references to the WhisperX project, but it has quite a lot of dependencies.