Speech to text with diarization

rontwito4 · March 18, 2024, 11:12am

Hello everyone,

I want to do speech-to-text with diarization using the Whisper API. So far I have managed to transcribe an audio file with two speakers into text, but without separating the speakers.
The goal is to split the transcript into agent and customer.
Thanks for your help.

vb · March 18, 2024, 12:31pm

Yup, it’s quite a thing where the model doesn’t understand reality the way we do and then…

but then I decided to fix the autocorrect typo in the title, at least, and I suggest you search the forum. There are some solutions for this problem and the open source community has also contributed a lot, especially for the V2 model.

Hope this helps.

N2U · March 18, 2024, 7:14pm

Whisper doesn’t do speaker diarization natively; you will have to use a separate model specifically for this purpose. Generally speaking, you start by chunking the input based on who’s speaking and then send those chunks to Whisper for transcription.
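
To make the chunk-then-transcribe idea a bit more concrete, here is a rough sketch in Python. It assumes you already have speaker turns (speaker label, start, end) from some separate diarization model. The names `Turn`, `merge_turns`, `label_transcript`, the `max_gap` parameter, and the speaker labels are all made up for illustration. The `transcribe` argument is a placeholder for whatever function you write to cut the audio between two timestamps and send that slice to Whisper; it is not part of any library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Turn:
    speaker: str  # label from the diarization model, e.g. "SPEAKER_00" (hypothetical)
    start: float  # seconds
    end: float    # seconds


def merge_turns(turns: List[Turn], max_gap: float = 0.5) -> List[Turn]:
    """Merge consecutive turns by the same speaker that are at most max_gap seconds apart."""
    merged: List[Turn] = []
    for turn in sorted(turns, key=lambda t: t.start):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.speaker == turn.speaker
            and turn.start - last.end <= max_gap
        ):
            # Same speaker kept talking: extend the previous chunk.
            last.end = max(last.end, turn.end)
        else:
            merged.append(Turn(turn.speaker, turn.start, turn.end))
    return merged


def label_transcript(
    turns: List[Turn],
    transcribe: Callable[[float, float], str],
    roles: Dict[str, str],
) -> List[str]:
    """Transcribe each merged chunk and prefix it with the speaker's role.

    transcribe(start, end) is an assumed user-supplied function that slices
    the audio and returns the text for that slice (for example via Whisper).
    """
    lines: List[str] = []
    for turn in merge_turns(turns):
        text = transcribe(turn.start, turn.end).strip()
        if text:
            # Fall back to the raw diarization label if no role is mapped.
            role = roles.get(turn.speaker, turn.speaker)
            lines.append(f"{role}: {text}")
    return lines
```

You would call it with something like `label_transcript(turns, my_transcribe, {"SPEAKER_00": "Agent", "SPEAKER_01": "Customer"})`. Note that a diarization model only tells speakers apart; deciding which label is the agent and which is the customer is a separate step you have to handle yourself, for example by looking at who speaks first.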
