Currently Whisper isn’t able to label different speakers, like this:
Speaker 1: …
Speaker 2: …
Speaker 3: …
Is it possible to identify each speaker individually by their tone or something? Or can we connect another tool to Whisper to identify the different speakers?
Hi, thanks. Can you please share some references on how to combine the two and use timestamps to sync them?
We currently use Riverside.fm to record our podcast. Its speaker recognition is good, but its transcription is not as accurate as Whisper’s.
I think the term is “speaker diarization”, and I see some guides online using “pyannote.audio” together with Whisper to achieve this.
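
On syncing by timestamps, here is a rough sketch of one common approach, not taken from any particular guide: give each transcribed segment the speaker whose diarization turns overlap it the most in time. It assumes you already have Whisper-style segments (dicts with "start", "end" and "text", in seconds) and diarization turns reduced to (start, end, speaker) tuples. The names `overlap` and `assign_speakers` and the sample data are made up for illustration, and the step of actually running Whisper and pyannote.audio is left out.

```python
from typing import Dict, List, Tuple

# A diarization turn: (start_seconds, end_seconds, speaker_label).
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return how many seconds two time intervals share (0.0 if none)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Dict], turns: List[Turn], unknown: str = "Unknown"
) -> List[Dict]:
    """Label each transcript segment with the speaker that overlaps it most."""
    labeled = []
    for seg in segments:
        # Sum the overlap per speaker, since one speaker may have several turns.
        totals: Dict[str, float] = {}
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append({**seg, "speaker": speaker})
    return labeled


# Hypothetical sample data standing in for real transcription/diarization output.
segments = [
    {"start": 0.0, "end": 4.0, "text": "Welcome to the show."},
    {"start": 4.0, "end": 9.0, "text": "Thanks for having me."},
    {"start": 9.0, "end": 12.0, "text": "Let's get started."},
]
turns = [
    (0.0, 4.2, "SPEAKER_00"),
    (4.2, 8.8, "SPEAKER_01"),
    (8.8, 12.0, "SPEAKER_00"),
]

for seg in assign_speakers(segments, turns):
    print(f'{seg["speaker"]}: {seg["text"]}')
```

The segment boundaries from the two tools rarely line up exactly, which is why this picks the largest overlap instead of requiring a match. If one segment contains two people talking, it still gets a single label; splitting on word-level timestamps would be the next step if that matters for your podcast.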