How to transcribe a two-person interview with the Whisper API?

I have successfully tested transcribing a video with the Whisper API (through Make, actually).

But it does not delineate respective speakers in the interview.

I tried issuing this prompt with the API request: “This is an interview. There is more than one speaker. Properly delineate interviewer and interviewee. Also use line breaks at appropriate points.” But it does nothing.

I’m exploring moving off Rev, which certainly does distinguish speakers within the video (Speaker 1, Speaker 2, etc).

I read that Whisper cannot yet distinguish speakers - is this correct?

Correct, the current iteration of Whisper is unable to differentiate between speakers.

This is true, but there are open-source tools built on Whisper that can do this. The technique is called speaker diarization, and that is the search term to use when looking for this functionality.
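To give a rough idea of how those tools work: they usually run a separate diarization model to get "who spoke when" time ranges, then match those ranges against the timestamped segments from Whisper. Below is a minimal sketch of just the matching step, not any particular tool's implementation. The `Segment` and `Turn` classes and the function names are made up for illustration. It assumes you already have Whisper segments with start/end times (the API can return these with `response_format="verbose_json"`) and speaker turns from a diarization tool such as pyannote.audio.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped chunk of transcribed text (hypothetical container)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """One 'who spoke when' range from a diarization tool (hypothetical container)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time ranges (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Give each segment the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "Unknown", 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg.text.strip()))
    return labeled


def format_transcript(labeled):
    """Merge consecutive segments from the same speaker into one line each."""
    merged = []
    for speaker, text in labeled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return "\n".join(f"{speaker}: {text}" for speaker, text in merged)


if __name__ == "__main__":
    # Made-up example data, standing in for real Whisper and diarization output.
    segments = [
        Segment(0.0, 2.5, "Thanks for joining us."),
        Segment(2.5, 4.0, "Tell me about your work."),
        Segment(4.2, 8.0, "Happy to be here."),
    ]
    turns = [Turn(0.0, 4.1, "Speaker 1"), Turn(4.1, 8.0, "Speaker 2")]
    print(format_transcript(label_segments(segments, turns)))
```

Assigning by largest overlap is the simplest choice. It will mislabel segments where both people talk within the same Whisper segment, so word-level timestamps would give better results if you need them.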