How to transcribe a two-person interview with the Whisper API?

I have successfully tested transcribing a video with the Whisper API (through Make, actually).

But it does not delineate respective speakers in the interview.

I tried issuing this prompt with the API request: “This is an interview. There is more than one speaker. Properly delineate interviewer and interviewee. Also use line breaks at appropriate points.” But it does nothing.

I’m exploring moving off Rev, which certainly does distinguish speakers within the video (Speaker 1, Speaker 2, etc).

I read that Whisper cannot yet distinguish speakers - is this correct?

Correct, the current iteration of Whisper is unable to differentiate between speakers.
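If you still want Rev-style speaker labels, one possible workaround is to get the speaker turns from a separate diarization tool and merge them with Whisper's timestamped segments. Below is a rough sketch of that merge step only, not anything Whisper does itself. It assumes you request the `verbose_json` response format, which returns segments with `start`, `end`, and `text`. The `turns` list (start, end, speaker label) and the function names `label_segments` and `format_transcript` are made up for illustration; the sample data is hypothetical too.

```python
from typing import Dict, List, Tuple

Segment = Dict[str, object]        # Whisper-style segment: start, end, text
Turn = Tuple[float, float, str]    # hypothetical diarization turn: start, end, speaker


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the overlap in seconds between two time intervals (0.0 if none)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Assign each segment to the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "Unknown", 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(float(seg["start"]), float(seg["end"]), t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append((best_speaker, str(seg["text"]).strip()))
    return labeled


def format_transcript(labeled: List[Tuple[str, str]]) -> str:
    """Join consecutive segments from the same speaker into one paragraph."""
    paragraphs: List[Tuple[str, str]] = []
    for speaker, text in labeled:
        if paragraphs and paragraphs[-1][0] == speaker:
            paragraphs[-1] = (speaker, paragraphs[-1][1] + " " + text)
        else:
            paragraphs.append((speaker, text))
    return "\n\n".join(f"{speaker}: {text}" for speaker, text in paragraphs)


# Hypothetical sample data standing in for real API and diarization output.
segments = [
    {"start": 0.0, "end": 2.0, "text": " Hello, thanks for joining."},
    {"start": 2.0, "end": 4.0, "text": " Tell me about your work."},
    {"start": 4.2, "end": 7.0, "text": " Sure, happy to."},
]
turns = [(0.0, 4.1, "Speaker 1"), (4.1, 8.0, "Speaker 2")]

transcript = format_transcript(label_segments(segments, turns))
print(transcript)
```

The maximum-overlap rule is a simple design choice: a segment that straddles a speaker change goes entirely to whoever spoke for most of it, so quick back-and-forth exchanges may be mislabeled.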
