I have successfully tested transcribing a video with the Whisper API (through Make, actually).
But it does not delineate respective speakers in the interview.
I tried issuing this prompt with the API request: “This is an interview. There is more than one speaker. Properly delineate interviewer and interviewee. Also use line breaks at appropriate points.” But it does nothing.
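
Roughly, the request carries form fields like the ones below. This is only a sketch: the actual call goes through Make, so the Python, the helper name `build_transcription_fields`, and the `response_format` value are assumptions for illustration. The endpoint URL and the `whisper-1` model name are the ones OpenAI documents for its transcription API.

```python
# Sketch of the form fields for a transcription request.
# No network call is made here; the audio file would be attached separately
# as the multipart "file" field.
WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions"

PROMPT = (
    "This is an interview. There is more than one speaker. "
    "Properly delineate interviewer and interviewee. "
    "Also use line breaks at appropriate points."
)


def build_transcription_fields(prompt: str, model: str = "whisper-1") -> dict:
    """Hypothetical helper: return the non-file form fields for the request."""
    return {
        "model": model,
        "prompt": prompt,
        # Assumed for this sketch: plain-text output rather than JSON.
        "response_format": "text",
    }


fields = build_transcription_fields(PROMPT)
```

The transcript that comes back with these fields is a single block of text with no speaker labels.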
I’m exploring moving off Rev, which does distinguish speakers within the video (Speaker 1, Speaker 2, etc.).
I read that Whisper cannot yet distinguish speakers - is this correct?