Can Whisper “find” different people's voices in the audio and separate them? For example, if you ask for a transcript of an interview where two or more people are speaking. How would that be done, or requested?
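Whisper on its own does not label speakers. You can try WhisperX, which adds speaker diarization on top of Whisper, or, if your audio is dual-channel with one speaker per channel, you can transcribe each channel separately.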
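For the dual-channel route, here is a rough sketch of what that could look like. It assumes a 2-channel PCM WAV file with one speaker per channel and the `openai-whisper` package installed. The function names, the "Speaker 1" / "Speaker 2" labels, and the `base` model choice are just illustrative, not anything Whisper defines.

```python
import os
import tempfile
import wave


def split_stereo(src_path, left_path, right_path):
    """Write each channel of a 2-channel PCM WAV file to its own mono WAV file."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a 2-channel WAV file")
        width = src.getsampwidth()   # bytes per sample
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    step = 2 * width  # bytes per stereo frame (left sample, then right sample)
    left = b"".join(frames[i:i + width] for i in range(0, len(frames), step))
    right = b"".join(frames[i + width:i + step] for i in range(0, len(frames), step))

    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(data)


def transcribe_channels(src_path, model_name="base"):
    """Transcribe each channel separately and merge the segments by start time."""
    import whisper  # openai-whisper; imported here so split_stereo works without it

    model = whisper.load_model(model_name)
    segments = []
    with tempfile.TemporaryDirectory() as tmp:
        left = os.path.join(tmp, "left.wav")
        right = os.path.join(tmp, "right.wav")
        split_stereo(src_path, left, right)
        # Assumption: left channel is one speaker, right channel is the other.
        for speaker, path in (("Speaker 1", left), ("Speaker 2", right)):
            for seg in model.transcribe(path)["segments"]:
                segments.append((seg["start"], speaker, seg["text"].strip()))
    return sorted(segments)
```

Calling `transcribe_channels("interview.wav")` (a hypothetical file name) should give you a list of `(start_time, speaker, text)` tuples in chronological order. If both voices are mixed into a single channel this will not help, and a diarization tool such as WhisperX is the better fit.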