Is there anyway to feed recordings or live audio with multiple people in mp3 or wav formats and have it transcribe the audio, Whisper can only do 1 person, I need 2 or more people done for phone calls and conference calls.
BTW now we use a transcription service and feed it 2 channels, one for each person.
in Hello GPT-4o, one of the examples described how they managed to assign speakers on the transcription. they used audio input which is not yet available for us.
so i tested the idea, got a transcription of the sample audio in that example using local whisper and used gpt-4o in chat completions to ask the AI to assign speakers in the transcription.
System Prompt:
You are a helpful transcription expert.
You will help in assigning speaker’s names in the transcription provided.
User(Input):
This is a transcription of a team meeting. In attendance are Mark, the project manager and 3 members of his team namely Derek, the marketing expert, Nick the UI designer and Sanjay the industrial designer.
[00:00 --> 00:03] Okay, good morning.
[00:03 --> 00:05] Here's the first good evening.
[00:05 --> 00:06] Morning.
[00:06 --> 00:07] Morning.
[00:07 --> 00:09] I'll be your project manager for today, this project.
[00:09 --> 00:10] My name is Mark Knopf.
[00:10 --> 00:11] I'll be giving this.
[00:11 --> 00:12] I'll tell you a few.
[00:12 --> 00:15] Kick the project off.
[00:15 --> 00:16] It's my...
[00:16 --> 00:17] Let's see a Jennifer today.
[00:17 --> 00:19] Well, of course, we're new to each other.
[00:19 --> 00:21] So then I'd get acquainted first.
[00:21 --> 00:22] So let's do that first.
[00:22 --> 00:24] I mean, let's start with you.
[00:24 --> 00:25] Can you introduce yourself?
[00:25 --> 00:26] You're a marketing expert.
[00:26 --> 00:27] Yes.
[00:27 --> 00:28] My name is Derek.
[00:28 --> 00:30] Derek, Derek Minecraft.
[00:30 --> 00:34] I will be a genius project, a marketing expert.
[00:34 --> 00:42] And I will see what the user wants and what we can do with the new project.
[00:42 --> 00:43] Project.
[00:43 --> 00:44] Excellent.
[00:44 --> 00:46] Nick de Boer.
[00:46 --> 00:48] User interface designer.
[00:48 --> 00:52] I'm going to look at the technical design from the user point of view.
[00:52 --> 00:54] Okay.
[00:54 --> 00:55] My name is Sanjay Jirkens.
[00:55 --> 01:00] I'm the industrial designer and there are three main questions that I have to find answer to.
Assistant(Output):
[00:00 --> 00:03] Mark: Okay, good morning.
[00:03 --> 00:05] Derek: Here's the first good evening.
[00:05 --> 00:06] Nick: Morning.
[00:06 --> 00:07] Sanjay: Morning.
[00:07 --> 00:09] Mark: I'll be your project manager for today, this project.
[00:09 --> 00:10] My name is Mark Knopf.
[00:10 --> 00:11] I'll be giving this.
[00:11 --> 00:12] I'll tell you a few.
[00:12 --> 00:15] Mark: Kick the project off.
[00:15 --> 00:16] Mark: Let's see a Jennifer today.
[00:17 --> 00:19] Mark: Well, of course, we're new to each other.
[00:19 --> 00:21] Mark: So then I'd get acquainted first.
[00:21 --> 00:22] Mark: So let's do that first.
[00:22 --> 00:24] Mark: I mean, let's start with you.
[00:24 --> 00:25] Mark: Can you introduce yourself?
[00:25 --> 00:26] Mark: You're a marketing expert.
[00:26 --> 00:27] Derek: Yes.
[00:27 --> 00:28] Derek: My name is Derek.
[00:28 --> 00:30] Derek: Derek Minecraft.
[00:30 --> 00:34] Derek: I will be a genius project, a marketing expert.
[00:34 --> 00:42] Derek: And I will see what the user wants and what we can do with the new project.
[00:42 --> 00:43] Mark: Excellent.
[00:44 --> 00:46] Nick: Nick de Boer.
[00:46 --> 00:48] Nick: User interface designer.
[00:48 --> 00:52] Nick: I'm going to look at the technical design from the user point of view.
[00:52 --> 00:54] Mark: Okay.
[00:54 --> 00:55] Sanjay: My name is Sanjay Jirkens.
[00:55 --> 01:00] Sanjay: I'm the industrial designer and there are three main questions that I have to find answer to.