Audio File Trtanscription

Greetings

Is there anyway to feed recordings or live audio with multiple people in mp3 or wav formats and have it transcribe the audio, Whisper can only do 1 person, I need 2 or more people done for phone calls and conference calls.

BTW now we use a transcription service and feed it 2 channels, one for each person.

Any ideas?

Thank you!

in Hello GPT-4o, one of the examples described how they managed to assign speakers on the transcription. they used audio input which is not yet available for us.

so i tested the idea, got a transcription of the sample audio in that example using local whisper and used gpt-4o in chat completions to ask the AI to assign speakers in the transcription.

System Prompt:

You are a helpful transcription expert.
You will help in assigning speaker’s names in the transcription provided.

User(Input):

This is a transcription of a team meeting. In attendance are Mark, the project manager and 3 members of his team namely Derek, the marketing expert, Nick the UI designer and Sanjay the industrial designer.

[00:00 --> 00:03]  Okay, good morning.
[00:03 --> 00:05]  Here's the first good evening.
[00:05 --> 00:06]  Morning.
[00:06 --> 00:07]  Morning.
[00:07 --> 00:09]  I'll be your project manager for today, this project.
[00:09 --> 00:10]  My name is Mark Knopf.
[00:10 --> 00:11]  I'll be giving this.
[00:11 --> 00:12]  I'll tell you a few.
[00:12 --> 00:15]  Kick the project off.
[00:15 --> 00:16]  It's my...
[00:16 --> 00:17]  Let's see a Jennifer today.
[00:17 --> 00:19]  Well, of course, we're new to each other.
[00:19 --> 00:21]  So then I'd get acquainted first.
[00:21 --> 00:22]  So let's do that first.
[00:22 --> 00:24]  I mean, let's start with you.
[00:24 --> 00:25]  Can you introduce yourself?
[00:25 --> 00:26]  You're a marketing expert.
[00:26 --> 00:27]  Yes.
[00:27 --> 00:28]  My name is Derek.
[00:28 --> 00:30]  Derek, Derek Minecraft.
[00:30 --> 00:34]  I will be a genius project, a marketing expert.
[00:34 --> 00:42]  And I will see what the user wants and what we can do with the new project.
[00:42 --> 00:43]  Project.
[00:43 --> 00:44]  Excellent.
[00:44 --> 00:46]  Nick de Boer.
[00:46 --> 00:48]  User interface designer.
[00:48 --> 00:52]  I'm going to look at the technical design from the user point of view.
[00:52 --> 00:54]  Okay.
[00:54 --> 00:55]  My name is Sanjay Jirkens.
[00:55 --> 01:00]  I'm the industrial designer and there are three main questions that I have to find answer to.

Assistant(Output):

[00:00 --> 00:03]  Mark: Okay, good morning.
[00:03 --> 00:05]  Derek: Here's the first good evening.
[00:05 --> 00:06]  Nick: Morning.
[00:06 --> 00:07]  Sanjay: Morning.
[00:07 --> 00:09]  Mark: I'll be your project manager for today, this project.
[00:09 --> 00:10]  My name is Mark Knopf.
[00:10 --> 00:11]  I'll be giving this.
[00:11 --> 00:12]  I'll tell you a few.
[00:12 --> 00:15]  Mark: Kick the project off.
[00:15 --> 00:16]  Mark: Let's see a Jennifer today.
[00:17 --> 00:19]  Mark: Well, of course, we're new to each other.
[00:19 --> 00:21]  Mark: So then I'd get acquainted first.
[00:21 --> 00:22]  Mark: So let's do that first.
[00:22 --> 00:24]  Mark: I mean, let's start with you.
[00:24 --> 00:25]  Mark: Can you introduce yourself?
[00:25 --> 00:26]  Mark: You're a marketing expert.
[00:26 --> 00:27]  Derek: Yes.
[00:27 --> 00:28]  Derek: My name is Derek.
[00:28 --> 00:30]  Derek: Derek Minecraft.
[00:30 --> 00:34]  Derek: I will be a genius project, a marketing expert.
[00:34 --> 00:42]  Derek: And I will see what the user wants and what we can do with the new project.
[00:42 --> 00:43]  Mark: Excellent.
[00:44 --> 00:46]  Nick: Nick de Boer.
[00:46 --> 00:48]  Nick: User interface designer.
[00:48 --> 00:52]  Nick: I'm going to look at the technical design from the user point of view.
[00:52 --> 00:54]  Mark: Okay.
[00:54 --> 00:55]  Sanjay: My name is Sanjay Jirkens.
[00:55 --> 01:00]  Sanjay: I'm the industrial designer and there are three main questions that I have to find answer to.
1 Like

We do that now with a transcription service.

Have they hinted with their audio model will be released that can interpret who is who and transcribe an audio file directly?