Audio File Trtanscription

ChipG · May 29, 2024, 2:07am

Greetings

Is there anyway to feed recordings or live audio with multiple people in mp3 or wav formats and have it transcribe the audio, Whisper can only do 1 person, I need 2 or more people done for phone calls and conference calls.

BTW now we use a transcription service and feed it 2 channels, one for each person.

Any ideas?

Thank you!

supershaneski · May 29, 2024, 2:54am

in Hello GPT-4o, one of the examples described how they managed to assign speakers on the transcription. they used audio input which is not yet available for us.

so i tested the idea, got a transcription of the sample audio in that example using local whisper and used gpt-4o in chat completions to ask the AI to assign speakers in the transcription.

System Prompt:

You are a helpful transcription expert.
You will help in assigning speaker’s names in the transcription provided.

User(Input):

This is a transcription of a team meeting. In attendance are Mark, the project manager and 3 members of his team namely Derek, the marketing expert, Nick the UI designer and Sanjay the industrial designer.

[00:00 --> 00:03]  Okay, good morning.
[00:03 --> 00:05]  Here's the first good evening.
[00:05 --> 00:06]  Morning.
[00:06 --> 00:07]  Morning.
[00:07 --> 00:09]  I'll be your project manager for today, this project.
[00:09 --> 00:10]  My name is Mark Knopf.
[00:10 --> 00:11]  I'll be giving this.
[00:11 --> 00:12]  I'll tell you a few.
[00:12 --> 00:15]  Kick the project off.
[00:15 --> 00:16]  It's my...
[00:16 --> 00:17]  Let's see a Jennifer today.
[00:17 --> 00:19]  Well, of course, we're new to each other.
[00:19 --> 00:21]  So then I'd get acquainted first.
[00:21 --> 00:22]  So let's do that first.
[00:22 --> 00:24]  I mean, let's start with you.
[00:24 --> 00:25]  Can you introduce yourself?
[00:25 --> 00:26]  You're a marketing expert.
[00:26 --> 00:27]  Yes.
[00:27 --> 00:28]  My name is Derek.
[00:28 --> 00:30]  Derek, Derek Minecraft.
[00:30 --> 00:34]  I will be a genius project, a marketing expert.
[00:34 --> 00:42]  And I will see what the user wants and what we can do with the new project.
[00:42 --> 00:43]  Project.
[00:43 --> 00:44]  Excellent.
[00:44 --> 00:46]  Nick de Boer.
[00:46 --> 00:48]  User interface designer.
[00:48 --> 00:52]  I'm going to look at the technical design from the user point of view.
[00:52 --> 00:54]  Okay.
[00:54 --> 00:55]  My name is Sanjay Jirkens.
[00:55 --> 01:00]  I'm the industrial designer and there are three main questions that I have to find answer to.

Assistant(Output):

[00:00 --> 00:03]  Mark: Okay, good morning.
[00:03 --> 00:05]  Derek: Here's the first good evening.
[00:05 --> 00:06]  Nick: Morning.
[00:06 --> 00:07]  Sanjay: Morning.
[00:07 --> 00:09]  Mark: I'll be your project manager for today, this project.
[00:09 --> 00:10]  My name is Mark Knopf.
[00:10 --> 00:11]  I'll be giving this.
[00:11 --> 00:12]  I'll tell you a few.
[00:12 --> 00:15]  Mark: Kick the project off.
[00:15 --> 00:16]  Mark: Let's see a Jennifer today.
[00:17 --> 00:19]  Mark: Well, of course, we're new to each other.
[00:19 --> 00:21]  Mark: So then I'd get acquainted first.
[00:21 --> 00:22]  Mark: So let's do that first.
[00:22 --> 00:24]  Mark: I mean, let's start with you.
[00:24 --> 00:25]  Mark: Can you introduce yourself?
[00:25 --> 00:26]  Mark: You're a marketing expert.
[00:26 --> 00:27]  Derek: Yes.
[00:27 --> 00:28]  Derek: My name is Derek.
[00:28 --> 00:30]  Derek: Derek Minecraft.
[00:30 --> 00:34]  Derek: I will be a genius project, a marketing expert.
[00:34 --> 00:42]  Derek: And I will see what the user wants and what we can do with the new project.
[00:42 --> 00:43]  Mark: Excellent.
[00:44 --> 00:46]  Nick: Nick de Boer.
[00:46 --> 00:48]  Nick: User interface designer.
[00:48 --> 00:52]  Nick: I'm going to look at the technical design from the user point of view.
[00:52 --> 00:54]  Mark: Okay.
[00:54 --> 00:55]  Sanjay: My name is Sanjay Jirkens.
[00:55 --> 01:00]  Sanjay: I'm the industrial designer and there are three main questions that I have to find answer to.

ChipG · May 29, 2024, 3:16am

We do that now with a transcription service.

Have they hinted with their audio model will be released that can interpret who is who and transcribe an audio file directly?

Topic		Replies	Views
How to transcribe two-person interview with Whisper API? API whisper	2	5117	December 21, 2023
Can Whisper distinguish two speakers? API whisper	9	31561	August 5, 2024
Whisper, how to tag different people in (sound) conversation API api	2	7825	June 8, 2023
Whisper-1 joint translation and transcription API	6	2887	October 21, 2024
How to identify different speakers using whisper? Community whisper	3	24723	November 2, 2023

Audio File Trtanscription

Related topics