Transcript: Amazon and Whisper merge?

johannes.txr · July 3, 2023, 1:28pm

I have many audio files where two people are speaking that need to be transcribed. I tried it with Whisper from OpenAI, which works perfectly. Unfortunately, Whisper can’t distinguish between 2 speakers.

Now, I have tried Amazon Transcribe. Amazon can distinguish speakers, but is much worse at transcribing than Whisper.

Is there any way I can “merge” the two .json files, so that I take the speakers from Amazon and the text from Whisper?

Example from Amazon File:

[{"confidence":"0.6407","content":"dazu"}],"type":"pronunciation"},{"start_time":"1020.38","speaker_label":"spk_0","end_time":"1020.93","alternatives":[{"confidence":"1.0","content":"tells"}],"type":"pronunciation"},{"speaker_label":"spk_0","alternatives":[{"confidence":"0.0","content":","}],"type":"punctuation"},{"start_time":"1020.93","speaker_label":"spk_0","end_time":"1021.23","alternatives":[{"confidence":"0.5785","content":"dass"}],"type":"pronunciation"},{"start_time":"1021.24","speaker_label":"spk_0","end_time":"1021.42","alternatives":[{"confidence":"0.5027","content":"das"}],"type":"pronunciation"},{"start_time":"1021.42","speaker_label":"spk_0","end_time":"1021.64","alternatives":[{"confidence":"0.9825","content":"deine"}],"type":"pronunciation"},{"start_time":"1021.64","speaker_label":"spk_0","end_time":"1021.91","alternatives":[{"confidence":"1.0","content":"mutter"}],"type":"pronunciation"},{"start_time":"1021.91","speaker_label":"spk_0","end_time":"1022.22","alternatives":[{"confidence":"0.9509","content":"sagt"}],"type":"pronunciation"},
I did find someone on GitHub who somehow managed to do this, but unfortunately I don't know much about programming.

Maybe you guys have some idea how to implement this.

Thanks a lot!

Foxalabs · July 3, 2023, 2:26pm

You could presumably turn on time codes for both services and then use some fuzzy logic to match them up… it might work so long as there is not much drift between the two sets of timestamps.
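A minimal sketch of that timestamp matching in Python, not a tested solution. It assumes the Whisper output was requested with segment timestamps (a list of segments with `start`, `end` and `text`, times in seconds) and that the Amazon items look like the example above. The function names and the `tolerance` parameter are made up for illustration. Each Whisper segment gets the speaker whose Amazon words overlap it the longest, and the window is widened by `tolerance` seconds to absorb a little drift.

```python
from collections import defaultdict


def speaker_intervals(amazon_items):
    """Collect (start, end, speaker) for every timed Amazon item."""
    intervals = []
    for item in amazon_items:
        # Punctuation items carry no timestamps, so skip them.
        if item.get("type") != "pronunciation":
            continue
        intervals.append(
            (
                float(item["start_time"]),
                float(item["end_time"]),
                item["speaker_label"],
            )
        )
    return intervals


def assign_speakers(whisper_segments, amazon_items, tolerance=0.5):
    """Label each Whisper segment with the best-overlapping Amazon speaker.

    `tolerance` (seconds) is a hypothetical knob: it widens each Whisper
    segment on both sides so small timing drift does not break the match.
    """
    intervals = speaker_intervals(amazon_items)
    merged = []
    for seg in whisper_segments:
        lo = seg["start"] - tolerance
        hi = seg["end"] + tolerance
        overlap = defaultdict(float)
        for start, end, speaker in intervals:
            shared = min(end, hi) - max(start, lo)
            if shared > 0:
                overlap[speaker] += shared
        # No overlapping Amazon word at all: leave the speaker unknown.
        speaker = max(overlap, key=overlap.get) if overlap else None
        merged.append(
            {
                "speaker": speaker,
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"].strip(),
            }
        )
    return merged
```

To try it, load both files with `json.load` and pass in the list of Amazon items (normally found under `results` then `items` in the Transcribe output) and the list of Whisper segments. If one Whisper segment contains both speakers, it still gets a single label, so short segments give better results.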

supershaneski · July 3, 2023, 11:57pm

I have seen a post about this before. I think it was a YouTube video (not sure) where they discussed how they used Amazon to get the speakers and then used the timestamps to compare and merge the transcriptions with Whisper.
