Whisper API: a) Timecodes; b) how good is open-source vs API?

nikola1jankovic · April 17, 2023, 10:41am

I am using the Whisper API to transcribe some video lectures and conferences. It works very well; even for smaller languages it is at a much better level than anything I have seen before.

I have two questions though:
1.) Is it possible to get timecodes via the API? I can’t see it in the docs, so I am guessing the answer is ‘no’, but since the open-source (self-installed) version of Whisper offers this, it is strange that it is not an option for the paid API as well.

2.) Has anyone compared the open-source version, running on their own server, with the API version? How do they compare: is the API version built on a later model and therefore more precise? I have only used the “small” model installed on my server.

jhveem · April 17, 2023, 7:00pm

Sorry, this isn’t really an answer, but I did spend a bit of time looking into question 1 and I also couldn’t find any way to add timecodes. The locally hosted version returns several pieces, with the text being one and the timecodes being another, and it looks like the API just returns the text portion.

nikola1jankovic · May 4, 2023, 2:18pm

I guess we will have to wait or use the self-hosted version.

I am now looking into ways of recognizing speakers, but it seems I would need to a) do that in Python first, via some library, and b) send separate audio files to the transcription on the basis of that. Not sure if there is a more efficient way at the moment.

mailman1 · June 7, 2023, 3:26pm

You can define the format you want returned as a form field in the request body. If you set it to “srt”, you get the timestamps back as well.

E.g. via curl, add the following parameter: --form response_format=srt
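
For anyone who wants to work with the SRT output in code rather than just save it, here is a minimal Python sketch that turns SRT text into a list of timed cues. The `Cue` type and the `parse_srt` helper are illustrative names, not part of any OpenAI library, and the parser only assumes the usual SRT layout (optional index line, a `start --> end` timing line, then the text).

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


# Matches HH:MM:SS,mmm (SRT) and also HH:MM:SS.mmm (the VTT style).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    match = _TIME.search(stamp)
    if match is None:
        raise ValueError(f"not a timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SRT text into cues (hypothetical helper, not an API call)."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            if "-->" in line:
                start, end = line.split("-->")
                # Everything after the timing line is the cue text.
                text = " ".join(lines[i + 1:])
                cues.append(Cue(_to_seconds(start), _to_seconds(end), text))
                break
    return cues
```

Calling `parse_srt` on the body returned with response_format=srt should give you one `Cue` per subtitle block, which is easier to compare against other timed data than the raw text.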

Foxalabs · June 7, 2023, 7:03pm

Revised (see below for how to get timecodes from the API). The advantage of using the API over self-hosting is performance and model size: if you wish to run a very capable model from a low-power device, offloading that computation to a remote, enterprise-class compute facility is an attractive option. If you have sufficiently powerful local compute, then local use is fine.

lilnoes · June 7, 2023, 7:26pm

Add the form field response_format=x, where x is srt or verbose_json.
IDK

You can also try Deepgram; they offer Whisper models at a cheaper price and they have diarization + timecodes.

jtapiovaara · June 8, 2023, 12:12pm

supershaneski · June 8, 2023, 11:40pm

The Whisper API has timestamps, as the previous posters mentioned. Just set the “response_format” parameter to srt or vtt.

If you are using the OpenAI Node.js library:

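// openai Node.js SDK v3: `openai` is an OpenAIApi instance, `fs` is Node's fs module.
// createTranscription takes its arguments positionally, in this order.
const resp = await openai.createTranscription(
    fs.createReadStream("audio.mp3"), // file
    "whisper-1", // model
    "", // prompt
    "vtt", // response_format
    0.1, // temperature
    "en", // language
);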

tmokmss · July 28, 2023, 3:56am

Here’s the Python version to specify response format:

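import openai  # legacy SDK (openai<1.0), which exposes openai.Audio

# Open the audio in binary mode; the context manager closes it afterwards.
with open("/path/to/file.mp3", "rb") as audio_file:
    # With response_format="vtt" the API returns WebVTT text rather than JSON.
    transcript = openai.Audio.transcribe(
        "whisper-1", audio_file, response_format="vtt"
    )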

nikola1jankovic · July 28, 2023, 1:18pm

I’ve got it, thanks everyone. I think this could be used for speaker recognition as well:
a) Get timecodes.
b) In parallel with sending the audio for transcription, run speaker recognition with other tools.
c) Compare the transcription timecodes with the speaker recognition timecodes.
d) Insert the speaker information into the final transcription file.
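
Steps c) and d) could be sketched roughly like this in Python. It assumes you already have two lists of timed segments: the transcription cues (start, end, text) and the speaker turns (start, end, speaker label) from whatever diarization tool you use. The `assign_speakers` helper and the "largest time overlap wins" rule are my own assumptions, not something the Whisper API provides.

```python
from typing import List, Optional, Tuple

# (start_sec, end_sec, payload): payload is text for the transcript,
# and a speaker label for the diarization turns.
Segment = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    transcript: List[Segment], turns: List[Segment]
) -> List[Tuple[float, float, Optional[str], str]]:
    """Label each transcript segment with the speaker whose turn overlaps it most.

    Segments that overlap no speaker turn get None as the speaker.
    """
    labelled = []
    for start, end, text in transcript:
        best_speaker, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labelled.append((start, end, best_speaker, text))
    return labelled


# Illustrative, hand-written data (not real API or diarization output).
transcript = [(0.0, 4.0, "Hello."), (4.0, 9.0, "Hi, thanks."), (20.0, 22.0, "Bye.")]
turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 10.0, "SPEAKER_01")]
labelled = assign_speakers(transcript, turns)
```

A cue that straddles a speaker change is assigned entirely to whoever holds most of it, so shorter cues should give cleaner results.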

I also noticed there is a verbose_json option, which sends back much more detailed timings, not only on the word level but even on the token level. It would probably work even better for this case, but I guess it would be much more difficult to implement.
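
Reading the timings out of a verbose_json response might not be that hard. Here is a minimal Python sketch, assuming the response body has a top-level `segments` list whose entries carry `start` and `end` (in seconds) and `text`; check the current API reference for the exact fields before relying on this. The `segments_from_verbose_json` helper is a hypothetical name and the sample payload is hand-written for illustration.

```python
import json
from typing import List, Tuple


def segments_from_verbose_json(payload: str) -> List[Tuple[float, float, str]]:
    """Extract (start, end, text) triples from a verbose_json response body.

    Assumes a top-level "segments" list; returns an empty list if it is absent.
    """
    data = json.loads(payload)
    return [
        (float(seg["start"]), float(seg["end"]), seg["text"].strip())
        for seg in data.get("segments", [])
    ]


# Hand-written sample shaped like the assumed response, not real API output.
sample_payload = json.dumps(
    {
        "text": "Hello everyone. Welcome to the lecture.",
        "segments": [
            {"start": 0.0, "end": 4.5, "text": " Hello everyone."},
            {"start": 4.5, "end": 9.0, "text": " Welcome to the lecture."},
        ],
    }
)
segments = segments_from_verbose_json(sample_payload)
```

The resulting triples have the same shape as parsed SRT or VTT cues, so they can be compared against speaker timecodes in the same way.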
