Hi, I’m using Whisper to generate subtitles for my videos.
I first tried running it locally with the openai-whisper library, but it took a very long time (probably because I’m on a laptop without a decent GPU).
Calling Whisper through OpenAI’s API, on the other hand, was very fast:
import openai

# assumes OPENAI_API_KEY is set in the environment
audio_file = open('audio.mp3', 'rb')
transcript = openai.Audio.transcribe(model='whisper-1', file=audio_file, response_format='srt', language='ko')

with open('transcript.srt', 'w', encoding='utf-8') as f:
    f.write(transcript)
This produced a good transcript in a very short time, so I’d like to keep using the openai library to call Whisper.
However, the documentation for this approach is much thinner than openai-whisper’s on GitHub, and I couldn’t find out how to do the same things I did with openai-whisper.
When I transcribed an mp3 file, a few subtitle lines came out a bit too long.
With openai-whisper I could handle that by passing ‘max_line_count’ to the writer (see the sketch just below), but I have no idea how to do the same with openai.Audio.transcribe.
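For reference, this is roughly what my local version looked like; the option values here are just examples I picked, not defaults:

import whisper
from whisper.utils import get_writer

model = whisper.load_model('large')
# word-level timestamps are needed for the line-splitting options to take effect
result = model.transcribe('audio.mp3', language='ko', word_timestamps=True)

srt_writer = get_writer('srt', './')  # writes ./audio.srt
srt_writer(result, 'audio.mp3', {'max_line_count': 2, 'max_line_width': 42, 'highlight_words': False})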
Also, openai-whisper let me get timestamps for every word, like below:
import whisper
from whisper.utils import get_writer

model = whisper.load_model('large')

audio = 'audio.mp3'
# the language goes straight into transcribe(); a bare whisper.DecodingOptions(language='ko') is discarded and has no effect
result = model.transcribe(audio, language='ko', word_timestamps=True)

# srt_writer = get_writer('srt', output_dir='./')
# srt_writer.write_result(result, open('transcript.srt', 'w', encoding='utf-8'))
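For example, with the result above I can walk over every word and its timing (the 'words' list is what word_timestamps=True adds to each segment):

# start/end are in seconds from the beginning of the audio
for segment in result['segments']:
    for word in segment['words']:
        print(f"{word['start']:7.2f} -> {word['end']:7.2f}  {word['word']}")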
I want to get exactly these word timestamps by calling the OpenAI API, so that I can manipulate the transcripts myself.
Is there any way to do it?