How can I get word-level timestamps from the Whisper API?

Hi, I’m using whisper to get subtitles for my videos.

I’m trying to run a program using whisper on my local computer.

It took a very long time with the openai-whisper library. (Maybe that's because I'm on a laptop rather than a desktop with a good GPU.)

However, calling Whisper through OpenAI's API was very fast.

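import openai

# The API expects a file object opened in binary mode ('audio.mp3' is my input file).
with open('audio.mp3', 'rb') as audio_file:
    transcript = openai.Audio.transcribe(model='whisper-1', file=audio_file, response_format='srt', language='ko')
with open('transcript.srt', 'w', encoding='utf-8') as f:
    f.write(transcript)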

This made a good transcript for me in a very short time.

So I would like to use the openai library to call Whisper.

However, the documentation for the API is much thinner than the documentation for openai-whisper on GitHub.

I couldn't find out how to do the same things I did with openai-whisper.

When I transcribed an MP3 file, a few sentences came out rather long.

With openai-whisper, I could handle this by passing 'max_line_count' to the writer.

However, I have no idea how to do the same thing with openai.Audio.transcribe.

Also, I was able to get timestamps for every word with openai-whisper, like below.

import whisper
from whisper.utils import get_writer

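model = whisper.load_model('large')
audio = 'audio.mp3'
# Decoding options such as language are passed to transcribe() directly;
# a bare whisper.DecodingOptions(...) call has no effect on its own.
result = model.transcribe(audio, language='ko', word_timestamps=True)
# Each entry in result['segments'] now carries a 'words' list with per-word start/end times.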

# srt_writer = get_writer("srt", output_dir = './')
# srt_writer.write_result(result, open('transcript.srt', 'w'))
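
To show what the word-level data looks like, here is a minimal sketch of pulling the words out of a result. It does not call Whisper: `sample_result` is a hand-written stand-in with invented text and times that only mimics the `segments` / `words` layout openai-whisper returns when `word_timestamps=True`, and `flatten_words` is a made-up helper name for this example.

```python
# Hand-written stand-in for the dict returned by
# model.transcribe(..., word_timestamps=True).
# The text and times are invented for illustration only.
sample_result = {
    "text": " Hello world. Good morning.",
    "segments": [
        {"start": 0.0, "end": 1.2, "text": " Hello world.",
         "words": [
             {"word": " Hello", "start": 0.0, "end": 0.5},
             {"word": " world.", "start": 0.6, "end": 1.2},
         ]},
        {"start": 1.5, "end": 2.4, "text": " Good morning.",
         "words": [
             {"word": " Good", "start": 1.5, "end": 1.8},
             {"word": " morning.", "start": 1.9, "end": 2.4},
         ]},
    ],
}


def flatten_words(result):
    """Return one (word, start, end) tuple per word, in spoken order."""
    words = []
    for segment in result["segments"]:
        # Segments produced without word timestamps have no "words" key.
        for w in segment.get("words", []):
            words.append((w["word"].strip(), w["start"], w["end"]))
    return words


for word, start, end in flatten_words(sample_result):
    print(f"{start:6.2f} -> {end:6.2f}  {word}")
```

With a real result from `model.transcribe`, the same loop should work unchanged, since it only reads the `word`, `start`, and `end` keys.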

I want to get the same word-level timestamps as above by calling the OpenAI API, so that I can manipulate the transcripts myself.

Is there any way to do it?
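
For context, this is roughly the kind of post-processing word-level timestamps would allow: regrouping words into short SRT cues so that no line gets too long. It is only a sketch under assumptions. Each word is assumed to be a dict with `word`, `start`, and `end` keys (the layout openai-whisper uses), the sample words and times are invented, and `format_timestamp`, `words_to_srt`, and `max_chars` are hypothetical names, not part of any library.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_chars=20):
    """Group word entries into SRT cues of at most max_chars characters.

    A single word longer than max_chars still gets a cue of its own.
    Words are joined with a space, which suits space-separated languages.
    """
    cues = []  # each cue is [text, start, end]
    for w in words:
        token = w["word"].strip()
        if not token:
            continue
        if cues and len(cues[-1][0]) + 1 + len(token) <= max_chars:
            # The word still fits: extend the current cue and its end time.
            cues[-1][0] += " " + token
            cues[-1][2] = w["end"]
        else:
            cues.append([token, w["start"], w["end"]])
    blocks = []
    for index, (text, start, end) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Invented sample data, not real Whisper output.
words = [
    {"word": " The", "start": 0.0, "end": 0.2},
    {"word": " quick", "start": 0.25, "end": 0.6},
    {"word": " brown", "start": 0.65, "end": 1.0},
    {"word": " fox", "start": 1.05, "end": 1.4},
    {"word": " jumps", "start": 1.5, "end": 1.9},
]

print(words_to_srt(words, max_chars=12))
```

Limiting by character count is just one possible rule; splitting on pauses between words or on punctuation would be equally reasonable.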