Whisper API & Word-Level Time-stamping

Hey all!

I am a Node developer looking into adding word-level timestamps to a transcription tool I am building. Is this currently possible with Whisper? And if so, is it possible with Node specifically?

Thank you in advance!

The API doesn't return word-level timestamps at the moment, but you can use this: whisper-timestamped
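On the Node side, one option that needs no Python code of your own is to install whisper-timestamped (it is a Python package, so Python itself still has to be installed) and call its command-line tool from Node as a child process. This is a rough sketch under several assumptions: the CLI is on your PATH as `whisper_timestamped`, it accepts `--model` and `--output_dir` as its README shows, and it writes a `<audio file name>.words.json` file whose segments carry a `words` array of `{ text, start, end, confidence }`. Check all of these against the version you install; `buildArgs`, `extractWords`, `runCli` and `transcribeWithWordTimes` are hypothetical helper names.

```typescript
import { spawn } from "node:child_process";
import { readFile } from "node:fs/promises";
import * as path from "node:path";

// Assumed shape of one word entry in the whisper-timestamped JSON output.
interface TimedWord {
  text: string;
  start: number; // seconds
  end: number; // seconds
  confidence?: number;
}

interface TimestampedSegment {
  words?: TimedWord[];
}

interface TimestampedResult {
  segments: TimestampedSegment[];
}

// Build the CLI arguments. Flag names are assumed from the whisper-timestamped
// README; verify them against your installed version.
function buildArgs(audioPath: string, model: string, outputDir: string): string[] {
  return [audioPath, "--model", model, "--output_dir", outputDir];
}

// Flatten the words of every segment into one array (segments without words are skipped).
function extractWords(result: TimestampedResult): TimedWord[] {
  return result.segments.flatMap((seg) => seg.words ?? []);
}

// Run the CLI as a child process; resolve on exit code 0, reject otherwise.
function runCli(args: string[]): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn("whisper_timestamped", args, { stdio: "inherit" });
    child.on("error", reject); // e.g. the executable was not found
    child.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`whisper_timestamped exited with code ${code}`)),
    );
  });
}

async function transcribeWithWordTimes(audioPath: string, outputDir = "."): Promise<TimedWord[]> {
  await runCli(buildArgs(audioPath, "tiny", outputDir));
  // Assumption: output file is named "<audio file name>.words.json"; check what the CLI actually writes.
  const jsonPath = path.join(outputDir, path.basename(audioPath) + ".words.json");
  const result = JSON.parse(await readFile(jsonPath, "utf8")) as TimestampedResult;
  return extractWords(result);
}
```

From an Express route or similar you would then `await transcribeWithWordTimes(uploadedFilePath)` and send the word list back as JSON. Spawning the CLI keeps the Python side a black box, at the cost of reloading the model on every call.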

Ah, thank you for this!! Hopefully Whisper will have it natively soon, but this looks promising.

Perhaps this is a dumb question, but is there an easy way to integrate this into my Node web application? I have no Python experience.

Alternatively, you can use Whisper on Replicate.

Output example (not sure if this is what you want): [screenshot not included]

Replicate does NOT produce word-level segments.

I find using Replicate for Whisper a complete waste of time and money. You can get the same results from Whisper through the OpenAI package: in the request, set the response format to vtt, srt, or verbose_json.
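For reference, here is roughly what that looks like from Node 18+ without any SDK, calling the transcription endpoint directly with the built-in `fetch`, `FormData` and `Blob`. It assumes the `whisper-1` model and only reads the segment fields (`start`, `end`, `text`) of the `verbose_json` response; `buildTranscriptionForm`, `transcribeVerbose` and `formatSegments` are hypothetical helper names. Note that the timestamps it returns are per segment, not per word.

```typescript
// Minimal shape of the verbose_json fields used here (the response has more).
interface VerboseSegment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

interface VerboseTranscription {
  text: string;
  segments: VerboseSegment[];
}

type ResponseFormat = "json" | "text" | "srt" | "verbose_json" | "vtt";

// Build the multipart body for POST /v1/audio/transcriptions.
function buildTranscriptionForm(audio: Blob, fileName: string, format: ResponseFormat): FormData {
  const form = new FormData();
  form.append("file", audio, fileName);
  form.append("model", "whisper-1");
  form.append("response_format", format);
  return form;
}

// Send the request and parse the verbose_json body.
async function transcribeVerbose(audio: Blob, fileName: string, apiKey: string): Promise<VerboseTranscription> {
  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audio, fileName, "verbose_json"),
  });
  if (!res.ok) throw new Error(`Transcription failed: ${res.status} ${await res.text()}`);
  return (await res.json()) as VerboseTranscription;
}

// Render segment-level timestamps as "[start-end] text" lines.
function formatSegments(t: VerboseTranscription): string[] {
  return t.segments.map((s) => `[${s.start.toFixed(2)}-${s.end.toFixed(2)}] ${s.text.trim()}`);
}
```

Usage would be something like `const audio = new Blob([await readFile("talk.mp3")])` (with `readFile` from `node:fs/promises`), then `formatSegments(await transcribeVerbose(audio, "talk.mp3", apiKey))`. Passing `"srt"` or `"vtt"` instead returns subtitle text, so read it with `res.text()` rather than `res.json()`.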

But be aware: Whisper from OpenAI or from Replicate does NOT produce word-level timestamps as of today.
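If you only need rough word positions (for example, to highlight words during playback), one crude workaround is to spread each segment's duration across its words in proportion to their character length. This is a naive heuristic, not real alignment and not what whisper-timestamped does, so expect errors around pauses and long words. A minimal sketch, assuming segments shaped like the `verbose_json` ones (`start`, `end`, `text`, times in seconds); `approximateWordTimes` is a hypothetical name.

```typescript
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

interface ApproxWord {
  word: string;
  start: number;
  end: number;
}

// Naive approximation: split the segment duration across its words in
// proportion to each word's character count. This is NOT forced alignment.
function approximateWordTimes(seg: Segment): ApproxWord[] {
  const words = seg.text.trim().split(/\s+/).filter((w) => w.length > 0);
  const totalChars = words.reduce((sum, w) => sum + w.length, 0);
  if (totalChars === 0) return [];
  const duration = seg.end - seg.start;
  let cursor = seg.start;
  return words.map((w, i) => {
    const start = cursor;
    // Snap the last word to the segment end to avoid floating-point drift.
    const end = i === words.length - 1 ? seg.end : start + (duration * w.length) / totalChars;
    cursor = end;
    return { word: w, start, end };
  });
}
```

Word-level tools like whisper-timestamped estimate timing from the model itself, which is why they are worth the extra setup when accuracy matters.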

Play around with Replicate before you actually implement it.


Hi!
I was wondering: what is the difference between the implemented solution and the online one you shared?
Could Whisper still work in an offline environment?

Thank you!