Impossible to get basic transcription with API

Hi,

I just want a pretty basic transcription. I am using the Whisper OpenAI API.

I need short segments (~4 words each) with timings and punctuation.
Here is what I tried:

  • The API doesn’t allow setting the number of words per segment.
  • I thought I could build the segments from the word-level transcription, but there is no punctuation there, and characters like - and ' come out broken (more on that below).
  • I then thought I could merge text or segments with words, parsing both, to get the full words this time (with ’ and -) plus the punctuation.
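To make the second idea concrete, here is a minimal sketch of grouping word-level entries into ~4-word segments. It assumes each entry is a dict with "word", "start" and "end" keys, which is how I understand the word-level output; chunk_words and the sample data are made up for illustration. It shows the limitation too: the resulting text has no punctuation.

```python
def chunk_words(words, size=4):
    """Group consecutive word entries into chunks of at most `size` words.

    `words` is assumed to be a list of dicts with "word", "start", "end".
    Each chunk takes its start from the first word and its end from the last.
    """
    chunks = []
    for i in range(0, len(words), size):
        group = words[i:i + size]
        chunks.append({
            "text": " ".join(w["word"] for w in group),
            "start": group[0]["start"],
            "end": group[-1]["end"],
        })
    return chunks


# Made-up sample data, not real API output
sample = [
    {"word": "hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.4, "end": 0.9},
    {"word": "this", "start": 1.0, "end": 1.2},
    {"word": "is", "start": 1.2, "end": 1.3},
    {"word": "a", "start": 1.3, "end": 1.4},
    {"word": "test", "start": 1.4, "end": 1.9},
]
chunks = chunk_words(sample, size=4)
```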

Then I noticed several discrepancies between text/segments and words:

  • The text may differ: words sometimes contains words that do not exist at all in text/segments.
  • The timestamps are badly mismatched (I saw a few posts about this).
  • There is no punctuation in words.
  • Words containing ' or -, such as it's, are treated as one word in some languages and as two words in others.

This makes merging segments and words difficult, since the two sides don't contain the same number of words and the rules for specific characters differ depending on the language.
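For reference, a rough sketch of the kind of merge I mean, which tolerates extra or missing words by aligning the two sides with Python's difflib instead of matching by index. The dict keys ("word", "start", "end") are my assumption about the word-level output, and norm and align are hypothetical helper names. Punctuated tokens come from the segment text, timings from the matched word entry, and tokens with no match keep None.

```python
import re
from difflib import SequenceMatcher


def norm(token):
    # Lowercase and drop everything except letters, digits and underscore,
    # so "It's", "it’s" and "its" all compare equal
    return re.sub(r"[^\w]", "", token.lower())


def align(segment_text, words):
    """Attach word timings to the punctuated tokens of `segment_text`.

    `words` is assumed to be a list of dicts with "word", "start", "end".
    Returns one dict per whitespace-separated token of `segment_text`;
    tokens without a matching word entry keep start/end set to None.
    """
    tokens = segment_text.split()
    a = [norm(t) for t in tokens]
    b = [norm(w["word"]) for w in words]
    out = [{"text": t, "start": None, "end": None} for t in tokens]
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            w = words[block.b + k]
            out[block.a + k]["start"] = w["start"]
            out[block.a + k]["end"] = w["end"]
    return out


# Made-up sample data: "um" exists only on the word side
words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.4, "end": 0.9},
    {"word": "um", "start": 0.9, "end": 1.1},
    {"word": "it's", "start": 1.1, "end": 1.3},
    {"word": "fine", "start": 1.3, "end": 1.7},
]
aligned = align("Hello, world! It’s fine.", words)
```

This does not solve the case where one side splits it's or a hyphenated word into two entries: those tokens simply stay unmatched and would need a fallback, such as borrowing the neighbouring timestamps.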

Has anyone succeeded in getting a basic word-level transcription with punctuation and word-level timestamps using the API?

Thank you
