I am using the Whisper API to transcribe video lectures and conference talks. It works very well; even for smaller languages, the quality is much better than anything I have seen before.
I have two questions though:
1.) Is it possible to get timecodes via the API? I can’t see this in the docs, so I am guessing the answer is ‘no’, but since the open-source (installed) version of Whisper offers timecodes, it is strange that this is not an option in the paid API as well.
2.) Has anyone compared the open-source version, running on a server, with the API version? How do they compare: is the API version built on a later model, and therefore more accurate? I have only used the “small” model, installed on my server.
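
To illustrate what I mean in question 1, here is a minimal sketch of how timecodes come out of the open-source version. It assumes the `openai-whisper` Python package is installed on the server; the helper names (`format_timecode`, `segments_to_lines`, `transcribe_with_timecodes`) and the file name `lecture.mp4` are made up for illustration, and this is not necessarily how the API would expose the same data.

```python
def format_timecode(seconds: float) -> str:
    """Convert a time in seconds to an SRT-style HH:MM:SS,mmm timecode."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_lines(segments):
    """Render segments (dicts with 'start', 'end', 'text') as timecoded lines."""
    return [
        f"[{format_timecode(s['start'])} --> {format_timecode(s['end'])}] "
        f"{s['text'].strip()}"
        for s in segments
    ]


def transcribe_with_timecodes(path: str, model_name: str = "small"):
    """Hypothetical wrapper: transcribe locally and return timecoded lines."""
    # Imported lazily so the helpers above work without the package installed.
    import whisper  # open-source package: pip install openai-whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    # Each entry in result["segments"] carries "start", "end" (seconds) and "text".
    return segments_to_lines(result["segments"])


# Example usage (assumes lecture.mp4 exists and ffmpeg is available):
# for line in transcribe_with_timecodes("lecture.mp4"):
#     print(line)
```

What I am looking for is the equivalent of the per-segment `start` and `end` values in the API response.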