Whisper API - subtitle timecodes out of sync

I am using the Whisper API to transcribe audio into text, and it works well, even for smaller languages. However, I am having problems with subtitles: relatively often, the subtitles go out of sync in some videos. I have noticed that English transcriptions might fare better.

Has anyone had experience with this and can confirm the problem exists? Is there a way to solve it?
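To see where the drift starts, it can help to look at the raw segment timestamps instead of only the finished subtitle file. The sketch below assumes the segments are dicts with `start` and `end` (in seconds) and `text` keys, as in the segments list of the API's `verbose_json` response format. The helper names `srt_timestamp` and `segments_to_srt` are hypothetical, and this is just one way to build SRT cues, not how the API produces its own `srt` output.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timecode (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build an SRT document from segments with 'start', 'end' and 'text' keys.

    Assumption: times are in seconds from the start of the audio file.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(cues)
```

Printing the segment times next to the video makes it easier to tell whether whole segments are shifted or only their boundaries.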

I ran into the same issue with an English transcription: relatively small portions of the transcript were up to 8 seconds out of sync.
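One way to put a number on this kind of offset is to compare cue start times against a reference subtitle file for the same audio, for example a manually corrected copy. The sketch below is a crude heuristic with hypothetical helper names: it pairs cues only when their normalized text is identical, so two transcripts that are segmented differently will yield few or no pairs.

```python
import re

# Matches an SRT timing line such as "00:01:02,345 --> 00:01:04,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt: str) -> list[tuple[float, float, str]]:
    """Return (start, end, text) for every cue in an SRT string."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                text = " ".join(lines[i + 1:]).strip()
                cues.append((to_seconds(*g[:4]), to_seconds(*g[4:]), text))
                break
    return cues


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so minor differences don't block a match."""
    return re.sub(r"[^\w\s]", "", text.lower()).strip()


def start_offsets(hypothesis: str, reference: str) -> list[tuple[str, float]]:
    """Hypothesis start minus reference start, for cues whose text matches.

    If the reference repeats a line, the last occurrence wins.
    """
    ref_starts = {normalize(text): start for start, _, text in parse_srt(reference)}
    offsets = []
    for start, _, text in parse_srt(hypothesis):
        key = normalize(text)
        if key in ref_starts:
            offsets.append((text, round(start - ref_starts[key], 3)))
    return offsets
```

Plotting the offsets against cue start time would show whether the error is a constant shift or grows over the file.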

The same audio is transcribed with correct timecodes by the Hugging Face pipeline with the Whisper large models (I tried v1, v2 and v3). The only downside is that the Hugging Face pipeline gives me a worse WER (word error rate) than the Whisper API.
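When comparing the two setups, it helps to compute WER the same way for both. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference. The sketch below assumes whitespace tokenization and case-insensitive comparison only. It does not normalize punctuation, which can change the numbers noticeably. A library such as jiwer provides the same metric.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # At the start of row i, prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words.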

Has anyone managed to improve the English transcription? I'm also having issues: unfortunately, when there are long pauses in the audio, the tc in and tc out (subtitle start and end timecodes) go out of sync, and the drift gets worse over time.
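One workaround sometimes tried for drift after long pauses (not something confirmed in this thread) is to split the audio at those pauses, transcribe each chunk separately, and shift each chunk's segment timestamps by the chunk's start time. An error in one chunk then cannot carry over into the next. The sketch below covers only the splitting and re-timing. It assumes mono float samples in [-1, 1], and the frame size, energy threshold and minimum pause length are arbitrary placeholder values, not tuned settings. The transcription call for each chunk is left out.

```python
import math


def find_long_pauses(samples, sample_rate, frame_ms=30, threshold=0.01, min_pause_s=2.0):
    """Return (start_s, end_s) spans where frame RMS stays below `threshold`
    for at least `min_pause_s` seconds. A simple energy check, not a real VAD."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    pauses, pause_start = [], None
    for f in range(math.ceil(len(samples) / frame_len)):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        t = f * frame_len / sample_rate
        if rms < threshold:
            if pause_start is None:
                pause_start = t
        else:
            if pause_start is not None and t - pause_start >= min_pause_s:
                pauses.append((pause_start, t))
            pause_start = None
    # A pause may run until the end of the audio.
    end_t = len(samples) / sample_rate
    if pause_start is not None and end_t - pause_start >= min_pause_s:
        pauses.append((pause_start, end_t))
    return pauses


def chunk_boundaries(duration_s, pauses):
    """Cut in the middle of each pause; returns (chunk_start, chunk_end) pairs."""
    cuts = [(start + end) / 2 for start, end in pauses]
    edges = [0.0] + cuts + [duration_s]
    return list(zip(edges[:-1], edges[1:]))


def shift_segments(segments, offset_s):
    """Convert chunk-relative segment times to absolute times in the full audio."""
    return [
        {**seg, "start": seg["start"] + offset_s, "end": seg["end"] + offset_s}
        for seg in segments
    ]
```

Each chunk would be transcribed on its own, and `shift_segments(chunk_segments, chunk_start)` applied before merging the chunks back into one subtitle file.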