When we create transcriptions using the Whisper API, we encounter a weird error. Sometimes (a handful of times per hour of audio) a sentence is skipped. The timing of the previous and next sentences is adjusted to cover the missing sentence without a gap: the previous sentence is wrongly timed a few seconds longer, and the following sentence starts a few seconds earlier. When we run the same file again, the errors appear in different places.
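Because a dropped sentence leaves a telltale sign (a neighboring segment stretched over several seconds of speech it does not contain), one way to find candidates for manual review is to flag segments whose words-per-second rate is unusually low. This is a minimal sketch, assuming each segment is a dict with "start", "end" (seconds) and "text" keys, as in the segments array of a verbose_json response. The function name and the 1 word/second and 3-second thresholds are hypothetical and would need tuning on your own audio.

```python
# Heuristic sketch (hypothetical helper, not an OpenAI API): flag segments whose
# duration looks too long for the number of words they contain, matching the
# "stretched timing" symptom of a dropped sentence.
from typing import Iterable

MIN_WORDS_PER_SECOND = 1.0  # assumed threshold; tune on your own recordings


def flag_stretched_segments(segments: Iterable[dict],
                            min_wps: float = MIN_WORDS_PER_SECOND,
                            min_duration: float = 3.0) -> list[dict]:
    """Return segments that are long but contain few words.

    Each segment is assumed to be a dict with "start" and "end" (seconds)
    and "text".
    """
    flagged = []
    for seg in segments:
        duration = seg["end"] - seg["start"]
        if duration < min_duration:
            continue  # short segments are too noisy to judge by word rate
        words = len(seg["text"].split())
        if words / duration < min_wps:
            flagged.append(seg)
    return flagged


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 4.0, "text": "This is a normal sentence of speech."},
        {"start": 4.0, "end": 14.0, "text": "Then this one is stretched."},
        {"start": 14.0, "end": 17.0, "text": "And this one is fine again, too."},
    ]
    for seg in flag_stretched_segments(demo):
        print(f"{seg['start']:.1f}-{seg['end']:.1f}s: {seg['text']}")
```

A flagged segment does not prove a sentence was dropped (a long pause produces the same pattern), so it is only a pointer to where to listen.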
This skipping happens to me quite often, usually when the speaker I am transcribing is quoting something, almost as if Whisper were avoiding potential plagiarism or copyright problems.
I’ve been using the Whisper API for some time, and I’ve noticed that it has been acting “lazy.” It skips important parts of the transcription, which didn’t happen before. (I tested the same audio on a model installed on my local machine, and the transcription is perfect, with nothing missing.)
Furthermore, it seems to be random: if I transcribe the same audio file again, it sometimes transcribes the part it missed in the previous attempt.
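Since a re-run sometimes recovers the missing part, one possible workaround is to transcribe the file twice and compare the runs. This is a hypothetical helper, not part of any OpenAI API: for each segment of the first run, it counts the words the second run placed in the same time window (assigning each second-run segment by its midpoint) and reports windows where the second run has noticeably more words. Segments are again assumed to be dicts with "start", "end" and "text" keys, and `min_extra_words=5` is an arbitrary threshold.

```python
# Hypothetical helper: compare two transcription runs of the same audio and
# report time windows where the second run captured noticeably more words.
# Segments are assumed to be dicts with "start", "end" (seconds) and "text".


def count_words(text: str) -> int:
    return len(text.split())


def find_gaps_between_runs(run_a: list[dict], run_b: list[dict],
                           min_extra_words: int = 5) -> list[tuple]:
    """Return (start, end, words_a, words_b) for suspicious segments of run_a."""
    gaps = []
    for seg in run_a:
        words_a = count_words(seg["text"])
        # Attribute each run_b segment to this window if its midpoint falls inside.
        words_b = sum(
            count_words(other["text"])
            for other in run_b
            if seg["start"] <= (other["start"] + other["end"]) / 2 < seg["end"]
        )
        if words_b - words_a >= min_extra_words:
            gaps.append((seg["start"], seg["end"], words_a, words_b))
    return gaps


if __name__ == "__main__":
    first_run = [
        {"start": 0.0, "end": 4.0, "text": "Hello there, thanks for calling."},
        {"start": 4.0, "end": 14.0, "text": "Okay."},
    ]
    second_run = [
        {"start": 0.0, "end": 4.0, "text": "Hello there, thanks for calling."},
        {"start": 4.0, "end": 10.0, "text": "I read the passage out loud from the book."},
        {"start": 10.0, "end": 14.0, "text": "Okay."},
    ]
    print(find_gaps_between_runs(first_run, second_run))
```

Flagged windows are places where one run probably dropped speech; the text from the other run can then be checked by ear before patching it in.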
I transcribe phone calls, so I don’t believe copyright would be an issue.
Yes, I get this issue too. I only noticed recently that some sentences are being dropped at random in the middle of longer transcriptions. This is a real shame because it casts doubt on the quality of any transcription. A workaround for now is to use the phone’s built-in voice transcription instead of the transcription button in OpenAI’s apps, or, for pre-recorded content, to use otter.ai.
I have the same issue. It appears to happen most often with medium; large-v3-turbo performs better.
The audio files are about 1 hour long, and the skipped portions are where someone is reading from a book (basically a quotation) that is in the public domain in my country. The worst case was one portion where it skipped 199 consecutive words (a stretch of more than 60 seconds).
The skipped portions are placed randomly when using the same model; however, they are always portions where the person is reading.