Strange lines often appear in the SRT output, for example:
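“Sottotitoli e revisione a cura di QTSS” (Subtitles and revision by QTSS)
“Subs by www.zeoranger.co.uk Diolch yn fawr iawn am wylio’r fideo.” (Welsh: Thank you very much for watching the video.)
“Sottotitoli creati dalla comunità Amara.org” (Subtitles created by the Amara.org community)
“Sottotitoli a cura di QTSS” (Subtitles by QTSS)
“Sous-titres réalisés par la communauté d’Amara.org” (Subtitles made by the Amara.org community)
“info un libro pubblico su www.mesmerism.info” (roughly: info, a public book on www.mesmerism.info)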
etc.
510
00:38:51,080 --> 00:38:53,080
Sottotitoli e revisione a cura di QTSS

511
00:39:21,080 --> 00:39:23,080
Sottotitoli e revisione a cura di QTSS

512
00:39:51,080 --> 00:39:53,080
Sottotitoli e revisione a cura di QTSS
In this case, this line appears in every cue from 510 to 797, 288 times in all.
Is there a way to block these lines?
Hello everyone, I’ve also run into the same issue: transcriptions from the API contain repetitive lines mentioning QTSS and Amara.org. Has anyone found a way to block these automatic additions?
Whisper may hallucinate on empty or nearly empty sections of audio. To prevent this, trim away the sections of your audio that don’t contain speech before sending it for transcription.
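As a rough illustration of the trimming step, here is a minimal sketch using only the Python standard library. It assumes a 16-bit PCM WAV file and uses a plain energy (RMS) threshold per short window, which is a crude stand-in for a real voice activity detector; `trim_silence`, the 30 ms window, and the threshold of 500 are hypothetical choices you would tune for your own recordings.

```python
import array
import math
import sys
import wave


def trim_silence(in_path, out_path, window_ms=30, threshold=500):
    """Copy in_path to out_path, dropping windows whose RMS is below threshold.

    Hypothetical helper: assumes 16-bit PCM WAV; threshold is in raw sample units.
    Returns the number of frames written.
    """
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = src.readframes(params.nframes)

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV data is little-endian
        samples.byteswap()

    channels = params.nchannels
    frames_per_window = max(1, params.framerate * window_ms // 1000)
    step = frames_per_window * channels  # interleaved samples per window

    kept = array.array("h")
    for start in range(0, len(samples), step):
        window = samples[start:start + step]
        # Root-mean-square energy of this window; quiet windows are skipped.
        rms = math.sqrt(sum(s * s for s in window) / len(window))
        if rms >= threshold:
            kept.extend(window)

    if sys.byteorder == "big":
        kept.byteswap()
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)  # frame count in the header is patched on write
        dst.writeframes(kept.tobytes())
    return len(kept) // channels
```

Keep in mind that cutting audio out shifts every later timestamp, so the SRT you get back will no longer line up with the original file unless you record which windows were kept and map the times back.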
Another solution is to remove the unwanted text from your transcript afterwards. It’s not a perfect solution, but it does work. Here’s a list of common hallucinations from GitHub