How Audio Speed Affects Transcription Accuracy: Benchmark Insights

Thanks for the amazing work!

The inconsistency is worrying. For Dutch, a speed factor of 3.1x gives a WER of 98.83%, but 3.3x gives 20%. So I suppose the output is basically nonsense at 3.1x, yet kind of usable at 3.3x.

I wonder if there would be ways to detect the 80+% WER results, perhaps with an LLM?
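Before reaching for an LLM, one cheap reference-free check might be worth trying. This is only a rough sketch under my own assumptions: it flags transcripts that compress unusually well, which tends to happen when the model gets stuck repeating a phrase. The function names and the 2.4 threshold are hypothetical starting points, not something taken from the benchmark, and this would not catch output that is wrong but non-repetitive.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Return raw size divided by zlib-compressed size (0.0 for empty text)."""
    data = text.encode("utf-8")
    if not data:
        return 0.0
    return len(data) / len(zlib.compress(data))


def looks_degenerate(text: str, max_ratio: float = 2.4) -> bool:
    """Flag transcripts that are suspiciously repetitive.

    max_ratio is an assumed threshold; tune it on transcripts whose
    WER you already know.
    """
    return compression_ratio(text) > max_ratio


if __name__ == "__main__":
    looped = "thank you " * 50
    normal = "The quick brown fox jumps over the lazy dog near the river bank today."
    print(looks_degenerate(looped), looks_degenerate(normal))
```

Anything flagged this way could then be re-run at a lower speed factor, with an LLM check reserved for the cases that pass.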
