Whisper-Live ASR for live streaming transcription results in 8-10 sec delay. How to improve this?

srathimalar23 · June 19, 2024, 9:32am

I did a POC with whisper-live using the medium model. I am seeing an 8-10 second delay in transcription, and the captions are printed as whole paragraphs. I am running on a Windows PC with 16 GB RAM and a 12th Gen Intel(R) Core processor. My requirement is accurate transcription with low latency, at most 4 seconds, and captions printed word by word. Is this possible with a high-performance VM configuration and the Whisper large-v2 model?

I am looking for any proven record of Whisper being used for live streaming with high accuracy and at most 4 seconds of transcription delay. Specifically: what VM configuration optimizes performance, what transcription delay (in seconds) can be expected, and is word-by-word transcription attainable with the Whisper large-v2 model?

Chris60 · July 22, 2024, 1:12am

Hey! Were you able to solve this? I’m having the same issue and I’ve not been able to find any solution yet.
