Hi guys!

I'd like to know if there's any way to reduce the latency of the Whisper API response. Right now, transcribing 20 seconds of speech takes about 5 seconds, which is really high. Is there any way to get it down to 2 or 3 seconds at least? Can we expect OpenAI to improve latency over time?

Most STT applications need transcription to be close to real-time, so any improvement would be highly appreciated!

Whisper isn't architected for real-time transcription. To get closer to what you want, you will have to break the audio into small chunks and transcribe each chunk separately.
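Here is a minimal sketch of the chunking approach. It assumes the input is an uncompressed PCM WAV file. It uses only the standard-library `wave` module to split the audio into fixed-length chunks, then passes each chunk to a transcription function you supply. The names `split_wav`, `transcribe_in_chunks` and `make_silent_wav` are hypothetical helpers for illustration. The commented-out OpenAI call assumes the v1 Python SDK's `client.audio.transcriptions.create(model="whisper-1", file=...)`; check it against the SDK docs for your version.

```python
import io
import wave
from typing import Callable, Iterator


def split_wav(wav_bytes: bytes, chunk_seconds: float = 5.0) -> Iterator[bytes]:
    """Split a PCM WAV payload into consecutive WAV chunks of at most chunk_seconds each."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(params.framerate * chunk_seconds))
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Same channels/sample width/rate as the source; the frame
                # count in the header is corrected automatically on write.
                dst.setparams(params)
                dst.writeframes(frames)
            yield buf.getvalue()


def transcribe_in_chunks(
    wav_bytes: bytes,
    transcribe: Callable[[bytes], str],
    chunk_seconds: float = 5.0,
) -> Iterator[str]:
    """Yield one transcript per chunk, so partial text is available as soon as each chunk returns."""
    for chunk in split_wav(wav_bytes, chunk_seconds):
        yield transcribe(chunk)


def make_silent_wav(seconds: float, rate: int = 16_000) -> bytes:
    """Build a mono 16-bit silent WAV, handy for trying out the splitter."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


# Assumed real transcriber (openai>=1.0 Python SDK), not run here:
# from openai import OpenAI
# client = OpenAI()
# def whisper(chunk: bytes) -> str:
#     return client.audio.transcriptions.create(
#         model="whisper-1", file=("chunk.wav", chunk)
#     ).text
```

For example, `for text in transcribe_in_chunks(audio, whisper): print(text)` would print the transcript piece by piece instead of waiting for the whole 20 seconds. Shorter chunks mean each request returns sooner. However, fixed-length cuts can split words at chunk boundaries, so you may want to cut on silences or overlap chunks slightly in practice.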
