How to create (near) realtime Speech-to-Text using Whisper?

I would like to create an app that does (near) realtime Speech-to-Text, and I would like to use Whisper for that.

I tested ‘raw’ Whisper, but the delay before it returned a response was quite large. I’d like some guidance on the best way to do this; the tutorials I tried gave me a lot of errors.
