I posted a link. Kruel.ai was originally started on this open-source project, and I still use parts of it to this day; the ball input/output was the best. These guys were the first to get me into AI development.
Thanks, Luka_Spahija.
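To fix the problem of the speaker pausing for a second mid-sentence, I built a buffer system. It takes the input, converts it, stores it, and waits 1 second to see if any more input shows up. If more arrives, it appends it and repeats the wait; if nothing arrives within 1 second, it sends the buffered text for processing.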
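Here is a minimal sketch of that buffer logic, assuming the input has already been converted to text chunks. The class and method names (`SilenceBuffer`, `feed`, `poll`) are hypothetical, and the current time is passed in explicitly so the logic can be tried without real delays; in a live app you would call `poll` in a loop with the real clock.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SilenceBuffer:
    """Hypothetical sketch: collects text chunks and flushes them once no new
    chunk has arrived for `wait_seconds` (1 second by default)."""
    on_flush: Callable[[str], None]
    wait_seconds: float = 1.0
    _parts: List[str] = field(default_factory=list, init=False)
    _last_input_at: Optional[float] = field(default=None, init=False)

    def feed(self, text: str, now: float) -> None:
        # Append the new chunk and restart the wait window.
        self._parts.append(text)
        self._last_input_at = now

    def poll(self, now: float) -> None:
        # Nothing buffered: nothing to do.
        if self._last_input_at is None:
            return
        # No new input within the wait window: send everything for processing.
        if now - self._last_input_at >= self.wait_seconds:
            self.on_flush(" ".join(self._parts))
            self._parts.clear()
            self._last_input_at = None


sent = []
buf = SilenceBuffer(on_flush=sent.append)
buf.feed("what is", now=0.0)
buf.poll(now=0.6)            # only 0.6 s of silence: keep waiting
buf.feed("the weather", now=0.8)
buf.poll(now=1.5)            # 0.7 s since the last chunk: keep waiting
buf.poll(now=1.9)            # 1.1 s of silence: flush
print(sent)                  # ['what is the weather']
```

Every new chunk resets the timer, so a speaker who keeps pausing briefly is still treated as one utterance.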
It's easy logic and it works great. If you want to take it further, you can use machine learning to pick up the speaker's patterns over time and learn their way of talking, so the wait can be tuned to them. That can cut the delay if you need overall response time as low as possible.
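As a much simpler stand-in for the machine learning idea (this is not ML, just running statistics), the fixed 1-second wait could be replaced by one that adapts to the speaker. The sketch below, with hypothetical names and parameter values, keeps an exponential moving average of the pauses that turned out to be mid-utterance and waits a margin longer than that, clamped to a range.

```python
class AdaptiveWait:
    """Hypothetical sketch: adapts the silence threshold to one speaker using an
    exponential moving average (EMA) of their within-utterance pauses."""

    def __init__(self, initial: float = 1.0, alpha: float = 0.2,
                 margin: float = 1.5, min_wait: float = 0.3,
                 max_wait: float = 2.0) -> None:
        self.avg_gap = initial / margin  # chosen so the first wait equals `initial`
        self.alpha = alpha               # how quickly new observations take over
        self.margin = margin             # wait this many times the typical pause
        self.min_wait = min_wait
        self.max_wait = max_wait

    def observe_gap(self, gap: float) -> None:
        # Call this for a pause that was followed by more input (mid-utterance).
        self.avg_gap = (1 - self.alpha) * self.avg_gap + self.alpha * gap

    @property
    def wait_seconds(self) -> float:
        # Wait a bit longer than the speaker's typical pause, within bounds.
        return min(self.max_wait, max(self.min_wait, self.margin * self.avg_gap))


aw = AdaptiveWait()
for gap in [0.3, 0.25, 0.35, 0.3, 0.3]:  # a speaker with short pauses
    aw.observe_gap(gap)
print(round(aw.wait_seconds, 2))
```

For a quick talker the wait drifts below 1 second, while the `max_wait` cap keeps a slow talker from making the system wait indefinitely.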