How can ChatGPT voice respond so fast?

My main reason for not using Whisper is that it isn't designed for real-time (streaming) interaction. For now (we'll see what changes in the near future with voice input in GPT-4o), you should look at other STT services. For me the best one, considering reliability, language coverage, speed, and cost, is Amazon Transcribe; I have already tried a lot of them. The answer from GPT-4o itself is really fast. For the last step, TTS generation, I also tried everything available on the market, and my current choice is Google Cloud TTS: it is pretty fast, not expensive, and the quality is good enough for my use case. With this setup I get a fairly natural conversational feel in terms of timing.
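To make the three steps concrete, here is a rough Python sketch of how an STT -> LLM -> TTS chain can be wired so that audio starts as early as possible. This is not my actual code: `transcribe_stream`, `generate_reply`, and `synthesize` are hypothetical stand-ins for whatever STT, LLM, and TTS services you pick, and none of them call a real API. Splitting the reply into sentences before TTS is an assumption on my side about how to keep the first audio chunk early, not something any of these services require.

```python
from typing import Iterable, Iterator


def transcribe_stream(audio_chunks: Iterable[bytes]) -> Iterator[str]:
    """Hypothetical STT stand-in: yields one word per incoming audio chunk."""
    for chunk in audio_chunks:
        yield chunk.decode("utf-8")


def generate_reply(prompt: str) -> Iterator[str]:
    """Hypothetical LLM stand-in: streams a canned reply token by token."""
    reply = f"You said: {prompt}. Anything else?"
    for token in reply.split(" "):
        yield token + " "


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed tokens into sentences so TTS can start before the reply ends."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith((".", "?", "!")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush whatever is left when the stream ends without punctuation.
        yield buffer.strip()


def synthesize(sentence: str) -> bytes:
    """Hypothetical TTS stand-in: returns fake audio bytes for one sentence."""
    return sentence.encode("utf-8")


def converse(audio_chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Run STT, then stream the LLM reply sentence by sentence into TTS."""
    prompt = " ".join(transcribe_stream(audio_chunks))
    for sentence in sentences(generate_reply(prompt)):
        yield synthesize(sentence)


if __name__ == "__main__":
    for audio in converse([b"hello", b"there"]):
        print(audio)
```

The point of the generators is that the first sentence can be sent to TTS and played while the rest of the reply is still being generated, which is where most of the perceived speed comes from in a setup like this.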