Latency with STT → LLM → TTS Pipeline

I'm chaining gpt-4o-transcribe → gpt-4.1 → gpt-4o-mini-tts. However, a full round trip takes about 3–5 seconds. The responses are great, but the latency makes the user experience pretty poor.
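To see which of the three stages accounts for most of those 3–5 seconds, a small timing harness around each call can help. This is only a sketch: `transcribe`, `generate_reply`, and `synthesize` are hypothetical placeholders that just sleep. They are not real OpenAI SDK calls, so you would swap in the actual speech-to-text, chat, and text-to-speech requests to get real numbers.

```python
import time
from typing import Callable


# Hypothetical stand-ins for the three API calls. Replace each body with the
# real request (speech-to-text, chat completion, text-to-speech).
def transcribe(audio: bytes) -> str:
    time.sleep(0.01)  # simulated network + model time
    return "what is the weather like"


def generate_reply(text: str) -> str:
    time.sleep(0.02)  # simulated network + model time
    return f"You asked: {text}."


def synthesize(text: str) -> bytes:
    time.sleep(0.01)  # simulated network + model time
    return text.encode("utf-8")


def timed(label: str, fn: Callable, arg, timings: dict):
    """Call fn(arg) and record its wall-clock duration under `label`."""
    start = time.perf_counter()
    result = fn(arg)
    timings[label] = time.perf_counter() - start
    return result


def run_pipeline(audio: bytes) -> tuple[bytes, dict]:
    """Run STT -> LLM -> TTS strictly in sequence and time each stage."""
    timings: dict[str, float] = {}
    text = timed("stt", transcribe, audio, timings)
    reply = timed("llm", generate_reply, text, timings)
    speech = timed("tts", synthesize, reply, timings)
    # Each stage waits for the previous one to finish, so the user-perceived
    # delay is the sum of the three stage durations.
    timings["total"] = sum(timings.values())
    return speech, timings
```

Calling `run_pipeline(...)` and printing the `timings` dict shows how long each stage took. Because the stages run back to back, shortening any one of them lowers the total, but getting below the sum would require overlapping the stages rather than waiting for each to finish.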
Is there a way to reduce the latency?
I know there's the Realtime API for audio, but it's too expensive for my use case.