How to reduce latency of GPT requests in Unity

Hi, I want to develop a real-time AI assistant with Unity. I'm using the models below:

Whisper
GPT-4o
TTS-1

But the response time is too long: 5 to 20 seconds.

Is there an API for GPT's Voice Mode? Are there any other ways to reduce latency?
Thank you!

Try these changes:
Whisper → Deepgram (nova-2 model) over WebSockets
TTS-1 → ElevenLabs (Eleven Turbo v2 model) over WebSockets, with optimize_streaming_latency = 3
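Part of what streaming buys you is that speech can start before the model has finished its whole reply. A minimal sketch, assuming the LLM output arrives as a stream of text fragments: buffer the fragments and hand each complete sentence to TTS as soon as it ends. The function name and the simple ".!? followed by whitespace" sentence rule are assumptions for illustration, not part of any Deepgram or ElevenLabs API.

```python
import re
from typing import Iterable, Iterator

# Simplified sentence boundary (an assumption): ".", "!" or "?" followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(fragments: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text fragments."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is a finished sentence; the last may still be growing.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Hello", " there. How", " are you? I'm", " fine."]
    for sentence in sentences_from_stream(stream):
        print(sentence)  # in a real app, send each sentence to the TTS socket here
```

Sending sentence-sized chunks is a trade-off: smaller chunks start audio sooner, but chunks that are too small can make TTS prosody sound choppy.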

Use OGG for the audio format.

For GPT-4o: first ask GPT-3.5 for a short filler phrase that repeats back the last thing you asked. It responds faster, so the user hears something sooner. Then obtain the full answer from GPT-4o.
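The filler idea works best when both requests start at the same time, so the slow model is already working while the filler is being spoken. A minimal sketch with asyncio, assuming hypothetical stand-ins `fast_filler` and `full_answer` for the real GPT-3.5 and GPT-4o calls; the sleeps only simulate their relative latency, and `speak` is whatever hands text to your TTS.

```python
import asyncio
from typing import Callable, List


# Hypothetical stand-ins for real API calls; the sleeps only simulate latency.
async def fast_filler(question: str) -> str:
    await asyncio.sleep(0.05)  # fast model, e.g. GPT-3.5
    return f"You asked: {question} Let me think about that."


async def full_answer(question: str) -> str:
    await asyncio.sleep(0.2)  # slower model, e.g. GPT-4o
    return f"Here is the full answer to: {question}"


async def respond(question: str, speak: Callable[[str], None]) -> List[str]:
    """Start both requests at once; speak the filler first, then the full answer."""
    filler_task = asyncio.create_task(fast_filler(question))
    answer_task = asyncio.create_task(full_answer(question))
    spoken = []
    # Awaiting in this order keeps filler-then-answer ordering,
    # while both requests run concurrently in the background.
    for task in (filler_task, answer_task):
        text = await task
        speak(text)
        spoken.append(text)
    return spoken


if __name__ == "__main__":
    asyncio.run(respond("What's the weather like?", print))
```

Because the two requests overlap, the total wait is roughly the slower call's latency rather than the sum of both.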
