Hi, I want to develop a real-time AI assistant with Unity. I’m using the models below:
Whisper
GPT-4o
TTS-1
But the response time is too long: 5-20 seconds.
Is there an API for Voice Mode in GPT? Are there any ways to reduce latency?
Thank you!
Change this:
Whisper → Deepgram (nova-2 model) using websockets
TTS-1 → ElevenLabs (Eleven Turbo v2 model) using websockets with optimize_streaming_latency = 3
Use OGG for the audio format.
For GPT-4o, first get a filler phrase that repeats back the last thing the user asked, using GPT-3.5 for a faster response. Then obtain the full answer from GPT-4o and speak it after the filler.
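To make the filler-phrase idea concrete, here is a rough sketch in Python (not Unity C#) of how the two requests could overlap. Everything in it is an assumption for illustration: fast_filler, full_answer, and speak are hypothetical stand-ins for the GPT-3.5 call, the GPT-4o call, and the TTS stream, and the delays are made up. The only point is that the slow request starts first and runs while the filler is being produced and spoken.

```python
import asyncio
import time

# Hypothetical stand-ins for the real model calls; the delays are invented.
async def fast_filler(question: str) -> str:
    await asyncio.sleep(0.2)  # pretend the small, fast model answers quickly
    return f"Let me think about {question!r}..."

async def full_answer(question: str) -> str:
    await asyncio.sleep(0.4)  # pretend the large model takes longer
    return f"Full answer to {question!r}"

async def speak(text: str, spoken: list) -> None:
    # Placeholder for sending text to a streaming TTS service.
    spoken.append(text)

async def respond(question: str) -> list:
    spoken = []
    # Start the slow request first so it runs in the background.
    answer_task = asyncio.create_task(full_answer(question))
    # Meanwhile, fetch and speak the filler phrase.
    filler = await fast_filler(question)
    await speak(filler, spoken)
    # By now the full answer is already partly (or fully) computed.
    await speak(await answer_task, spoken)
    return spoken

if __name__ == "__main__":
    print(asyncio.run(respond("what is the weather")))
```

With these made-up delays, the total wait is roughly that of the slow call alone rather than the sum of both, and the user hears something after the short delay instead of silence.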
I’m interested in this too @Unity. Did you try the solution from @AcertingArt?
Also @AcertingArt: are there any reasons why you expect Deepgram or Elevenlabs to be faster? I have not investigated those.