How to reduce latency with GPT & Unity Requests

AcertingArt · June 5, 2024, 5:08pm

Swap these components:
Whisper → Deepgram (nova-2 model) over websockets
TTS-1 → ElevenLabs (Eleven Turbo v2 model) over websockets, with optimize_streaming_latency = 3
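
As a rough sketch of what those two websocket connections point at, here is how the URLs could be built. The endpoint paths and query parameter names are written from memory of the vendors' public docs, so treat them as assumptions and check the current API references before relying on them. `YOUR_VOICE_ID` is a hypothetical placeholder, and authentication (API keys in headers or in the first message) is left out.

```python
from urllib.parse import urlencode


def deepgram_stt_url(model: str = "nova-2") -> str:
    """Build a Deepgram live-transcription websocket URL (assumed endpoint)."""
    return "wss://api.deepgram.com/v1/listen?" + urlencode({"model": model})


def elevenlabs_tts_url(
    voice_id: str,
    model_id: str = "eleven_turbo_v2",
    optimize_streaming_latency: int = 3,
) -> str:
    """Build an ElevenLabs streaming-input websocket URL (assumed endpoint)."""
    query = urlencode(
        {
            "model_id": model_id,
            "optimize_streaming_latency": optimize_streaming_latency,
        }
    )
    return (
        f"wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input?{query}"
    )


if __name__ == "__main__":
    print(deepgram_stt_url())
    # YOUR_VOICE_ID is a placeholder, not a real voice.
    print(elevenlabs_tts_url("YOUR_VOICE_ID"))
```

From Unity you would open these with whatever websocket client you already use; the point is only that the model and latency settings go in the connection URL.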

Use OGG for the audio format.

For GPT-4o, first get a short filler phrase from GPT-3.5 that repeats back the last thing the user asked. GPT-3.5 responds faster, so you can start speaking while GPT-4o is still working. Then get the full answer from GPT-4o and play it after the filler.
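
Here is a minimal sketch of that filler-then-answer flow. It assumes both requests are started at the same time so the slow one is not delayed by the fast one. `fast_model_filler` and `slow_model_answer` are hypothetical stand-ins that simulate latency with `asyncio.sleep`; in a real app they would call GPT-3.5 and GPT-4o, and `speak` would hand the text to your TTS.

```python
import asyncio
from typing import Callable, List


async def fast_model_filler(question: str) -> str:
    """Hypothetical stand-in for a quick GPT-3.5 call that echoes the question."""
    await asyncio.sleep(0.05)  # simulated short latency
    return f"So you're asking: {question} Let me think."


async def slow_model_answer(question: str) -> str:
    """Hypothetical stand-in for the slower GPT-4o call."""
    await asyncio.sleep(0.2)  # simulated longer latency
    return f"Full answer to: {question}"


async def respond(question: str, speak: Callable[[str], None]) -> None:
    # Start both requests immediately so they run concurrently.
    filler_task = asyncio.create_task(fast_model_filler(question))
    answer_task = asyncio.create_task(slow_model_answer(question))

    # Speak the filler as soon as it arrives, then the full answer.
    speak(await filler_task)
    speak(await answer_task)


def demo() -> List[str]:
    """Run one exchange and return the phrases in the order they were spoken."""
    spoken: List[str] = []
    asyncio.run(respond("How do I reduce latency?", spoken.append))
    return spoken


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The user hears something after roughly the fast model's latency instead of waiting for the slow one, and the total time to the full answer is unchanged because the two calls overlap.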
