Realtime-Twilio Jittery Voice

We are using Realtime (both the big and mini models) with Twilio. The voice on Twilio:

  • Is jittery
  • Skips parts of words

Has anyone else run into this issue? It looks like the buffer is not being handled properly, but we are using all the settings that OpenAI and Twilio recommend.

Yup, we had this issue early on. In our case the jitter and pauses were caused by how audio frames were being buffered and pushed to Twilio. OpenAI’s Realtime audio arrives in bursts, and Twilio expects steady pacing. Once we added a server-side buffer instead of forwarding frames as soon as they arrived, the audio became stable.

Very helpful. Can you please share a link to a repo? I am not an engineer and need to pass this on to my engineering team.

We don’t have a repo, but the fix is quite small and easy to explain.

What caused our issue-

Forwarding audio immediately (causes jitter)

    async for frame in openai_audio_stream:
        await twilio_ws.send(frame)

What fixed it-

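    import asyncio
    from collections import deque

    # Frames from OpenAI wait here until the paced sender picks them up.
    audio_buffer = deque()

    async def receive_openai_audio(openai_stream):
        # OpenAI delivers audio in bursts; queue each frame instead of forwarding it.
        async for frame in openai_stream:
            audio_buffer.append(frame)

    async def send_audio_to_twilio(twilio_ws):
        FRAME_DURATION = 0.02  # ~20ms pacing
        while True:
            if audio_buffer:
                await twilio_ws.send(audio_buffer.popleft())
            # Sleep on every iteration, even with an empty buffer, so the loop yields.
            await asyncio.sleep(FRAME_DURATION)

    async def main(openai_stream, twilio_ws):
        # await is only valid inside a coroutine, so both tasks are started from here.
        await asyncio.gather(
            receive_openai_audio(openai_stream),
            send_audio_to_twilio(twilio_ws),
        )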

The buffering and pacing layer was the only change needed.
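One possible refinement, not something we needed ourselves: a fixed sleep of 20ms per frame drifts slightly, because the time spent sending is added on top of the sleep. A minimal sketch of pacing against absolute deadlines is below. The names `paced_sender`, `send`, and `stop` are hypothetical, and `send` is assumed to be any coroutine that forwards one frame (for example a wrapper around `twilio_ws.send`).

```python
import asyncio
from collections import deque

FRAME_DURATION = 0.02  # seconds per frame, same ~20ms pacing as above


async def paced_sender(buffer: deque, send, stop: asyncio.Event) -> None:
    """Send at most one buffered frame per tick, scheduled on absolute deadlines."""
    loop = asyncio.get_running_loop()
    next_deadline = loop.time()
    while not stop.is_set():
        if buffer:
            await send(buffer.popleft())
        # Advance the deadline by one frame regardless of how long send() took.
        next_deadline += FRAME_DURATION
        delay = next_deadline - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
        else:
            # Fell behind: restart the schedule instead of bursting to catch up.
            next_deadline = loop.time()
            await asyncio.sleep(0)
```

The `stop` event is only there so the loop can be shut down cleanly when the call ends.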
Twilio’s Media Streams docs may also help if you haven’t read them yet.
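For the engineering team, here is a rough sketch of the frame side of this. It assumes the OpenAI session outputs 8 kHz mu-law audio (the format Twilio Media Streams carries), so 20ms of audio is 160 bytes. The helper names `split_into_frames` and `twilio_media_message` are made up for illustration; the message layout follows Twilio's "media" event, but please check it against their docs.

```python
import base64
import json

FRAME_BYTES = 160  # 20 ms of 8 kHz, 8-bit mu-law audio


def split_into_frames(pending: bytearray, chunk: bytes) -> list[bytes]:
    """Add a burst to the pending bytes and return every complete 20 ms frame."""
    pending.extend(chunk)
    frames = []
    while len(pending) >= FRAME_BYTES:
        frames.append(bytes(pending[:FRAME_BYTES]))
        del pending[:FRAME_BYTES]
    # Any leftover partial frame stays in `pending` until the next burst.
    return frames


def twilio_media_message(stream_sid: str, frame: bytes) -> str:
    """Wrap one frame in a Twilio Media Streams "media" event (JSON text)."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(frame).decode("ascii")},
    })
```

Splitting bursts into equal 20ms frames before buffering means each tick of the paced sender carries the same amount of audio, whatever chunk sizes OpenAI happens to deliver.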

Hopefully this helps.
