I’ve created a raw WebSocket setup with Node.js (system prompt ~1,500 characters) and this definitely does not feel REALTIME. Tested it with text only, and responses are coming back at around 350–800 ms. Is everyone experiencing the same here, or is it my setup? I’m in a US West region, btw.
Best latency is ~250 ms when the prompt is ~150 characters (which is not really useful).
It feels very realtime to me and I’m based in South Africa. Far away from any OpenAI servers. It’s so realtime that I’ve actually throttled back the VAD so that it doesn’t interrupt me too soon, as I’m still thinking.
"turn_detection": {
"type": "server_vad",
"threshold": 0.6,
"prefix_padding_ms": 500,
"silence_duration_ms": 2000,
},
[11:39:18.864] Assistant Event: session.created
[11:39:19.079] Assistant Event: session.updated
[11:39:19.229] Assistant Event: conversation.item.created
[11:39:19.232] Assistant Event: response.created
[11:39:19.600] Assistant Event: rate_limits.updated
[11:39:19.602] Assistant Event: response.output_item.added
[11:39:19.604] Assistant Event: conversation.item.created
[11:39:19.619] Assistant Event: response.content_part.added
[11:39:19.620] Assistant Event: response.audio_transcript.delta
[11:39:19.621] Assistant Transcription: general
[11:39:19.721] Assistant Event: response.audio.delta
[11:39:19.732] Audio delta written
[11:39:19.779] Assistant Event: response.audio.delta
[11:39:19.780] Audio delta written
[11:39:19.799] Assistant Event: response.audio.delta
[11:39:19.800] Audio delta written
[11:39:19.819] Assistant Event: response.audio.delta
[11:39:19.820] Audio delta written
[11:39:19.888] Assistant Event: response.audio.delta
[11:39:19.889] Audio delta written
[11:39:19.914] Assistant Event: response.audio.delta
[11:39:19.915] Audio delta written
[11:39:19.919] Assistant Event: response.audio.done
[11:39:19.920] Assistant Event: response.audio_transcript.done
[11:39:19.930] Assistant Event: response.content_part.done
[11:39:19.932] Assistant Event: response.output_item.done
[11:39:19.933] Assistant Event: response.done
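Reading the log above, the gap that matters for perceived latency is `response.created` (19.232) to the first `response.audio.delta` (19.721), roughly 489 ms. A small helper to compute that directly from such log lines (the `[HH:MM:SS.mmm]` timestamp format is taken from the log as printed):

```javascript
// Convert a "[HH:MM:SS.mmm]" log prefix into milliseconds since midnight,
// so we can diff two event lines from the log above.
function logTsToMs(line) {
  const m = line.match(/\[(\d{2}):(\d{2}):(\d{2})\.(\d{3})\]/);
  const [, h, min, s, ms] = m.map(Number); // m[0] is the full match, skipped
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

const created = logTsToMs("[11:39:19.232] Assistant Event: response.created");
const firstAudio = logTsToMs("[11:39:19.721] Assistant Event: response.audio.delta");
console.log(firstAudio - created); // 489 (ms from response.created to first audio)
```

That ~489 ms time-to-first-audio is what the 350–800 ms figures in this thread are measuring; the remaining deltas then stream in every ~20–70 ms.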
Speech feels different. Humans are slow and expect an assistant to be at roughly their pace. Those numbers reflect actual event timings, which clearly show how soon data flows back from the server.
When text processing is tied up with other sequential tasks, 100 ms makes a big difference. So when it’s averaging above 350 ms, that’s considered slow. This API is more on the soft-realtime side, and a bit non-realtime for convoluted requests.