Hi everyone,
I’m currently using Agent Builder with ChatKit integrated into my own application.
Overall, everything works correctly from a functional point of view.
However, I’m experiencing very high latency, and I’m trying to understand whether this is expected behavior or if others are seeing the same thing.
Here’s what I observe:
- The first user message almost always takes ~20 seconds before receiving the first token / response.
- Subsequent messages are a bit faster, but still around 8–10 seconds, even in very simple scenarios.
- This happens regardless of the model used.
- This happens with or without tools:
  - No tools
  - No complex instructions
  - No workflows
  - No function calls

  → Latency remains roughly the same.
- Adding tools or workflows does not noticeably increase latency, which suggests the baseline overhead is already high.
This makes the UX quite difficult for real-time or near-real-time chat use cases.
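In case anyone wants to compare numbers on the same basis, here is a minimal sketch of how time to first token can be measured. It assumes the response is available as an iterable of text chunks; it does not call ChatKit or Agent Builder, and `time_to_first_chunk` and `fake_stream` are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def time_to_first_chunk(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Consume a stream and return
    (seconds_to_first_chunk, total_seconds, chunk_count).

    seconds_to_first_chunk is None if the stream yields nothing.
    """
    start = clock()
    first: Optional[float] = None
    count = 0
    for _ in stream:
        if first is None:
            # Elapsed time until the first chunk became available.
            first = clock() - start
        count += 1
    total = clock() - start
    return first, total, count


def fake_stream(delay_s: float, chunks: int) -> Iterator[str]:
    """Hypothetical stand-in for a streamed reply: waits, then yields chunks."""
    time.sleep(delay_s)
    for i in range(chunks):
        yield f"token-{i}"


if __name__ == "__main__":
    first, total, count = time_to_first_chunk(fake_stream(0.05, 3))
    print(f"first chunk after {first:.3f}s, total {total:.3f}s, {count} chunks")
```

Replacing `fake_stream` with whatever iterable the real client exposes would separate the wait before the first token from the time spent streaming the rest of the reply.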
My main questions are:
- Is this level of latency expected when using ChatKit with Agent Builder?
- Is there known initialization or orchestration overhead for agents that explains the ~20s delay on the first message?
- Are there recommended optimizations or best practices to reduce this latency?
- Or am I simply misconfigured / doing something wrong?
I’d really appreciate feedback from anyone using ChatKit + Agent Builder in production, or from the OpenAI team if this is a known limitation.
Thanks in advance!