Very high latency with ChatKit + Agent Builder (even without tools) – is it just me?

Hi everyone,

I’m currently using Agent Builder with ChatKit integrated into my own application.
Overall, everything works correctly from a functional point of view.

However, I’m experiencing very high latency, and I’m trying to understand whether this is expected behavior or if others are seeing the same thing.

Here’s what I observe:

  • The first user message almost always takes ~20 seconds before receiving the first token / response.

  • Subsequent messages are a bit faster, but still around 8–10 seconds, even in very simple scenarios.

  • This happens regardless of the model used.

  • This happens with or without tools:

    • No tools

    • No complex instructions

    • No workflows

    • No function calls
      → Latency remains roughly the same.

  • Adding tools or workflows does not noticeably increase latency, which suggests the baseline overhead is already high.

This makes the UX quite difficult for real-time or near-real-time chat use cases.

My main questions are:

  • Is this level of latency expected when using ChatKit with Agent Builder?

  • Is there known initialization or orchestration overhead for agents that explains the ~20s delay on the first message?

  • Are there recommended optimizations or best practices to reduce this latency?

  • Or is my setup simply misconfigured / am I doing something wrong?

I’d really appreciate feedback from anyone using ChatKit + Agent Builder in production, or from the OpenAI team if this is a known limitation.

Thanks in advance!

What you’re seeing is largely expected behavior when using ChatKit with Agent Builder and is usually not a misconfiguration.
Agent Builder introduces a non-trivial orchestration and initialization overhead (agent setup, instruction loading, decision layer, context preparation) that occurs on the first message, even when no tools, workflows, or complex instructions are involved. This explains the ~20s delay on the initial turn and the still-elevated latency on subsequent turns.

Because this overhead is largely model-agnostic, changing models or disabling tools does not significantly affect latency. ChatKit + Agent Builder is therefore better suited for complex, agentic workflows rather than low-latency, real-time chat UX. For latency-sensitive use cases, calling the model directly without Agent Builder is currently the more appropriate approach.
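For latency-sensitive paths, the direct call looks roughly like this. This is a minimal sketch using the official `openai` Python SDK's Responses API with streaming, so the first token reaches the UI as soon as the model emits it; the model name is just an example, and the live call only runs if an API key is present in the environment.

```python
# Sketch: call the model directly via the Responses API, bypassing
# Agent Builder, and stream tokens as they arrive.
import os

try:
    from openai import OpenAI  # pip install openai
    _HAVE_SDK = True
except ImportError:  # SDK not installed; the sketch still defines the function
    _HAVE_SDK = False

def stream_reply(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Stream a single-turn reply and return the concatenated text."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    chunks = []
    with client.responses.stream(model=model, input=prompt) as stream:
        for event in stream:
            # Text deltas arrive incrementally; print them for near-real-time UX.
            if event.type == "response.output_text.delta":
                chunks.append(event.delta)
                print(event.delta, end="", flush=True)
    return "".join(chunks)

# Only attempt a live call when the SDK and a key are actually available.
if _HAVE_SDK and os.environ.get("OPENAI_API_KEY"):
    stream_reply("Say hello in one short sentence.")
```

Because there is no agent orchestration layer in this path, time-to-first-token is dominated by the model itself rather than by initialization overhead.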

Thank you for your reply — that makes things much clearer now.

The thing is, I really like the ChatKit integration within my application. I was wondering whether it’s currently possible to use the Responses API directly with ChatKit, without going through the Agent Builder system.

I’ve looked through the documentation and it seems like it might be possible, but it’s not entirely clear to me how this is intended to work in practice, or whether the agent layer is still required in this case.

Yes, this is something I was stuck on as well. I really just wanted an API endpoint I could call, with my agent/workflow hosted on OpenAI’s side, and simply retrieve the output for a given input.

The only way it seems to be supported is through ChatKit, or by taking the code it generates for you and mimicking the workflow on your own server. That was the easiest way to do it, considering a workflow is just a bunch of prompt chaining / RAG / MCP calls, etc.

+1 here, takes forever on more complex setups; could a dev please look into this?