Super slow tool calls and long model internal deliberation times between tool calls

I’ve been developing a use case for my ChatGPT app that requires the agent to chain 2 tool calls, with the second one showing a visible widget to the user.

This use case is working well and the results are outstanding, but it’s borderline unusable due to how long the agent is taking to generate the responses.

Typical chat trace:

  • User prompt
  • Agent thinks for about 10 secs
  • Agent performs the 1st tool call (MCP server takes about 0.5s to complete the tool response)
  • Agent thinks for 20 seconds more
  • Agent performs the 2nd tool call (MCP response takes about 200ms to complete)
  • Agent takes ~5s more to process the response and render the final widget.

Total time: around 35s end-to-end, with very little visual feedback for the user (especially in the native mobile app)
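Totaling the stage timings above makes it clear where the time goes (a quick sketch; the per-stage numbers are the approximate ones from my trace):

```python
# Approximate per-stage latencies from the trace above, in seconds.
stages = {
    "thinking before tool 1": 10.0,
    "tool 1 (MCP server)": 0.5,
    "thinking before tool 2": 20.0,
    "tool 2 (MCP server)": 0.2,
    "processing + widget render": 5.0,
}

total = sum(stages.values())
model_side = (
    stages["thinking before tool 1"]
    + stages["thinking before tool 2"]
    + stages["processing + widget render"]
)
tool_side = total - model_side

print(f"total: {total:.1f}s, model-side: {model_side:.1f}s "
      f"({model_side / total:.0%}), tools: {tool_side:.1f}s")
# The two MCP tools account for well under a second combined;
# nearly all of the ~35s is model-side reasoning and rendering.
```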

Any tips on how to solve this? As I stated, the final results are very good, but the slowness ruins it.


Hey @kail, totally feel you. The agent is spending most of its time thinking and coordinating between the first tool call, the second tool call, and the final rendering.

It’s basically doing a multi-step flow: it reads your prompt, calls tool 1, thinks through the result, calls tool 2, thinks again, and then builds the final widget.

So even if the tools respond fast, a lot of the delay comes from the agent reasoning between steps, plus the UI render at the end.
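One mitigation worth trying (my suggestion, not an official recommendation): if the second tool always consumes the first tool’s output, expose a single composite tool so the model pays for one round of reasoning instead of two. A minimal sketch with hypothetical handlers `fetch_data` and `build_widget_payload` standing in for the two existing tools:

```python
# Hypothetical handlers standing in for the two existing MCP tools.
def fetch_data(query: str) -> dict:
    # ...in the real server this is the ~0.5s first tool call...
    return {"results": [f"hit for {query}"]}

def build_widget_payload(data: dict) -> dict:
    # ...in the real server this is the ~200ms widget tool call...
    return {"widget": {"items": data["results"]}}

def fetch_and_render(query: str) -> dict:
    """Composite tool: chain both steps server-side so the agent
    makes one tool call (and one round of deliberation) instead of two."""
    return build_widget_payload(fetch_data(query))
```

This trades flexibility for latency: it removes the ~20s of between-call deliberation, but only applies when the two calls are always chained in the same order.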

You can also ping us at support@openai.com with a bit more detail and we can dig into it further.
