I’ve been developing a use case for my ChatGPT app that requires the agent to chain 2 tool calls, with the second one rendering a visible widget to the user.
The use case is working well and the results are outstanding, but it’s borderline unusable because of how long the agent takes to generate the responses.
Typical chat trace:
- User prompt
- Agent thinks for about 10 seconds
- Agent performs the 1st tool call (the MCP server takes about 0.5 seconds to complete the tool response)
- Agent thinks for another 20 seconds
- Agent performs the 2nd tool call (the MCP response takes about 0.2 seconds to complete)
- Agent takes ~5 seconds more to process the response and render the final widget
Total time: around 35 seconds end-to-end, with very little visual feedback for the user (especially in the native mobile app).
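To put numbers on it, here is a quick sketch that sums the timings from the trace above (the step names are just labels I made up; the values are the ones I observed). It shows that my MCP server accounts for well under a second of the total, and almost all of the wall-clock time is the model thinking:

```python
# Timing breakdown from the chat trace above (all values in seconds).
steps = {
    "initial_thinking": 10.0,       # agent thinks before the 1st tool call
    "tool_call_1": 0.5,             # MCP server response time, 1st call
    "intermediate_thinking": 20.0,  # agent thinks before the 2nd tool call
    "tool_call_2": 0.2,             # MCP server response time, 2nd call
    "final_processing": 5.0,        # agent processes response, renders widget
}

total = sum(steps.values())
tool_time = steps["tool_call_1"] + steps["tool_call_2"]
model_time = total - tool_time

print(f"total: {total:.1f}s")
print(f"model thinking/processing: {model_time:.1f}s ({model_time / total:.0%})")
print(f"MCP tool execution: {tool_time:.1f}s ({tool_time / total:.0%})")
```

So roughly 98% of the latency is on the model side, which is why speeding up my server further doesn’t help.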
Any tips on how to solve this? As I said, the final results are very good, but the slowness ruins the experience.