I’d love some advice on recommended approaches for architecting my first multi-agent system. I’m building an agent for my iOS app that has access to a bunch of tools; most importantly, it can fetch the context it needs from my DB via raw SQL queries. I need it to detect when the user’s request is incomplete and ask for clarification. Lastly, it outputs structured JSON responses that my app can parse and turn into UI state.
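For reference, the structured response is shaped roughly like this (a simplified sketch; the field names here are illustrative, not my exact schema):

```python
# Rough shape of the structured response the iOS app parses into UI state.
# Field names are illustrative placeholders, not the exact schema.
from typing import Literal, Optional
from pydantic import BaseModel

class AgentResponse(BaseModel):
    kind: Literal["answer", "clarification"]   # the app branches on this
    message: str                                # user-facing text
    ui_items: Optional[list[dict]] = None       # data the app renders when kind == "answer"
    clarifying_question: Optional[str] = None   # present when kind == "clarification"
```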
The system I came up with works but is incredibly slow. I’m currently taking the user’s request and running it through my first LLM call, a planner that generates a step-by-step plan for my tool caller to execute. (There’s so much to know about generating a coherent plan that I separated it from the tool-calling agent.) The tool caller goes step by step, fetches the data it needs, stops to ask the user for clarification when needed, and gathers all the context to feed into my final responder LLM, which produces the user-facing structured output.
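To make the flow concrete, here’s a stripped-down sketch of the three stages (using the OpenAI Python SDK; the SQL tool, prompts, and the clarification pause are simplified placeholders, not my actual code):

```python
# Minimal sketch of the current pipeline: planner -> tool caller -> responder.
# Error handling and the "pause to ask the user" branch are omitted for brevity.
import json
from openai import OpenAI

client = OpenAI()

def run_sql(query: str) -> str:
    """Placeholder for the real DB tool."""
    return json.dumps({"rows": []})

TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_sql",
        "description": "Run a read-only SQL query against the app database.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def plan(user_request: str) -> str:
    # LLM call #1: produce a short step-by-step plan.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Produce a short, numbered plan of tool calls."},
            {"role": "user", "content": user_request},
        ],
    )
    return resp.choices[0].message.content

def gather_context(user_request: str, plan_text: str) -> list:
    # LLM call #2 (looped): execute the plan via tool calls and collect results.
    messages = [
        {"role": "system", "content": "Execute the plan. Ask for clarification if the request is ambiguous."},
        {"role": "user", "content": f"Request: {user_request}\nPlan:\n{plan_text}"},
    ]
    while True:
        resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            messages.append({"role": "assistant", "content": msg.content})
            return messages  # nothing left to call; context is gathered
        messages.append(msg)
        for call in msg.tool_calls:
            result = run_sql(**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

def respond(context_messages: list) -> dict:
    # LLM call #3: the user-facing structured JSON the app turns into UI state.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=context_messages + [{"role": "system", "content": "Reply with the final answer as JSON."}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)
```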
Some requests are taking 15+ seconds.
What’s nice about my approach is that it’s not recursive like the Agent SDK, so I have more control over cost and token usage. I believe I’m effectively doing everything the Agent SDK does anyway, just manually.
I have yet to put in the work to optimize latency and am not even streaming yet (mostly because of the annoying work of streaming structured JSON output effectively). I’m planning on streaming both the planner response, so I can start executing steps sooner, and the final user-facing structured output. I’m also going to look into parallelizing tool calls; a rough sketch of both ideas is below.
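Roughly what I have in mind for those two optimizations, assuming the async OpenAI client (placeholder names again, not working code from my app):

```python
# Sketch of the two planned latency optimizations: parallel tool calls and
# streaming the final structured output. Assumes the async OpenAI client.
import asyncio
import json
from openai import AsyncOpenAI

client = AsyncOpenAI()

def run_sql(query: str) -> str:
    """Same placeholder DB tool as in the sketch above."""
    return json.dumps({"rows": []})

async def execute_tool_calls_in_parallel(tool_calls) -> list[dict]:
    # Independent tool calls from one model turn can run concurrently
    # instead of one after another.
    async def run_one(call):
        args = json.loads(call.function.arguments)
        result = await asyncio.to_thread(run_sql, **args)  # or an async DB driver
        return {"role": "tool", "tool_call_id": call.id, "content": result}
    return await asyncio.gather(*(run_one(c) for c in tool_calls))

async def stream_final_response(context_messages: list):
    # Stream the final structured output so the app can start rendering
    # (or at least show progress) before the whole JSON payload arrives.
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=context_messages,
        response_format={"type": "json_object"},
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta  # forward to the iOS client as it arrives
```

The same `stream=True` idea would apply to the planner call, so the tool caller can start on step 1 before the full plan has finished generating.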
I’m wondering if I’m on the right track with this approach or if I should just switch to the Agent SDK.
Is there something I’m missing that would drastically reduce latency? (I’m using GPT-4o for all LLM calls.)