I’ve been reading quite a bit recently about Agent Swarms, specifically ones built on the Assistants API.
There are quite a lot of YouTube videos out there that explain how to build them, but my question is: why, and what is the benefit?
For example, there is a YouTube video by VRSEN where he recreates AutoGen to create a chart from live stock prices.
Essentially he has a top level proxy agent and a coding agent both with their own specific system prompts.
He gives the requirement to the proxy agent which directly communicates with the coding agent, who in turn writes and executes the code needed to complete the task.
This is cool, but again, why? Isn’t this something that the Assistants API on its own can handle without multiple agents?
As cool as this is, I’m really struggling to work out any viable use cases for such a system.
Agent Swarms are built on specialization and selective context (memory and tool access). You can make specialized agents (each with a narrower instruction set and toolset) perform better for a use case or set of tasks than a generic agent that has access to all the tools.
But, in reality, Assistants are not ready for production use cases. So, yes, the most common uses right now are demos, PoCs, and YT clickbait. You can still build useful systems on other stacks that apply the same Agent Swarm idea to narrow use cases.
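To make the "selective context" point concrete, here's a minimal pure-Python sketch (all names are hypothetical illustrations, not a real SDK): each specialized agent is just a narrow instruction plus a restricted subset of the available tools.

```python
# Sketch of "selective context": each specialized agent only sees the
# instructions and tools relevant to its job. All names here are
# hypothetical illustrations, not calls to a real SDK.
TOOLS = {
    "run_code": lambda src: f"executed: {src}",
    "fetch_prices": lambda sym: f"prices for {sym}",
    "send_email": lambda to: f"emailed {to}",
}

def make_agent(instructions, tool_names):
    """Bundle a narrow system prompt with a restricted tool subset."""
    tools = {name: TOOLS[name] for name in tool_names}
    return {"instructions": instructions, "tools": tools}

coding_agent = make_agent("You write and execute Python.", ["run_code"])
market_agent = make_agent("You fetch live stock prices.", ["fetch_prices"])

# The coding agent literally cannot email anyone -- that tool was never
# put in its scope, so there is nothing for the model to misuse.
assert "send_email" not in coding_agent["tools"]
```

The narrower each agent's prompt and toolset, the less surface area there is for the model to wander off-task.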
I would say there’s one thing not in this discussion yet.
Using discrete agents for different tasks makes it substantially easier to iterate and adapt.
If you have any type of monolithic system, changing the behaviour of one piece of it can have unintended consequences on other parts.
In short, one (imperfect) way to envision the difference between an Assistant and an Agent Swarm is to compare a sequential program written without function calls to a modern modular program where each specific task is handled by a specific function.
Say you have a system that, as one part of it, writes code.
In a monolithic system anything you do to change the coding behaviour could change its performance on other tasks in unexpected ways.
By compartmentalizing the agents, you can treat them as black-box functions—the wider system doesn’t need to know or care what’s inside as long as it can send the inputs it wants and gets valid outputs in return.
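The black-box analogy can be sketched in a few lines of plain Python (agent names and internals are hypothetical): the orchestrator composes agents like ordinary functions and never looks inside them.

```python
# Sketch of "agents as black-box functions": the wider system only sees
# inputs and outputs, so each agent's internals (model, prompt, tools)
# can be swapped without touching the rest. Names are illustrative.
def coding_agent(task: str) -> str:
    # Internally this might call GPT-4, a fine-tuned model, or a local
    # model -- the caller neither knows nor cares.
    return f"# code that solves: {task}"

def review_agent(code: str) -> str:
    # A second agent that checks the first agent's output.
    return f"reviewed: {code}"

def pipeline(task: str) -> str:
    # The orchestrator composes agents like ordinary functions.
    return review_agent(coding_agent(task))

result = pipeline("plot AAPL prices")
```

Changing how `coding_agent` works internally can't break `review_agent`, which is exactly the isolation a monolithic prompt doesn't give you.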
This also means you have absolute freedom as to which models you use for each agent, allowing the use of cheaper models, fine-tuned models, or even local models as necessary (or desired).
There’s some adjacent conversation in this other thread from about two months ago:
Is there a way to have access to multiple assistants in the same thread? I want to be able to choose an assistant based on the context of the conversation.
In general I would think of agent swarms as an architectural design choice one would make based on the complexity of what they are trying to achieve, their needs for easy maintainability, and other developer-centric concerns.
Many have been using this format without needing to give it a buzzword. It makes sense to separate agents/assistants/whatever name we’re all fighting to call them next, and then have some router logic to determine which to use at what time.
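That router logic can be as simple as keyword matching. Here's a minimal sketch (agent names and keywords are made up; real systems often use a cheap classifier model for this step instead):

```python
# A minimal keyword-based router: pick an agent based on the content of
# the user's message. Agent names and keywords are hypothetical.
AGENTS = {
    "coding": ["code", "script", "debug", "python"],
    "billing": ["invoice", "refund", "payment"],
}

def route(message: str) -> str:
    """Return the name of the agent whose keywords appear in the message."""
    text = message.lower()
    for agent, keywords in AGENTS.items():
        if any(keyword in text for keyword in keywords):
            return agent
    return "general"  # fallback agent when nothing matches

assert route("Can you debug this Python script?") == "coding"
assert route("I need a refund on my invoice") == "billing"
assert route("hello there") == "general"
```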
Personally, I’ve jimmied my Assistants to switch from GPT-4 after a couple of rounds of initial conversation to GPT-3.5, to keep the cost at something financially bearable, and also to update the Assistant itself.