Agent Swarm - What actually is the point?

Hey peeps.

I’ve been reading quite a bit recently about agent swarms, specifically ones built with the Assistants API.

There are quite a lot of YT vids out there explaining how to build them, but my question is: why, and what’s the benefit?

For example there is a YT vid by VRSEN where he recreates AutoGen to create a chart from live stock prices.
Essentially he has a top level proxy agent and a coding agent both with their own specific system prompts.
He gives the requirement to the proxy agent which directly communicates with the coding agent, who in turn writes and executes the code needed to complete the task.
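If it helps picture it, a rough sketch of that pattern might look something like this (assuming the OpenAI Python SDK’s beta Assistants endpoints; the names, prompts, and polling helper are my own illustration, not taken from the video):

```python
# Rough sketch of a proxy -> coder handoff with the beta Assistants API.
# Everything here (names, prompts, the ask() helper) is illustrative.
import time
from openai import OpenAI

client = OpenAI()

proxy = client.beta.assistants.create(
    name="proxy",
    model="gpt-4-turbo-preview",
    instructions="Restate the user's requirement as a precise task for a coding agent.",
)
coder = client.beta.assistants.create(
    name="coder",
    model="gpt-4-turbo-preview",
    instructions="Write and run Python to complete the task you are given.",
    tools=[{"type": "code_interpreter"}],
)

def ask(assistant_id: str, prompt: str) -> str:
    """Send one message to an assistant on a fresh thread and return its reply."""
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(thread_id=thread.id, role="user", content=prompt)
    run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant_id)
    while run.status not in ("completed", "failed", "cancelled", "expired"):
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    latest = client.beta.threads.messages.list(thread_id=thread.id).data[0]
    return latest.content[0].text.value

task = ask(proxy.id, "Chart the last month of AAPL closing prices.")
print(ask(coder.id, task))  # the proxy's refined task is handed straight to the coder
```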

This is cool, but again, why? Isn’t this something the Assistants API can handle on its own, without needing multiple agents?

As cool as this is, I’m really struggling to work out any viable use cases for such a system.

Am I missing something here?

1 Like

I see two benefits of using agent swarms:

  1. they assist you in leaving a carbon footprint on this planet
  2. they help you get youtube clicks because they sound cool

Overall, it seems like a sound idea: instead of a single “person”, why not spin up an entire organization?

But the problem is twofold:

  1. most single agents are half-baked
  2. you’re spinning up a half-baked organization of half-baked agents
3 Likes

Agent swarms bank on specialization and selective context (memory and tool access). You can make specialized agents (with narrower instructions and toolsets) perform better on a given use case or set of tasks than you could if you tried to do it all with one generic agent that has access to every tool.
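To make that concrete, here’s a minimal sketch of the idea on the plain chat completions endpoint (the two specialist prompts and the helper are made up for illustration):

```python
# A minimal sketch of "specialization + selective context": two narrow system
# prompts instead of one do-everything agent. Prompts are purely illustrative.
from openai import OpenAI

client = OpenAI()

SPECIALISTS = {
    "sql": "You only write and explain SQL. Refuse anything else.",
    "docs": "You only answer from the product documentation you are shown. "
            "If it's not in the docs, say so.",
}

def ask_specialist(role: str, question: str, context: str = "") -> str:
    """Send a question to one narrow specialist, with only the context it needs."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SPECIALISTS[role]},
            {"role": "user", "content": f"{context}\n\n{question}".strip()},
        ],
    )
    return resp.choices[0].message.content
```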

But, in reality, Assistants are not ready for production use cases. So, yes, the most common uses right now are demos, PoCs, and YT clickbait. You can still build useful systems on other stacks that apply the same agent swarm idea to narrow use cases.

Both very valid answers.

Short answer: no

Slightly longer answer:

Indeed, and the main purpose of making an agent is to replace the “guiding hand” of a human (you) with an AI that may not fully understand what you’re actually asking for.

This is indeed true, and it’s usually called a MoE model (mixture of experts).

Having a swarm just implies doing multiple concurrent attempts at solving the problem :laughing:

TL;DR: it’s probably not the most cost-effective way of using the API :sweat_smile:

1 Like

I would say there’s one thing not in this discussion yet.

Modularity.

Using discrete agents for different tasks makes it substantially easier to iterate and adapt.

If you have any type of monolithic system, changing the behaviour of one piece of it can have unintended consequences on other parts.

In short, one (imperfect) way to envision the difference between an Assistant and an agent swarm is to think of it as the difference between a sequential program written without function calls and a modern modular program where each specific task is handled by a specific function.

Say you have a system that, as one part of it, writes code.

In a monolithic system anything you do to change the coding behaviour could change its performance on other tasks in unexpected ways.

By compartmentalizing the agents, you can treat them as black-box functions—the wider system doesn’t need to know or care what’s inside as long as it can send the inputs it wants and gets valid outputs in return.

This also means you have absolute freedom over which model you use for each agent, allowing the use of cheaper models, fine-tuned models, or even local models as necessary (or desired).
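Here’s a toy sketch of that black-box-function view, with the prompts and model names as placeholders rather than recommendations:

```python
# Agents as black-box callables: the wider system only sees the call
# signature, so the model behind each one can be swapped freely.
# Prompts and model names are illustrative placeholders.
from dataclasses import dataclass
from openai import OpenAI

client = OpenAI()

@dataclass
class Agent:
    system_prompt: str
    model: str  # cheap, fine-tuned, or local -- callers never need to know

    def __call__(self, task: str) -> str:
        resp = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": task},
            ],
        )
        return resp.choices[0].message.content

coder = Agent("You write Python. Return only code.", model="gpt-4-turbo-preview")
reviewer = Agent("Review the code you are given and list concrete issues.",
                 model="gpt-3.5-turbo")

# Swap either agent's model (e.g. to a fine-tuned ID or a local endpoint)
# and nothing upstream has to change.
draft = coder("Parse a CSV of stock prices and plot the closing column.")
print(reviewer(draft))
```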

There’s some adjacent conversation in this other thread from about two months ago.

In general I would think of agent swarms as an architectural design choice one would make based on the complexity of what they are trying to achieve, their needs for easy maintainability, and other developer-centric concerns.

4 Likes

100%.

Many have been using this format without needing to give it a buzzword. It makes sense to separate agents/assistants/whatever name we’re all fighting to call them next, and then have some router logic to determine which one to use at what time.

Personally, I’ve jimmied my Assistants to switch from GPT-4 to GPT-3.5 after a couple of rounds of initial conversation, bringing the cost back to something financially bearable and updating the Assistant itself in the process.
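Roughly along these lines (a minimal sketch assuming the Python SDK’s beta Assistants endpoints; the two-round threshold and model names are just examples):

```python
# Minimal sketch of downgrading an Assistant's model after the opening turns.
# The threshold and model names are examples, not a recommendation.
from openai import OpenAI

client = OpenAI()
EXPENSIVE_ROUNDS = 2  # how many turns feel worth paying GPT-4 prices for

def maybe_downgrade(assistant_id: str, rounds_so_far: int) -> None:
    """Once the opening rounds are done, move the Assistant onto a cheaper model."""
    if rounds_so_far >= EXPENSIVE_ROUNDS:
        client.beta.assistants.update(assistant_id, model="gpt-3.5-turbo")
```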

When there is logic, there is programming :muscle:

3 Likes

I definitely think that the concept of agent swarms holds water.

As software itself is becoming ubiquitous and easy to build, so too is an “assistant” or simple “agent”. I imagine at some point there will be a DB (most likely not in ChatGPT, and most likely open source already) where users will have millions of free-to-use (just pay in tokens, etc.) hosted agents that can be accessed via API, thus opening up agent swarms for them to use as they wish.

Then, in whatever you use as a chat UI, you can initialize a swarm and have access to all of these agents at the same time.

Ultimately, it will allow well-built “master” agents to use “slave” agents and then consolidate/compile the result MUCH more quickly than otherwise (as opposed to a user having to go to the GPT marketplace and choose between 20+ “travel”-based GPTs to find a flight, for example).

2 Likes

You’re misrepresenting things a bit: the “main purpose” of making an agent is the same purpose of any dumb automation: it handles simple tasks quickly and tirelessly. All a smart thermostat does is adjust the thermostat; but it’s worth automating because a smart thermostat takes a simple task and makes it go away ~forever.

The goal with agents is to take simple tasks and make them go away. You might set up an agent to monitor an inbox, reply to anything that’s simple, and elevate [to a human] anything that’s not trivial: you’ve taken the trivial tasks and made them go away ~forever, leaving you with only “interesting work” (I mean, it’s still an email inbox, but).
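A back-of-the-envelope sketch of that triage loop; the classifier prompt and the two stub handlers are hypothetical, and wiring up a real mailbox is left out:

```python
# Sketch of "reply to the trivial, escalate the rest". The prompt and the
# two stub handlers are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()

def triage(email_body: str) -> str:
    """Ask a small model whether this email is trivial enough to auto-reply."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer with exactly one word: TRIVIAL if this email can "
                        "be handled with a short standard reply, otherwise ESCALATE."},
            {"role": "user", "content": email_body},
        ],
    )
    return resp.choices[0].message.content.strip().upper()

def send_reply(email_body: str) -> None:
    print("auto-replying")          # placeholder: hook up your mail client here

def notify_human(email_body: str) -> None:
    print("escalating to a human")  # placeholder: ping whoever owns the inbox

def handle(email_body: str) -> None:
    if triage(email_body) == "TRIVIAL":
        send_reply(email_body)
    else:
        notify_human(email_body)
```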

The AI might not fully understand what you’re asking for. A personal assistant doesn’t always understand what you’re asking for, either. But the point is to offload easy stuff so you can focus your effort on hard stuff. How you leverage your personal assistant is still a skill issue (“Hey, can you give the shareholder presentation?” maybe just isn’t something your PA should do), and I’m sure many people will fail because they asked their AI assistant to do something that it couldn’t handle – but this is already the case!

I just don’t see why you’re dismissing this as trivial. I’d argue that most people spend most of their time doing “necessary bullshit” that an 8B model could handle just fine. Smart thermostats are great!


What you’re getting at, I think, is that right now, agents often fail in strange and unexpected ways when put into agentic loops (I currently can’t describe the phenomenon better than saying “it’s, uh, funky”, and if you’ve tried building agents before, I welcome you to try to explain the problem better than “why is this so cursed”). There’s a bunch of random tail risk that you really wouldn’t expect from e.g. putting any grad student on a task. A lot of, “oh, weird, that didn’t work and I can’t explain why”.

But, this technology is the worst it’s ever getting. There are a lot of people working on agents for obvious reasons, and one day they’ll have their iPhone moment: someone will ship an agentic system, and it’ll Just Work. Every time the base model gets a little smarter, the “funky tail risk” problem gets a little better; and one day, it’ll just be Good Enough. Meanwhile, people are making iterative progress on frameworks (from “just let it talk to itself” to “add a persistent output area” to “what if we let it prompt other agents and bootstrap an entire organization for any given problem”). Devin was a breakthrough, even if it fell short in the end: we see some of its DNA in Anthropic’s products, and I expect it’s just the shape of things to come.


Now, couple things. A MoE model isn’t what you think it is: it doesn’t divide the model into multiple agents. Rather, queries are routed to different parts of the same model, in a way that has absolutely no bearing on task-specificity. I’ve been saying for a while that “mixture of experts” is a terrible name, because it makes people think that, “oh, this part is more knowledgeable about some field, this part is more knowledgeable on some other field, etc”; but that’s simply not how it works. Nothing in a MoE model is even loosely associated with peoples’ usual concept of “expertise”.

What OP’s asking about is agent swarms, where the same model (usually) plays multiple different “characters”, which are closer to “experts” in the traditional sense (e.g. one might be a programmer, or a manager, or a client liaison). This has a lot of benefits:

  1. Each agent can have its own system prompt, which meaningfully impacts how they interact.

  2. Each agent can have its own context window (they don’t all need to be aware of everything the entire org is doing), which also increases performance (in most cases, performance very notably goes off a cliff when context saturates).

  3. Modularity. Breaking your system into parts means you don’t have to look at the entire system every time something breaks. If your car won’t start, you can look at the starter. If your agent swarm’s code doesn’t compile, maybe look at the programmer. It’s not always the starter, and it won’t always be the programmer responsible for bad code, but at least you have a starting point.

Swarms, if designed right, shouldn’t be trying to take multiple concurrent attempts at the same problem (that sounds more like a Mixture of Models architecture, which is a yet-different thing): they should hand off parts of the problem to be solved.

You want to get your hair cut, so you walk into the store, talk to a receptionist, then a barber, maybe a cashier at the end. You could argue that everyone there is “solving the same problem”: at the end of the day, people want to trade money for haircuts, and that’s all anyone in the org does. But, at the same time: clearly, a receptionist and a barber are not “doing the same thing”, that’d be silly.
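If it helps, the hand-off shape (as opposed to concurrent attempts) boils down to something like this pipeline sketch, with the roles and prompts as stand-ins for the receptionist/barber/cashier:

```python
# Hand-off, not concurrency: each stage does its part and passes the result on.
# Roles, prompts, and the model name are illustrative stand-ins.
from openai import OpenAI

client = OpenAI()

def run_role(system_prompt: str, task: str) -> str:
    """One specialist handles one slice of the problem, then hands the result on."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": task},
        ],
    )
    return resp.choices[0].message.content

PIPELINE = [
    "Turn the user's request into a short, concrete spec.",        # "receptionist"
    "Write Python that satisfies the spec you are given.",         # "barber"
    "Summarize what was produced and list anything left undone.",  # "cashier"
]

result = "Chart live stock prices for a ticker the user supplies."
for role_prompt in PIPELINE:
    result = run_role(role_prompt, result)  # each output becomes the next input
print(result)
```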

1 Like