Agent Swarm - What actually is the point?

Hey peeps.

I’ve been reading quite a bit recently about Agent Swarms, specifically ones built on the Assistants API.

There are quite a lot of YT vids out there that explain how to build them, but my question is: why, and what is the benefit?

For example there is a YT vid by VRSEN where he recreates AutoGen to create a chart from live stock prices.
Essentially he has a top level proxy agent and a coding agent both with their own specific system prompts.
He gives the requirement to the proxy agent which directly communicates with the coding agent, who in turn writes and executes the code needed to complete the task.
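
For anyone who hasn't watched the video, the proxy/coder hand-off described above can be sketched roughly like this. It is only a toy under stated assumptions: `fake_llm`, `proxy_agent` and `coding_agent` are hypothetical names, the model is replaced by canned replies so it runs without an API key, and it is not the code from the video or from AutoGen.

```python
from typing import Callable

# Hypothetical stand-in for a model call: (system_prompt, message) -> reply.
LLM = Callable[[str, str], str]

def fake_llm(system_prompt: str, message: str) -> str:
    """Return canned replies so the sketch runs without any API access."""
    if "coding" in system_prompt:
        return "result = sum(range(1, 11))"
    return f"Write Python that does the following: {message}"

def coding_agent(llm: LLM, task: str) -> object:
    """Ask the model for code, execute it, and return its `result` variable."""
    code = llm("You are a coding agent. Reply with Python only.", task)
    scope: dict = {}
    exec(code, scope)  # Unsafe outside a sandbox; acceptable only in a toy.
    return scope.get("result")

def proxy_agent(llm: LLM, requirement: str) -> object:
    """Restate the user's requirement as a task and hand it to the coder."""
    task = llm("You are a proxy agent. Restate the request as a task.", requirement)
    return coding_agent(llm, task)

answer = proxy_agent(fake_llm, "add up the numbers 1 to 10")
```

Each agent here is just a system prompt plus a bit of glue code, which is really what the question is about: whether that split buys anything over a single prompt.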

This is cool, but again, why? Isn’t this something that the Assistants API can handle on its own, without multiple agents?

As cool as this is, I’m really struggling to work out any viable use cases for such a system.

Am I missing something here?

I see two benefits of using agent swarms:

  1. they assist you in leaving a carbon footprint on this planet
  2. they help you get youtube clicks because they sound cool

Overall, it seems like a sound idea: instead of a single “person”, why not spin up an entire organization?

But the problem is twofold:

  1. most single agents are half baked
  2. you’re spinning up a half baked organization of half baked agents

Agent Swarms bank on specialization and selective context (memory and tool access). You can make specialized agents (with narrower instructions and a narrower toolset) perform better for a use-case or set of tasks than you could if you tried to do it all with a generic agent that has access to all the tools.
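
To make "narrower instructions and toolset" concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Agent` class, the two tools and the placeholder price are hypothetical and not taken from any library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Agent:
    """An agent that may only call the tools it was explicitly given."""
    name: str
    instructions: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def use_tool(self, tool_name: str, arg: str) -> str:
        if tool_name not in self.tools:
            raise PermissionError(f"{self.name} has no tool named {tool_name!r}")
        return self.tools[tool_name](arg)

def get_price(ticker: str) -> str:
    return f"{ticker}: 123.45"  # Placeholder value, not a real quote.

def draw_chart(data: str) -> str:
    return f"<chart of {data}>"

# Each agent sees only the instructions and tools relevant to its job.
researcher = Agent("researcher", "Fetch prices only.", {"get_price": get_price})
plotter = Agent("plotter", "Draw charts only.", {"draw_chart": draw_chart})

price = researcher.use_tool("get_price", "ACME")
chart = plotter.use_tool("draw_chart", price)
```

The researcher cannot draw charts even if the model asks it to, which is the "selective tool access" part of the argument.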

But, in reality, Assistants are not ready for production use-cases. So, yes, the most common uses right now are demos, PoCs, and YT clickbait. You can still build useful systems on other stacks that use the same idea of Agent Swarms for narrow use-cases.

Both very valid answers.

Short answer: no

Slightly longer answer:

Indeed, and the main purpose of making an agent is to replace the “guiding hand” of a human (you) with an AI that may not fully understand what you’re actually asking for.

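This is indeed true, and it’s loosely comparable to a MoE (mixture of experts) model, although MoE strictly refers to routing between sub-networks inside a single model rather than between separate agents.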

Having a swarm just implies doing multiple concurrent attempts at solving the problem :laughing:

TL;DR: it’s probably not the most cost-effective way of using the API :sweat_smile:

I would say there’s one thing not in this discussion yet.

Modularity.

Using discrete agents for different tasks makes it substantially easier to iterate and adapt.

If you have any type of monolithic system, changing the behaviour of one piece of it can have unintended consequences on other parts.

In short, one (imperfect) way to envision the difference between an Assistant and an Agent Swarm is to compare a sequential program written without function calls to a modern modular program where each specific task is handled by a specific function.

Say you have a system that, as one part of it, writes code.

In a monolithic system anything you do to change the coding behaviour could change its performance on other tasks in unexpected ways.

By compartmentalizing the agents, you can treat them as black-box functions—the wider system doesn’t need to know or care what’s inside as long as it can send the inputs it wants and gets valid outputs in return.
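
As a rough sketch of the black-box idea (all names here are hypothetical, and the agents are plain functions standing in for model calls), the wider system only depends on "text in, text out":

```python
from typing import Callable, Dict

# The only contract the wider system relies on: text in, text out.
AgentFn = Callable[[str], str]

def coder_v1(task: str) -> str:
    return f"# v1 code for: {task}"

def coder_v2(task: str) -> str:
    return f"# v2 code for: {task}"

def reviewer(code: str) -> str:
    return f"reviewed: {code}"

def pipeline(agents: Dict[str, AgentFn], task: str) -> str:
    """Chain the coder into the reviewer without knowing their internals."""
    code = agents["coder"](task)
    return agents["reviewer"](code)

agents: Dict[str, AgentFn] = {"coder": coder_v1, "reviewer": reviewer}
before = pipeline(agents, "plot prices")

agents["coder"] = coder_v2  # Swap one agent; the reviewer is untouched.
after = pipeline(agents, "plot prices")
```

Changing the coder's prompt or implementation only affects what flows through that one slot, which is the maintainability argument in a nutshell.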

This also means you have absolute freedom as to which models you use for each agent, allowing the use of cheaper models, fine-tuned models, or even local models as necessary (or desired).
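
A per-agent model choice could be as simple as a lookup table. This is only a sketch: `AgentSpec`, `SWARM`, `model_for` and the model identifiers are made up for illustration and do not refer to real model names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentSpec:
    role: str
    model: str  # Hypothetical identifier, not a real model name.

SWARM = [
    AgentSpec("planner", "large-hosted-model"),
    AgentSpec("coder", "fine-tuned-code-model"),
    AgentSpec("formatter", "small-local-model"),
]

def model_for(role: str) -> str:
    """Look up which model backs a given role."""
    for spec in SWARM:
        if spec.role == role:
            return spec.model
    raise KeyError(role)
```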

There’s some adjacent conversation in this other thread from about two months ago.

In general I would think of agent swarms as an architectural design choice one would make based on the complexity of what they are trying to achieve, their needs for easy maintainability, and other developer-centric concerns.

100%.

Many have been using this format without needing to give it a buzzword. It makes sense to separate agents/assistants/whatever name we’re all fighting to call them next, and then have some router logic to determine which to use at what time.
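
For what it's worth, "router logic" can start out very plain. A minimal keyword-based sketch follows; the agent functions and the `ROUTES` table are hypothetical, and a real router might ask a model to classify the message instead of matching keywords.

```python
from typing import Callable, Dict

def travel_agent(message: str) -> str:
    return f"[travel] {message}"

def code_helper(message: str) -> str:
    return f"[code] {message}"

def general_agent(message: str) -> str:
    return f"[general] {message}"

# Keyword -> agent. Purely illustrative.
ROUTES: Dict[str, Callable[[str], str]] = {
    "flight": travel_agent,
    "hotel": travel_agent,
    "bug": code_helper,
    "python": code_helper,
}

def route(message: str) -> str:
    """Send the message to the first agent whose keyword appears in it."""
    lowered = message.lower()
    for keyword, agent in ROUTES.items():
        if keyword in lowered:
            return agent(message)
    return general_agent(message)
```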

Personally, I’ve jimmied my Assistants to switch from GPT-4 to GPT-3.5 after a couple of rounds of initial conversation, to bring the cost down to something financially bearable, and also to update the Assistant itself.
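
The switching rule itself is tiny. A sketch, assuming "a couple of rounds" means two and that the models are passed around as plain strings; `pick_model` and `switch_after` are hypothetical names, and the actual call that updates the Assistant is left out:

```python
def pick_model(turns_so_far: int, switch_after: int = 2) -> str:
    """Use the stronger model for the opening turns, then the cheaper one."""
    return "gpt-4" if turns_so_far < switch_after else "gpt-3.5-turbo"

# Which model each of the first four turns would use under this rule.
models_used = [pick_model(turn) for turn in range(4)]
```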

When there is logic, there is programming :muscle:

I definitely think that the concept of agent swarms holds water.

As software itself is becoming ubiquitous and easy to build, so too is an “assistant” or simple “agent”. I imagine at some point there will be a DB (most likely not in ChatGPT, and most likely open source already) where users will have millions of free-to-use (just pay in tokens etc.) hosted agents that can be accessed via API, thus opening up agent swarms to use as they wish.

Then, whatever you use as a chat UI, you can initialize a swarm and have access to all these agents at the same time.

Ultimately, it will allow well-built “master” agents to use “slave” agents and then consolidate/compile the result MUCH more quickly than otherwise (as opposed to, for example, a user having to go to the GPT marketplace and choose between 20+ “travel”-based GPTs to find a flight).
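
A rough sketch of that fan-out-and-consolidate pattern, using the flight example. The three "airline" agents, their prices and `master_agent` are all made up for illustration; real sub-agents would be API calls to hosted agents rather than local functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Offer = Dict[str, object]

# Hypothetical sub-agents returning made-up offers.
def airline_a(query: str) -> Offer:
    return {"source": "airline_a", "price": 320}

def airline_b(query: str) -> Offer:
    return {"source": "airline_b", "price": 275}

def airline_c(query: str) -> Offer:
    return {"source": "airline_c", "price": 410}

def master_agent(query: str, workers: List[Callable[[str], Offer]]) -> Offer:
    """Query every sub-agent concurrently, then keep the cheapest offer."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        offers = list(pool.map(lambda worker: worker(query), workers))
    return min(offers, key=lambda offer: offer["price"])

best = master_agent("LHR to OSL on Friday", [airline_a, airline_b, airline_c])
```

The user asks once and the master agent does the comparison shopping, instead of the user trying each travel agent in turn.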
