I would say there’s one thing not in this discussion yet.
Modularity.
Using discrete agents for different tasks makes it substantially easier to iterate and adapt.
If you have any type of monolithic system, changing the behaviour of one piece of it can have unintended consequences on other parts.
In short, one (imperfect) way to picture the difference between an Assistant and an Agent Swarm is to compare a sequential program written without function calls to a modern modular program in which each specific task is handled by its own function.
Say you have a system that, as one part of it, writes code.
In a monolithic system anything you do to change the coding behaviour could change its performance on other tasks in unexpected ways.
By compartmentalizing the agents, you can treat them as black-box functions: the wider system doesn’t need to know or care what’s inside as long as it can send the inputs it wants and get valid outputs in return.
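To make the black-box idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and just for illustration: `Agent`, `AGENTS`, `dispatch`, and the stub agents are names I made up, and the stubs return canned strings where a real agent would call a model.

```python
from typing import Callable, Dict

# Assumption: an agent is any function from a text request to a text response.
Agent = Callable[[str], str]


def coding_agent(task: str) -> str:
    # Stub: a real agent would call a model here.
    return f"# code for: {task}"


def coding_agent_v2(task: str) -> str:
    # A reworked coding agent with different internals, same interface.
    return f"# v2 code for: {task}"


def summary_agent(task: str) -> str:
    # Stub: stands in for an unrelated agent in the same system.
    return f"Summary of: {task}"


# The wider system only knows agents by name and by the shared interface.
AGENTS: Dict[str, Agent] = {
    "code": coding_agent,
    "summarize": summary_agent,
}


def dispatch(kind: str, task: str) -> str:
    """Send a task to the named agent and return whatever it produces."""
    return AGENTS[kind](task)
```

Swapping the coding agent is then a one-line change, `AGENTS["code"] = coding_agent_v2`, and `dispatch("summarize", ...)` behaves exactly as before because nothing about the summary agent was touched.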
This also means you are free to choose a different model for each agent, which allows the use of cheaper models, fine-tuned models, or even local models as necessary (or desired).
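One way this per-agent model choice could look, again only as a sketch under my own assumptions: the model names below are placeholders rather than real model identifiers, and `fake_backend` stands in for whatever client you would actually use to reach a hosted or local model.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a backend takes (model_name, prompt) and returns a completion.
ModelFn = Callable[[str, str], str]


@dataclass
class ModelAgent:
    name: str
    model: str           # placeholder model name, chosen per agent
    call_model: ModelFn  # injected backend, so it can be hosted or local

    def run(self, prompt: str) -> str:
        return self.call_model(self.model, prompt)


def fake_backend(model: str, prompt: str) -> str:
    # Stub backend that just echoes which model would have been used.
    return f"[{model}] {prompt}"


# Each agent is pinned to its own model; the names are illustrative only.
router = ModelAgent("router", "small-cheap-model", fake_backend)
coder = ModelAgent("coder", "fine-tuned-code-model", fake_backend)
```

Because the model and backend are just fields on the agent, moving one agent to a cheaper or local model doesn't require touching any of the others.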
There’s some adjacent conversation in this other thread from about two months ago:
Is there a way to have access to multiple assistants in the same thread? I want to be able to choose an assistant based on the context of the conversation.
In general I would think of agent swarms as an architectural design choice one would make based on the complexity of what they are trying to achieve, their needs for easy maintainability, and other developer-centric concerns.