You’re misrepresenting things a bit: the “main purpose” of making an agent is the same as the purpose of any dumb automation: it handles simple tasks quickly and tirelessly. All a smart thermostat does is adjust the temperature; but it’s worth automating, because a smart thermostat takes a simple task and makes it go away ~forever.
The goal with agents is to take simple tasks and make them go away. You might set up an agent to monitor an inbox, reply to anything that’s simple, and escalate [to a human] anything that’s not trivial: you’ve taken the trivial tasks and made them go away ~forever, leaving you with only “interesting work” (I mean, it’s still an email inbox, but).
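Concretely, that inbox agent is maybe twenty lines of glue. Rough sketch below; `call_model`, `fetch_unread`, `send_reply`, and `escalate` are all hypothetical stand-ins for your model provider and mail setup, not any real API:

```python
# Sketch of the "triage the inbox" agent. Everything here is a placeholder:
# call_model() is wherever your LLM provider's client goes, and
# fetch_unread / send_reply / escalate are whatever your mail setup exposes.

def call_model(prompt: str) -> str:
    """Placeholder for a chat/completion call; plug in your provider's client."""
    raise NotImplementedError

def triage_inbox(fetch_unread, send_reply, escalate):
    for email in fetch_unread():
        verdict = call_model(
            "Answer with exactly 'simple' or 'escalate'.\n"
            f"Subject: {email['subject']}\nBody: {email['body']}"
        )
        if verdict.strip().lower() == "simple":
            # the trivial stuff goes away ~forever
            draft = call_model(f"Write a short, polite reply to this email:\n{email['body']}")
            send_reply(email, draft)
        else:
            # a human only ever sees the non-trivial stuff
            escalate(email)
```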
The AI might not fully understand what you’re asking for. A personal assistant doesn’t always understand what you’re asking for, either. But the point is to offload easy stuff so you can focus your effort on hard stuff. How you leverage your personal assistant is still a skill issue (“Hey, can you give the shareholder presentation?” maybe just isn’t something your PA should do), and I’m sure many people will fail because they asked their AI assistant to do something that it couldn’t handle – but this is already the case!
I just don’t see why you’re dismissing this as trivial. I’d argue that most people spend most of their time doing “necessary bullshit” that an 8B model could handle just fine. Smart thermostats are great!
What you’re getting at, I think, is that right now, agents often fail in strange and unexpected ways when put into agentic loops (I currently can’t describe the phenomenon better than saying “it’s, uh, funky”, and if you’ve tried building agents before, I welcome you to try to explain the problem better than “why is this so cursed”). There’s a bunch of random tail risk that you really wouldn’t expect from e.g. putting any grad student on a task. A lot of, “oh, weird, that didn’t work and I can’t explain why”.
But, this technology is the worst it’s ever going to be. There are a lot of people working on agents for obvious reasons, and one day they’ll have their iPhone moment: someone will ship an agentic system, and it’ll Just Work. Every time the base model gets a little smarter, the “funky tail risk” problem gets a little better; and one day, it’ll just be Good Enough. Meanwhile, people are making iterative progress on frameworks (from “just let it talk to itself” to “add a persistent output area” to “what if we let it prompt other agents and bootstrap an entire organization for any given problem”). Devin was a breakthrough, even if it fell short in the end: we see some of its DNA in Anthropic’s products, and I expect it’s just the shape of things to come.
Now, a couple of things. A MoE model isn’t what you think it is: it doesn’t divide the model into multiple agents. Rather, individual tokens are routed to different sub-networks (“experts”) within the same model, layer by layer, in a way that has absolutely no bearing on task-specificity. I’ve been saying for a while that “mixture of experts” is a terrible name, because it makes people think, “oh, this part is more knowledgeable about some field, this part is more knowledgeable about some other field, etc.”; but that’s simply not how it works. Nothing in a MoE model is even loosely associated with people’s usual concept of “expertise”.
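If a picture helps, here’s a toy numpy sketch of what the routing in a MoE layer actually is (dimensions and weights made up; this is the shape of the mechanism, not any particular model):

```python
# Toy sketch of what "experts" actually are in a MoE layer: a learned router
# picks the top-k feed-forward blocks *per token*, based on that token's
# hidden state, not on what subject the query is "about".
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

W_router = rng.normal(size=(d_model, n_experts))                 # gating weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(token_h):
    logits = token_h @ W_router                                  # one score per expert
    chosen = np.argsort(logits)[-top_k:]                         # top-k experts for THIS token
    gates = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()
    # weighted sum of the chosen experts' outputs
    return sum(g * (token_h @ experts[i]) for g, i in zip(gates, chosen))

out = moe_layer(rng.normal(size=d_model))  # routing is decided per token, per layer
```

The router only ever sees a token’s hidden vector; “expert 3” doesn’t know or care whether the conversation is about tax law or Rust.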
What OP’s asking about is agent swarms, where the same model (usually) plays multiple different “characters”, which are closer to “experts” in the traditional sense (e.g. one might be a programmer, or a manager, or a client liaison). This has a lot of benefits (rough sketch of the pattern after the list):
- Each agent can have its own system prompt, which meaningfully impacts how they interact.
- Each agent can have its own context window (they don’t all need to be aware of everything the entire org is doing), which also increases performance (in most cases, performance very notably goes off a cliff when context saturates).
- Modularity. Breaking your system into parts means you don’t have to look at the entire system every time something breaks. If your car won’t start, you can look at the starter. If your agent swarm’s code doesn’t compile, maybe look at the programmer. It’s not always the starter, and it won’t always be the programmer responsible for bad code, but at least you have a starting point.
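Here’s a minimal sketch of the “same model, different characters” pattern; `call_model` is a stand-in for whatever chat API you’re using, and the names and prompts are made up:

```python
# Minimal sketch of an agent swarm. The key point: each agent keeps its OWN
# system prompt and its OWN history, even though the same underlying model
# answers for all of them.

def call_model(messages):
    """Placeholder for a chat-completion call; plug in your provider's client."""
    raise NotImplementedError

class Agent:
    def __init__(self, name, system_prompt):
        self.name = name
        self.history = [{"role": "system", "content": system_prompt}]  # private context

    def ask(self, content):
        self.history.append({"role": "user", "content": content})
        reply = call_model(self.history)          # only ever sees its own history
        self.history.append({"role": "assistant", "content": reply})
        return reply

manager = Agent("manager", "You break tasks into small, concrete steps.")
programmer = Agent("programmer", "You write Python. Output code only.")

def solve(task):
    plan = manager.ask(f"Break this into steps for a programmer: {task}")
    # hand off a *part* of the problem, not a second attempt at the whole thing
    return programmer.ask(f"Implement these steps:\n{plan}")
```

Note the handoff: the manager never attempts the code itself, and the programmer never sees the manager’s full history. That separation is also why the “maybe look at the programmer” debugging step above actually works.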
Swarms, if designed right, shouldn’t be taking multiple concurrent attempts at the same problem (that sounds more like a Mixture of Models architecture, which is a different thing again): they should hand off parts of the problem to be solved.
You want to get your hair cut, so you walk into the shop, talk to a receptionist, then a barber, maybe a cashier at the end. You could argue that everyone there is “solving the same problem”: at the end of the day, people want to trade money for haircuts, and that’s all anyone in the org does. But, at the same time: clearly, a receptionist and a barber are not “doing the same thing”; that’d be silly.