Micro Agents: creating little agents that do one thing really really well

Wanted to share an idea that’s at the heart of a new OSS library I’m building called AgentM. I say library because the last thing we need is another Agent framework. The intent of AgentM is to create a library of useful functions that I call “Micro Agents”. Using AgentM you can easily add intelligence to any application. It’s intended to be the AI equivalent of a library like Lodash.

To illustrate the idea, here’s a short example that uses AgentM’s reduceList() function (Micro Agent) to sum a couple of columns in a shopping cart:

import { openai, reduceList } from "agentm";
import * as dotenv from "dotenv";

// Load environment variables from .env file
dotenv.config();

// Initialize OpenAI
const apiKey = process.env.apiKey!;
const model = 'gpt-4o-2024-08-06';
const completePrompt = openai({ apiKey, model });

// Create cancellation token
const shouldContinue = () => true;

// Mock up shopping cart data
const list = [
    { description: 'graphic tee', quantity: 2, unit_price: 19.95, total: 39.90 },
    { description: 'jeans', quantity: 1, unit_price: 59.95, total: 59.95 },
    { description: 'sneakers', quantity: 1, unit_price: 79.95, total: 79.95 },
    { description: 'jacket', quantity: 1, unit_price: 99.95, total: 99.95 }
];

// Sum up the total quantity and price
const goal = `Sum the quantity and total columns.`;
const initialValue = { quantity: 0, total: 0 };
reduceList({ goal, list, initialValue, completePrompt, shouldContinue }).then(result => {
    if (result.completed) {
        console.log(result.value);
    } else {
        console.error(result.error);
    }
});

The output of that call is the correct value of { quantity: 5, total: 279.75 }. While that’s not an overly practical example (you don’t need AI for that), it represents an attempt to solve the task of counting using first principles very similar to the ones humans use. Humans count things by working their way through a list one item at a time, accumulating the individual values along the way. The reduceList() micro agent counts the same way: it takes the list and evaluates each item one-by-one, building a chain-of-thought to help keep it on track. To see what this enables, I have another example here which takes a list of orders as input and counts the number of orders where the customer purchased a complete outfit consisting of at least a shirt and a pair of pants. You could probably do that in code as well, but it’s trivial to get AI to perform tasks like this. While it’s true that LLMs generally can’t count, Micro Agents can.
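
Here’s roughly what that second example looks like. The order data and goal wording below are just mocked up for illustration; the reduceList() call itself is the same as above:

import { openai, reduceList } from "agentm";
import * as dotenv from "dotenv";

// Load environment variables from .env file
dotenv.config();

// Initialize OpenAI
const apiKey = process.env.apiKey!;
const model = 'gpt-4o-2024-08-06';
const completePrompt = openai({ apiKey, model });
const shouldContinue = () => true;

// Mock up a list of orders (illustrative data only)
const orders = [
    { customer: 'A', items: ['graphic tee', 'jeans', 'sneakers'] },
    { customer: 'B', items: ['jacket'] },
    { customer: 'C', items: ['polo shirt', 'chinos'] },
    { customer: 'D', items: ['sneakers', 'socks'] }
];

// Count the orders that contain a complete outfit (at least a shirt and a pair of pants)
const goal = `Count the number of orders that include a complete outfit, meaning at least one shirt and one pair of pants.`;
const initialValue = { complete_outfits: 0 };
reduceList({ goal, list: orders, initialValue, completePrompt, shouldContinue }).then(result => {
    if (result.completed) {
        console.log(result.value); // expect { complete_outfits: 2 } for this data (orders A and C)
    } else {
        console.error(result.error);
    }
});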

When I started thinking about this last week I quickly realized that I can implement pretty much any standard data structure algorithm as a micro agent. I have reduce implemented and I have the prompts fleshed out for map, filter, and sort (merge sort), plus all of the standard gen AI stuff like summarize and classify. The sort algorithm is interesting because you can give it a list of historical events in random order and it will correctly sort them into chronological order (even using gpt-4o-mini). And since these are all well-known algorithms, they’re all computationally efficient. You could use the sort micro agent to sort a list of 10,000 items and it will take O(n log n) time but never consume more than 1k tokens per step. Plus, many of the algorithms can be run in parallel for added speed-ups.
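
To make the efficiency point concrete, here’s a rough sketch of the shape of the sort micro agent. The compareItems callback and mergeSort helper below are stand-ins rather than the actual AgentM code; the point is that the algorithm is plain merge sort and only the pairwise comparison is delegated to the model, so each step only ever sees two items:

// Hypothetical comparator: asks the model which of two items should come first
// for a given goal (e.g. "sort these historical events chronologically").
type CompareItems<T> = (a: T, b: T) => Promise<number>;

async function mergeSort<T>(list: T[], compareItems: CompareItems<T>): Promise<T[]> {
    if (list.length <= 1) return list;
    const mid = Math.floor(list.length / 2);
    // The two halves are independent, so they can be sorted in parallel.
    const [left, right] = await Promise.all([
        mergeSort(list.slice(0, mid), compareItems),
        mergeSort(list.slice(mid), compareItems)
    ]);
    // Merge step: each comparison sends only two items to the model, so token
    // usage per step stays bounded no matter how long the list is.
    const merged: T[] = [];
    let i = 0, j = 0;
    while (i < left.length && j < right.length) {
        merged.push((await compareItems(left[i], right[j])) <= 0 ? left[i++] : right[j++]);
    }
    return merged.concat(left.slice(i), right.slice(j));
}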

Anyway, this is very early work, but I thought I’d share the idea for input while I’m actively in the process of coding everything…

3 Likes

My long-term vision for this is that all of these micro agents become reliable tools that a larger Agent can leverage to perform tasks more reliably. If a scientist agent is researching protein folding, it can use the reduceList micro agent to reliably count the number of stable proteins an experiment created. It’s just simple counting, but you need human-level intelligence to guide the things you count.

In the near term I just want to make it easier to sprinkle AI that actually works into any application, not just autonomous agents. With AgentM you’ll be able to use as little or as much of the library as you want. If you just need to summarize some text, you’ll only need to import two functions: an openai() function to set up the model and a summarizeItem() function to perform the summarization. You shouldn’t have to learn a giant framework to use AI in your app.
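
For example, something along these lines (the exact shape of summarizeItem() may still shift while I’m coding; the parameter names here just mirror the reduceList() example above):

import { openai, summarizeItem } from "agentm";

// Initialize the model, same pattern as the reduceList() example
const apiKey = process.env.apiKey!;
const model = 'gpt-4o-2024-08-06';
const completePrompt = openai({ apiKey, model });

// Summarize a single piece of text (sketch; parameter names assumed)
const goal = `Summarize the text in 2-3 sentences.`;
const item = 'long article text goes here...';
summarizeItem({ goal, item, completePrompt }).then(result => {
    if (result.completed) {
        console.log(result.value);
    } else {
        console.error(result.error);
    }
});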

2 Likes

A marketplace of agents with a common agent OS would be a nice thing indeed.

Langchain seems to have been aiming to be that for a while (minus the marketplace), but langchain is well, langchain.

If you can implement a working browser (sub)agent, I think that’s something a lot of people might need.

Can you elaborate? By sub agent do you mean just the ability to compose micro agents into larger agents? Composability is my jam. I designed the dialog system for the Microsoft Bot Framework.

I feel like we’re a bit past due for an SDK reset. The space has been evolving so rapidly that all of the SDKs designed a year ago (mine included) are starting to feel dated. I’m designing AgentM from the ground up to leverage features like structured outputs. A lot of the guards and workarounds that people designed into their SDKs aren’t even needed anymore.
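
For anyone who hasn’t used them yet, structured outputs let you pin the model’s response to a JSON schema. This snippet is just the raw OpenAI SDK (not AgentM) to show the feature the library builds on:

import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.apiKey });

async function main() {
    // Ask for output that must conform to a JSON schema. With structured
    // outputs the old "parse, validate, retry" guards mostly go away.
    const response = await client.chat.completions.create({
        model: 'gpt-4o-2024-08-06',
        messages: [{ role: 'user', content: 'Sum the quantity and total columns: [{"quantity":2,"total":39.90},{"quantity":1,"total":59.95}]' }],
        response_format: {
            type: 'json_schema',
            json_schema: {
                name: 'cart_totals',
                strict: true,
                schema: {
                    type: 'object',
                    properties: {
                        quantity: { type: 'number' },
                        total: { type: 'number' }
                    },
                    required: ['quantity', 'total'],
                    additionalProperties: false
                }
            }
        }
    });
    console.log(JSON.parse(response.choices[0].message.content!));
}

main();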

1 Like

I meant micro-agent - to use your terminology - sorry.

My understanding is that each of these micro agents is supposed to be engineered to solve a particular sub-task, and that these can then be composed into a bigger program.

While you’re focusing on the functional array operations, these are, after all, just utility functions like anything else, just with a specific signature.

One of those utility functions that people ask for a lot seems to be a browsing capability. Requests/axios, as a micro agent. That’s what I meant.

But upon closer inspection, I don’t know if that’s in scope here :thinking:

That’s correct. If you have an app and need to do summarization you can use summarizeItem directly, but if you’re building an agent that’s reading a bunch of research papers it might run summarizeItem on each paper to help organize its thoughts.

Yeah, that’s definitely a micro agent that could be created. I was thinking it’s probably a little too specialized to put directly into the core library, because there are numerous ways you could approach chunking the page and different search engines you could use (Bing, Brave, etc.), but it’s definitely a micro agent I intend to build.

I already have all the code to do this using Bing Search, and I have my own in-memory vector database I created last year called Vectra.

2 Likes

Can’t wait, then I can finally link to something the next time people ask if the API can do web browsing :laughing:

2 Likes

Actually :slight_smile:


Something unrelated to deagent.net:

@stevenic you should have a talk with @reconsumeralization!

He has been working on exactly that for the last couple of days, and it is even in TypeScript :wink:

1 Like

I feel like we already have marketplaces for agent components. It’s PyPI, NPM, NUGET, Maven, etc.

One of the things I don’t care for about LangChain is that it tries to build in everything including the kitchen sink. LlamaIndex is a little better, but all of these frameworks are too opinionated in my opinion… It’s actually ok to be opinionated, but you need to do it in layers so developers can choose their entry point. You need to think long and hard about the concepts you’re introducing at each layer of your SDK.

Ideally you want to lean into the concepts developers of your target language are familiar with. That’s why I don’t really try to create Python SDKs. It’s not my wheelhouse.

2 Likes

I am currently working on this here:

Not exactly like that - it is just a reduced schema to make it more understandable.

My goal is to let another workflow (triggered by that RabbitMQ Message created in this workflow here) create a workflow using agents when confronted with a new document type.

The agents need to know what kind of tools and applications you have, and they also need to be trained on a baselib that I created to connect all the systems…

An api to use preconfigured specialized agents would be really nice here…

I am calling this first message receiver the Amygdala Service, which can look up a workflow in Redis and then run it on known file types.
The unknown document types (or let’s say the ones without a saved workflow) are handled by the Frontal Lobe Service…

A document type can also be a txt file containing a stream…

And I mean when it can’t solve the puzzle, it should ideally call me so I can log in to a custom GPT and create the workflow there.

A baselib enables me to build very tiny services, most of them less than 40 lines of code: getting credentials and identity from HashiCorp’s Vault, knowing the workflow message format and its options, access to file storage, all with just one line of code…

I mean, I am using n8n for visualisation but built my own workflow engine to get around the fair-use licence.

1 Like

My general belief is that you don’t want LLMs driving the orchestration of your application. Whatever task you’re trying to perform, I’m assuming you’d like it to work reliably, and the best way to achieve 100% reliability is to use the determinism of code. LLMs aren’t generally reliable enough to drive control flow at this point.

Most of the current agent frameworks have the LLM at the center of the agent’s orchestration (I know LlamaIndex and others give you options), but I’m trying to steer away completely from the temptation to design your application/agent that way, as it’s not the best way to leverage LLMs. At least not currently.

My goal is to enable the building of intelligent applications (not agents) that can easily leverage LLMs for the tasks they’re good at. A lot of these tasks need to run in an agentic loop where they’re building a chain-of-thought across multiple turns, but in my mind these are very controlled, specialized agents that do one thing and have been designed to do that one thing really well.
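
To make that concrete, the kind of shape I have in mind looks roughly like this. Plain code owns the control flow; the micro agents are only called for the steps that need language understanding (the summarizeItem() signature here is sketched to mirror reduceList(), so treat it as an illustration rather than the final API):

import { openai, reduceList, summarizeItem } from "agentm";

const completePrompt = openai({ apiKey: process.env.apiKey!, model: 'gpt-4o-2024-08-06' });
const shouldContinue = () => true;

// Deterministic orchestration: an ordinary loop decides what happens and in
// what order. The LLM only handles the fuzzy steps (summarize, then count).
async function countRelevantPapers(papers: string[], topic: string) {
    const summaries: string[] = [];
    for (const paper of papers) {
        // summarizeItem() parameters sketched to mirror reduceList()
        const summary = await summarizeItem({ goal: 'Summarize this paper in 3 sentences.', item: paper, completePrompt });
        if (summary.completed) {
            summaries.push(summary.value as string);
        }
    }
    const goal = `Count how many of these summaries are about ${topic}.`;
    const result = await reduceList({ goal, list: summaries, initialValue: { count: 0 }, completePrompt, shouldContinue });
    return result.completed ? result.value : undefined;
}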

1 Like

Not when you use Node.js, that’s for sure.

But with the baselib and a really small context in Python it works really well.
Also a connection to the monitoring, and the ability to deploy and test it on a staging environment… well… sure, before a new workflow goes into production it should be reviewed by a human.

I also made a multi-agent system that checks the code against criteria from static code analysis and gives it a score, though. I don’t know if many humans can keep up with that.

But of course there are difficult things to solve. E.g. understanding and working with gen AI on CAD files is really hard.

And also the OCR doesn’t work too well yet. Some resume designers add background images to a CV or use symbols like this

[screenshot of a CV using decorative symbols]

I don’t know if AI will ever be able to keep up with human creativity like this haha.

1 Like

We could definitely debate this for hours on end :slight_smile: My gut says that in the long run AGI will use some form of program synthesis to write and rewrite its logic on the fly to adapt to the current task. You kind of get the best of both worlds in that case: you get the determinism of code, but you’ve got an AI watching it run step by step in a debugger, and it can fix any issues it sees just before they happen.

When I was at Microsoft we bought a company called Semantic Machines and this is how their system works. They just didn’t have an LLM so they had to have humans review and monitor the execution.

1 Like

Oh no, I think we are on the same page then. The Frontal Lobe Service creates code and JSON for workflows. Once they are created they can be tested manually, and then they are just code…

The LLM Agents just come up with a possible solution.
And yeah, for rewriting I think it needs some kind of dream algorithm where it just sits back and rethinks on an abstract level.

Which absolutely needs specialized agents!

For that you should absolutely talk to @reconsumeralization (he has another idea for that that follows yours) - and you should just watch him code - damn he is fast.

Another perspective… I was chatting with a guy named Steve Lucco (one of the creators of TypeScript) and the general consensus is that these LLMs probably won’t be writing any of the current programming languages for much longer. Those languages were meant for humans to write, not machines. There are much simpler ways for LLMs to write programs that might be difficult for a human to interpret but are much easier for an LLM to write, debug, and test. Steve’s working on something and I have done my own experiments in that area, but I’m currently more focused on getting LLMs to properly model reasoning from first principles. There’s just not enough hours in the day.

AgentM is an output from that reasoning work. Micro Agents can count using first principles and if they can count they can likely do other reasoning tasks currently thought impossible.

2 Likes

I think you mentioned that when we had our whiskey meet session. Should definitely repeat that some time.

btw ChatGPT says:

When considering the easiest programming language for a Large Language Model (LLM) like GPT to write, we need to account for several factors related to tokenization and the nature of the language itself. Here are some key considerations:

  1. Simplicity of Syntax: Languages with straightforward and minimalistic syntax are easier for LLMs to generate without errors. Languages like Python are known for their readability and simple syntax.
  2. Tokenization and Vocabulary Size: LLMs tokenize text into smaller units called tokens. Languages with less verbose syntax and fewer keywords are generally easier for LLMs because fewer tokens are needed to represent a program. Languages that have more uniformity in how they handle commands (fewer unique tokens) are also easier.
  3. Standard Library and Built-in Functions: Languages with extensive standard libraries and built-in functions make it easier to accomplish tasks without needing verbose or complex code. Python is again a good example because it allows for concise coding through its standard libraries.
  4. Popularity and Training Data Availability: The more training data an LLM has for a language, the better it can generate accurate code. Languages like Python, JavaScript, and HTML are more commonly used and well-represented in training data.

And in my experience LLMs tend to write pretty long JavaScript code… which obviously leads to problems.

Give it a limit, tell it to solve it as a one-liner if possible, stuff like that helped me a lot. And like I said, Node.js/TypeScript/any kind of frontend stuff (except for small jQuery UI widgets with a stringent structure, haha) is not ideal for autocoders.

Golang is another example. Very easy to understand for humans, but LLMs really have problems with things like static typing and explicit error handling, which means more rules to follow compared to “hey bot, here is a baselib with a couple of functions you can call and they are named in a way that you can’t mess it up”.

So, well… to be honest, I switched from my idea of building PHP/Symfony bundles with the autocoder to Python, because you mentioned that some programming languages are less suitable for LLMs than others.

We definitely should :slight_smile:

Here’s an example of AgentM counting even and odd bits in a binary sequence. This task has been cited time and time again as impossible for an LLM to perform:

import { openai, reduceList } from "agentm";
import * as dotenv from "dotenv";

// Load environment variables from .env file
dotenv.config();

// Initialize OpenAI
const apiKey = process.env.apiKey!;
const model = 'gpt-4o-2024-08-06';
const completePrompt = openai({ apiKey, model });

// Create cancellation token
const shouldContinue = () => true;

// Define a binary number
const list = [1,0,1,1,1,1,0,0,0,1,0,1,0,1,1,1,1,1];

// Count number of bits that are set to 1
async function countBits() {
    // First count even bits
    let goal = `Count the number of even bits that are set to 1. The index represents the bit position.`;
    let initialValue = { count: 0 };
    let results = await reduceList({goal, list, initialValue, completePrompt, shouldContinue });
    console.log(`Even Bits: ${results.value!.count}`);

    // Now count odd bits
    goal = `Count the number of odd bits that are set to 1. The index represents the bit position.`;
    initialValue = { count: 0 };
    results = await reduceList({goal, list, initialValue, completePrompt, shouldContinue });
    console.log(`Odd Bits: ${results.value!.count}`);
}

countBits();

The output is 5 even and 7 odd, which is correct.

The model only sees a portion of the list at a time, so I had to give the model a small hint that the index represents the bit position.

1 Like

Giving it a huge load of tools like this would help, I guess, but I assume you are embedding the agents’ functionality in the vector DB that you wrote, right?

Does that still work when you have, let’s say, a thousand agents?

Hmm, or maybe use a graph DB to figure out what to use… but there is still the old problem that RAG doesn’t really work as it should…
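
Something like this is what I’m picturing, just as a rough sketch (embed() below is a stand-in for whatever embedding call you use, not Vectra’s actual API):

type AgentEntry = { name: string, description: string, vector: number[] };

// Cosine similarity between two embedding vectors
function cosine(a: number[], b: number[]): number {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Pick the micro agent whose description best matches the task, even when
// there are thousands of registered agents. embed() is a placeholder for
// whatever embedding model/API you actually use.
async function selectAgent(
    task: string,
    agents: AgentEntry[],
    embed: (text: string) => Promise<number[]>
): Promise<AgentEntry> {
    const taskVector = await embed(task);
    return agents.reduce((best, agent) =>
        cosine(agent.vector, taskVector) > cosine(best.vector, taskVector) ? agent : best
    );
}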

btw: