Micro Agents: creating little agents that do one thing really really well

There’s no VectorDB or anything; it’s just a function that does something intelligent. You could have a million different Micro Agents because we’re not using an LLM to orchestrate them. Here’s the full code for the reduceList micro agent:

import { AgentArgs, AgentCompletion, JsonSchema, Message, SystemMessage, UserMessage, WithExplanation } from "../types";
import { composePrompt } from "../composePrompt";
import { CancelledError } from "../CancelledError";

export interface ReduceListArgs<TResult extends {}> extends AgentArgs {
    goal: string;
    list: Array<any>;
    initialValue: TResult;
    jsonSchema?: JsonSchema;
    temperature?: number;
    maxHistory?: number;
}

export async function reduceList<TResult extends {}>(args: ReduceListArgs<TResult>): Promise<AgentCompletion<TResult>> {
    const { goal, list, initialValue, jsonSchema, completePrompt, shouldContinue } = args;
    const temperature = args.temperature ?? 0.0;
    let maxHistory = args.maxHistory ?? 8;
    if (maxHistory < 2) {
        maxHistory = 2;
    }

    // Compose system message
    let output: WithExplanation<TResult> = {...initialValue, explanation};
    const system: SystemMessage = {
        role: 'system',
        content: composePrompt(systemPrompt, {goal, output})
    };

    // Enumerate list
    const useJSON = true;
    const history: Message[] = [];
    for (let index = 0; index < list.length; index++) {
        // Compose prompt
        const item = list[index];
        const prompt: UserMessage = {
            role: 'user',
            content: composePrompt(itemPrompt, {index, item})
        };

        // Complete prompt
        const result: AgentCompletion<WithExplanation<TResult>> = await completePrompt({prompt, system, history, useJSON, jsonSchema, temperature});
        if (!result.completed) {
            return { completed: false, error: result.error };
        } else if (!await shouldContinue()) {
            return { completed: false, error: new CancelledError() };
        }

        // Update output and history
        output = result.value!;
        history.push(prompt);
        history.push({ role: 'assistant', content: JSON.stringify(output) });

        // Prune history
        if (history.length > maxHistory) {
            history.splice(0, history.length - maxHistory);
        }
    }

    // Remove explanation
    if (output.explanation) {
        delete output.explanation;
    }
    return { completed: true, value: output };
}

const explanation = `<explanation supporting your answer>`;
const systemPrompt =
`You are an expert at combining and reducing items in a list.

<GOAL>
{{goal}}

<INSTRUCTIONS>
Given an <ITEM> return a new JSON <OUTPUT> object that combines the item with the current output to achieve the <GOAL>.

<OUTPUT>
{{output}}`;
const itemPrompt =
`
<INDEX>
{{index}}

<ITEM>
{{item}}
`;

It has an agentic loop and it builds a chain of thought, so it’s very much an agent. It just doesn’t look like what you’ve been told agents should look like.


That looks expensive haha

The plus is I’m the only one that needs to know how it works. It’s leveraging the 2,000+ hours I’ve spent talking to these models. You get a simple function that does something magical.


Actually, it’s probably not that expensive, and while I’m using gpt-4o, it generally works fine with gpt-4o-mini.

The examples I’m showing are toy examples, but where you’d use this is in a scenario where you need some sort of intelligence when counting. Like counting the number of successful protein folds out of millions of trials.


Just a wild guess:

(async () => {
  // reduceList also needs completePrompt/shouldContinue from AgentArgs
  const completePrompt = openai({ apiKey: process.env.apiKey!, model: "gpt-4o-mini" });
  const result = await reduceList({ goal: "Sum up numbers in a list", list: [5, 10, 15],
      initialValue: { sum: 0 }, completePrompt, shouldContinue: async () => true });
  console.log("Final Output:", result);
})();

But yeah, I get the point…

“hey frontal lobe here is some data, create a workflow to handle it:…”

The problem here is that it first needs to find a document type, and that can be really tricky.

Can you imagine that you can get 650+ PDFs with different but similar document types (some only differ by a single number, which might not even get captured by OCR) from a single export of a payroll software haha…

And the task actually is “find the following data \n … inside this: \n”…

When you say similar but different document types, can you give me an example? Classifying documents would be a great use for this.

Hmm, think of it like “credit note statement” and “invoice” - the difference is who creates it… but there are more similar documents… e.g. in payroll, say a document that shows your bank account for March and one that shows it for April… and nothing has changed. Hmm, maybe a bad example since they are the same document type…

but the last one has a different meaning than the ones issued previously.

A simple way to classify documents is removing stop words, counting the keywords, and linking them to synonyms…

The one with the highest similarity… I mean I guess you know.

Yeah, that’s a simple classification task… I’m planning to create a classifyItem() micro agent that, given an object or blob of text and a list of categories, can classify the item. For really large blobs of text like documents, I’m thinking about a compressText() micro agent that will break the text into chunks and recursively compress the chunks relative to some goal. Basically, you can tell the compressText() agent what kind of details it should maintain when compressing the text.

So if your workflow is:

[document] → compressText() → classifyItem() → (branch based on type)

you can process millions of documents that fall into potentially hundreds of categories, or thousands if you do a high-level classification first and then run the results through a more fine-grained classifier.
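Neither micro agent exists yet, but here’s a minimal, self-contained sketch of what a classifyItem() function might look like. The signature, the prompt, and the completePrompt shape are all assumptions for illustration, not AgentM’s actual API:

```typescript
// Hypothetical sketch of a classifyItem() micro agent. The LLM call is
// abstracted behind a completePrompt function so a stub can stand in for a model.
interface ClassifyItemArgs {
    item: string;
    categories: string[];
    completePrompt: (prompt: string) => Promise<string>;
}

async function classifyItem(args: ClassifyItemArgs): Promise<string | undefined> {
    const { item, categories, completePrompt } = args;
    const prompt =
        `Classify the <ITEM> as one of the following categories: ${categories.join(', ')}.\n` +
        `Respond with only the category name.\n\n<ITEM>\n${item}`;
    const answer = await completePrompt(prompt);
    // Map the model's free-form answer back onto a known category
    return categories.find(c => answer.toLowerCase().includes(c.toLowerCase()));
}

// Usage with a stub "model" for illustration:
const stub = async (_prompt: string) => 'That looks like an invoice.';
classifyItem({ item: 'Rechnung Nr. 42', categories: ['invoice', 'credit note'], completePrompt: stub })
    .then(category => console.log(category)); // prints "invoice" with this stub
```

A real implementation would presumably use JSON mode and an explanation field like reduceList does, but the shape is the same: one focused function, one prompt, no orchestration.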


I’d say synonyms are a must, which adds another layer of stupid work (“collecting the data for that”)…
Although it might, to some extent, learn that by itself, or by letting a bunch of poor guys sit in front of documents and classify them (preferably customers).

Why do you need synonyms? Why not just let the LLM classify them? It’s the best classifier I’ve ever seen.


Yeah, use that as a labeler. I am just a cheap ass. I actually wanted to give a free example… as in, “A simple way to classify documents is removing stop words and counting the keywords and linking them to synonyms…” translates to “a cheap solution…”

Well the great thing about this architecture is you can easily replace any micro agent (function) with some other code or system that does the same thing…

The other thing you can do is start with the LLM-based classifier and then use its first 10,000 classifications as training data for a smaller, cheaper model.


But isn’t that against the rules of OpenAI? :wink:

If it is, use Llama 3.1 405B as your classifier. The other thing that’s nice about the approach I’m taking is that you can use as many different models as you want. I use all of the major models regularly, and my prompts tend to be general purpose and work well with any GPT-4 class model.

The openai() model function in AgentM will work with Fireworks.ai or Groq. You just need to update the endpoint. And Llama.cpp, Ollama, etc. :slight_smile:
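A sketch of what that swap might look like, assuming the openai() options accept a base-URL override (the exact option name is an assumption; check AgentM’s docs):

```typescript
import { openai } from "agentm";

// Assumption: a baseUrl-style option exists for pointing openai() at any
// OpenAI-compatible endpoint (Groq shown here; the model id is illustrative).
const completePrompt = openai({
    apiKey: process.env.GROQ_API_KEY!,
    model: 'llama-3.1-70b-versatile',
    baseUrl: 'https://api.groq.com/openai/v1'
});
```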



:wink:

Finally a good use for a strategy pattern haha


I think this is a very forward-thinking idea: “For tasks that require just a little bit of intelligence.”

The whole flow of agents in economies is on a spectrum from ‘Broad Generalization’ to ‘Hyper Specialization.’ (Imma econ guy.) I very much see the need for what you’re making, and you’re going to save a lot of work by making an SDK.

It seems like you have the underlying structure pretty well thought out, especially with the modularity, and the ease of swapping steps. You’ve mentioned a few Micro Agents you’ve thought of:

Is there some group of Archetypal Agents that already exists? Some number of Agents that broadly cover most topics?

I’d love input on that… I sort of figure that will work itself out organically, but I’m starting with the clear primitives that cover the GenAI basics plus more traditional CS algorithms. I don’t think you’ll use the CS algorithms all that much, but they’re there if you need them, and I think they form a solid basis for measuring the capabilities of models.


Here’s an example of using AgentM to sort Rush studio albums in chronological order… This is implemented as a merge sort, which has a time complexity of O(n log n).

import { openai, sortList } from "agentm";
import * as dotenv from "dotenv";

// Load environment variables from .env file
dotenv.config();

// Initialize OpenAI 
const apiKey = process.env.apiKey!;
const model = 'gpt-4o-mini';
const completePrompt = openai({ apiKey, model });

// Create cancellation token
const shouldContinue = () => true;

// Create randomized list of Rush's studio albums
const list = [
    "Grace Under Pressure",
    "Hemispheres",
    "Permanent Waves",
    "Presto",
    "Clockwork Angels",
    "Roll the Bones",
    "Signals",
    "Rush",
    "Power Windows",
    "Fly by Night",
    "A Farewell to Kings",
    "2112",
    "Snakes & Arrows",
    "Test for Echo",
    "Caress of Steel",
    "Moving Pictures",
    "Counterparts",
    "Vapor Trails",
    "Hold Your Fire"
];

// Sort list of Rush studio albums chronologically
const logExplanations = false;
const parallelCompletions = 3;
const goal = `Sort the list of rush studio albums chronologically from oldest to newest.`;
sortList({goal, list, parallelCompletions, logExplanations, completePrompt, shouldContinue }).then(result => {
    if (result.completed) {
        result.value!.forEach((item, index) => console.log(`${index + 1}. ${item}`));
    } else {
        console.error(result.error);
    }
});

This returns the correct order of:

  1. Rush
  2. Fly by Night
  3. Caress of Steel
  4. 2112
  5. A Farewell to Kings
  6. Hemispheres
  7. Permanent Waves
  8. Moving Pictures
  9. Signals
  10. Grace Under Pressure
  11. Power Windows
  12. Hold Your Fire
  13. Presto
  14. Roll the Bones
  15. Counterparts
  16. Test for Echo
  17. Vapor Trails
  18. Snakes & Arrows
  19. Clockwork Angels

This sort of 19 items took 56 requests with 7,190 input tokens and 2,703 output tokens, so you’re obviously not going to use it in just any situation.
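It’s still cheap, though. A back-of-the-envelope estimate, assuming gpt-4o-mini’s 2024 launch pricing of $0.15 per 1M input tokens and $0.60 per 1M output tokens (current pricing may differ):

```typescript
// Rough cost of the 56-request sort run above at assumed gpt-4o-mini pricing.
const inputCost = (7_190 / 1_000_000) * 0.15;   // ≈ $0.0011
const outputCost = (2_703 / 1_000_000) * 0.60;  // ≈ $0.0016
console.log(`$${(inputCost + outputCost).toFixed(4)}`); // prints "$0.0027"
```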

Note that it is using gpt-4o-mini, so the costs are minimal, and you can configure the agent to make parallel requests, which helps speed up execution. You can also ask it to log its explanations to the console for debugging purposes.


So with agents, I think it’s about time people start to understand that they are essentially functions if you do it right, and they should be used as functions that are combined together. In a process of multiple collaborating, communicating agents, the agents and the communications themselves should all be functions that include self-reflection on previous results, diminishing on a sliding scale until the agents are completely optimized for the purpose they’re filling, at which point they can generally become workflows that do not require any tokens. The real key is that you make it so they drop nodes of their actions as they go, which you can use within a drag-and-drop or JSON-editable workflow system. That lets you convert them easily into simple workflows, and then into macros, with the ability to fall back on LLM logic or reasoning whenever a situation arises where it’s still needed, so you minimize your tokenization cost and eliminate it where possible.

Jochen, for the situation you’re describing with lists, my favorite trick is to use a vision-based model that you can establish a semantic layer with, so that you can literally parse semantic concepts as opposed to specific listing items. To give you an idea, it’s super fun to be able to say: if a joke funnier than a 6 on a scale of 1 to 10 is told in this category on this forum, then send that joke through the Joke Improvement Pipeline and repost it in this form with this pretext. You can essentially increase the level of enjoyment that people experience in their lives on a daily basis LOL


Joke improvement requires feedback, like any self-optimizing system.
