AgentM: A library of "Micro Agents" that make it easy to add reliable intelligence to any application

Sharing a new OSS project I’ve started called AgentM…

AgentM is a library of “Micro Agents” that make it easy to add reliable intelligence to any application. The philosophy behind AgentM is that “Agents” should consist mostly of deterministic code with a sprinkle of LLM-powered intelligence mixed in. Many of the existing Agent frameworks place the LLM at the center of the application as an orchestrator that calls a collection of tools. In an AgentM application, your code is the orchestrator, and you only call a micro agent when you need to perform a task that requires intelligence. To make adding this intelligence to your code easy, the JavaScript version of AgentM surfaces these micro agents as a simple library of functions. While the initial version is in JavaScript, if there’s enough interest I’ll create a Python version of AgentM as well.
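For readers following the Python discussion later in this thread, here is a minimal sketch of the "your code orchestrates, the model is just a function" idea. It is a hypothetical analogue, not AgentM's actual API: `filter_list`, the injected `complete_prompt` callable, and the yes/no prompt format are all assumptions. Passing the model in as a parameter (as the TypeScript example does with `completePrompt`) also lets the deterministic parts run against a stub.

```python
from typing import Callable, List

# Hypothetical: any function that sends a prompt to an LLM and returns its text reply.
CompletePrompt = Callable[[str], str]


def filter_list(goal: str, items: List[str], complete_prompt: CompletePrompt,
                should_continue: Callable[[], bool] = lambda: True) -> List[str]:
    """Keep the items for which the model answers "yes" to the goal."""
    kept = []
    for item in items:
        if not should_continue():  # cooperative cancellation, like shouldContinue
            break
        prompt = f"Goal: {goal}\nItem: {item}\nAnswer yes or no."
        answer = complete_prompt(prompt).strip().lower()
        if answer.startswith("yes"):
            kept.append(item)
    return kept


# Stub "model" so the orchestration logic runs without an API key.
eighties = {"Signals", "Presto", "Power Windows"}


def fake_model(prompt: str) -> str:
    item = prompt.split("Item: ")[1].split("\n")[0]
    return "yes" if item in eighties else "no"


albums = ["Hemispheres", "Signals", "2112", "Presto", "Power Windows"]
print(filter_list("Albums released in the 1980s", albums, fake_model))
# ['Signals', 'Presto', 'Power Windows']
```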

To give you a small taste of what working with AgentM is like, here’s a small app that takes a randomized list of all the studio albums by the band Rush, first filters the list to only include albums from the 1980s, then sorts the list chronologically, and finally maps the titles into a list of JSON objects containing the title plus a detailed description of each album:

import { openai, filterList, sortList, mapList } from "agentm";
import * as dotenv from "dotenv";

// Load environment variables from .env file
dotenv.config();

// Initialize OpenAI 
const apiKey = process.env.apiKey!;
const model = 'gpt-4o-mini';
const completePrompt = openai({ apiKey, model });

// Create cancellation token
const shouldContinue = () => true;

// Create a randomized list of Rush's studio albums
const rushAlbums = [
    "Grace Under Pressure",
    "Hemispheres",
    "Permanent Waves",
    "Presto",
    "Clockwork Angels",
    "Roll the Bones",
    "Signals",
    "Rush",
    "Power Windows",
    "Fly by Night",
    "A Farewell to Kings",
    "2112",
    "Snakes & Arrows",
    "Test for Echo",
    "Caress of Steel",
    "Moving Pictures",
    "Counterparts",
    "Vapor Trails",
    "Hold Your Fire"
];

// Define output shape
interface AlbumDetails {
    title: string;
    details: string;
}

const outputShape = { title: '<album title>', details: '<detailed summary of album including its release date>' }; 

// Filter the albums, sort them chronologically, then map them to detailed descriptions
async function filterAndSortList() {
    // Filter list to only include albums from the 80's
    const parallelCompletions = 3;
    const filterGoal = `Filter the list to only include rush albums released in the 1980's.`;
    const filtered = await filterList({goal: filterGoal, list: rushAlbums, parallelCompletions, completePrompt, shouldContinue });

    // Sort filtered list chronologically
    const sortGoal = `Sort the list of rush studio albums chronologically from oldest to newest.`;
    const sorted = await sortList({goal: sortGoal, list: filtered.value!, parallelCompletions, completePrompt, shouldContinue });

    // Add in world knowledge
    const detailsGoal = `Map the item to the output shape.`;
    const details = await mapList<AlbumDetails>({goal: detailsGoal, list: sorted.value!, outputShape, parallelCompletions, completePrompt, shouldContinue });

    // Print the sorted, detailed list
    details.value!.forEach((item) => console.log(`Title: ${item.title}\nDetails: ${item.details}\n`));
}

filterAndSortList().catch(console.error);

The output of this app is:

Title: Permanent Waves
Details: Permanent Waves is the seventh studio album by the Canadian rock band Rush, released on January 1, 1980. The album features a blend of progressive rock and new wave influences, showcasing the band's evolving sound with tracks like 'Spirit of Radio' and 'Freewill'.

Title: Moving Pictures
Details: 'Moving Pictures' is the eighth studio album by the Canadian rock band Rush, released on February 12, 1981. The album features some of the band's most popular songs, including 'Tom Sawyer' and 'Limelight', and is known for its blend of progressive rock and mainstream appeal.

Title: Signals
Details: 'Signals' is the thirteenth studio album by the Canadian rock band Rush, released on September 9, 1982. The album features a blend of progressive rock and new wave influences, showcasing the band's evolution in sound during the early 1980s.

Title: Grace Under Pressure
Details: 'Grace Under Pressure' is the tenth studio album by the Canadian rock band Rush, released on April 12, 1984. The album features a blend of progressive rock and new wave influences, showcasing the band's evolution in sound during the 1980s. It includes notable tracks such as 'Distant Early Warning' and 'The Body Electric.'

Title: Power Windows
Details: Power Windows is the eleventh studio album by Canadian rock band Rush, released on October 29, 1985. The album features a blend of progressive rock and synthesizer-driven sound, showcasing the band's evolution in the 1980s.

Title: Hold Your Fire
Details: 'Hold Your Fire' is the twelfth studio album by the Canadian rock band Rush, released on September 21, 1987. The album features a blend of progressive rock and synthesizer-driven sound, showcasing the band's evolution in style during the late 1980s.

Title: Presto
Details: Presto is the thirteenth studio album by the Canadian rock band Rush, released on November 21, 1989. The album features a blend of progressive rock and more accessible pop elements, showcasing the band's evolution in sound during the late 1980s.

While this is definitely a toy example, hopefully you can see the power of what’s possible and how simply you can leverage AgentM to perform complex tasks.

Thumb up for a python version.

I have a decent first draft of the JS version, so I’ll try to spin up a Python version tomorrow. I might need help vetting it, as I’m not super well versed in all of the Python norms.

Count me in. This is huge.

Placeholder for the Python version is here:

Even the feedback on Reddit has been positive, which I find rare… I think it’s a good idea.

Thanks @stevenic can’t wait to dive into this!

I fleshed out some more micro agents last night. @curt.kennedy, you’ll like the projectList agent: it takes a list and a template as input and uses the template to project each item in the list to a new shape. I ended up using your “projection” terminology, as it fit better than lenses, but it’s the same idea.
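A projection can be sketched as "fill this template once per item, then keep only the template's keys". This is a hypothetical illustration, not AgentM's projectList implementation: `project_list`, the prompt wording, and the assumption that the model replies with bare JSON are all made up for the example, and the stub model returns canned data.

```python
import json
from typing import Callable, Dict, List


def project_list(items: List[str], template: Dict[str, str],
                 complete_prompt: Callable[[str], str]) -> List[Dict[str, str]]:
    """Hypothetical projection: ask the model to fill `template` for each item."""
    results = []
    for item in items:
        prompt = (
            "Fill in this JSON template for the item. Reply with JSON only.\n"
            f"Template: {json.dumps(template)}\nItem: {item}"
        )
        projected = json.loads(complete_prompt(prompt))
        # Keep only the keys the template asked for, so the output shape is stable.
        results.append({key: projected.get(key, "") for key in template})
    return results


# Stub model: echoes the item and adds an extra key that the projection drops.
def fake_model(prompt: str) -> str:
    item = prompt.rsplit("Item: ", 1)[1]
    return json.dumps({"title": item, "year": "1981", "extra": "ignored"})


template = {"title": "<album title>", "year": "<release year>"}
print(project_list(["Moving Pictures"], template, fake_model))
# [{'title': 'Moving Pictures', 'year': '1981'}]
```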

The classifyList agent takes a list of items and a list of categories. It will then sort all of the items into categories. So let’s say you’re building a service that routes customer support tickets to the appropriate support queue. You can make the decision for which queue to send a ticket to in one line of code now.
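The support-ticket routing case could look roughly like this. It is a hypothetical sketch rather than classifyList's real signature: `classify_list`, the `fallback` parameter, and the prompt format are assumptions. The point it illustrates is that the deterministic code still owns the decision, since any answer outside the allowed categories is replaced by a fallback queue.

```python
from typing import Callable, Dict, List


def classify_list(items: List[str], categories: List[str],
                  complete_prompt: Callable[[str], str],
                  fallback: str) -> Dict[str, str]:
    """Hypothetical classifier: map each item to exactly one category."""
    assignments = {}
    for item in items:
        prompt = (
            f"Categories: {', '.join(categories)}\n"
            f"Item: {item}\nReply with the single best category name."
        )
        answer = complete_prompt(prompt).strip()
        # Deterministic guardrail: never trust an answer outside the allowed set.
        assignments[item] = answer if answer in categories else fallback
    return assignments


# Keyword-based stub standing in for the model.
def fake_model(prompt: str) -> str:
    item = prompt.split("Item: ")[1].lower()
    if "refund" in item:
        return "billing"
    if "crash" in item:
        return "technical"
    return "not sure"  # an invalid answer, to exercise the fallback


queues = ["billing", "technical", "general"]
tickets = ["I want a refund", "The app crashes on start", "Hello there"]
print(classify_list(tickets, queues, fake_model, fallback="general"))
```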

I’m testing all of these agents with gpt-4o-mini and they work great so they should be super cost efficient. Almost all of them can do their tasks in parallel as well for added speed ups.
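Running per-item calls in parallel usually needs a cap on how many requests are in flight at once, which is presumably what the `parallelCompletions` option in the TypeScript example controls (an assumption based on its name). A minimal Python sketch using `asyncio.Semaphore`; `map_bounded` and the fake async model call are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def map_bounded(items: List[T], worker: Callable[[T], Awaitable[R]],
                      limit: int) -> List[R]:
    """Run `worker` on every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather() returns results in input order even if calls finish out of order.
    return list(await asyncio.gather(*(run_one(item) for item in items)))


in_flight = 0
peak = 0


async def fake_model_call(title: str) -> str:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)  # track the highest observed concurrency
    await asyncio.sleep(0.01)    # stands in for network latency
    in_flight -= 1
    return title.upper()


results = asyncio.run(map_bounded(["Signals", "Presto", "Rush", "2112"], fake_model_call, limit=3))
print(results)  # ['SIGNALS', 'PRESTO', 'RUSH', '2112']
```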

Thanks for sharing @stevenic !

Re: Archetypal Micro Agents

Conversed with ChatGPT on the topic of the major types of potential Agents. Here’s what it came up with:

User:
“There must be some discrete number of “Archetypal” Micro Agents. I mean that, to a greater or lesser degree, all of the Micro Agents that we (all the peoples and AIs in the world moving forward) create will tend to gravitate toward certain archetypal ideas. Even though I think the very nature of Data Molecules is hyper-specialization (i.e., a micro agent whose whole purpose is to tend its Data like a hen with chicks), and that it’s not possible to count all of the possibilities… it should be possible to more or less classify the archetypes, if you follow me. What do you think those archetypal Micro Agents will be and do?”

ChatGPT said:

I completely follow your reasoning! Despite the incredible specialization that Data Molecules or Micro Agents can reach, there should be a set of archetypal Micro Agents that emerge as foundational across various business processes and AI systems. These archetypes would represent the broad categories of functionality that different agents perform, acting as “building blocks” in AI-driven ecosystems. Here’s a hypothesis for what those archetypal Micro Agents might be and what they do:

1. Data Ingestor

  • Function: Gathers and collects data from various sources. This might involve pulling information from APIs, databases, sensors, or external files.
  • Use Case: Ingest customer reviews from multiple social media platforms or sensors collecting real-time data from industrial machines.
  • Core Task: Data retrieval, validation, formatting.

2. Data Validator

  • Function: Ensures the data meets quality standards, checks for errors, cleanses the data, and may reject or flag invalid inputs.
  • Use Case: Verifies that incoming customer information is formatted correctly and flags duplicates or outliers.
  • Core Task: Data validation, error detection, cleaning.

3. Data Transformer

  • Function: Transforms raw data into structured formats that other Micro Agents can utilize. This could involve tasks like normalization, enrichment, or aggregation.
  • Use Case: Converts user-submitted text data into structured JSON or performs calculations like currency conversion.
  • Core Task: Data transformation, enrichment, reformatting.

4. Pattern Recognizer

  • Function: Identifies patterns or trends in datasets, often using machine learning models. It can be specialized for anomaly detection, predictive modeling, or clustering data points.
  • Use Case: Detects fraud in financial transactions or predicts which customers are most likely to churn.
  • Core Task: Pattern recognition, predictive analytics, clustering.

5. Action Recommender

  • Function: Based on the processed data, this Micro Agent suggests actions or decisions. It leverages decision trees, recommendation engines, or optimization algorithms.
  • Use Case: Recommends which product to promote to a customer based on their purchase history.
  • Core Task: Decision-making, recommendations, action generation.

6. Feedback Processor

  • Function: Monitors outputs and collects feedback, either from users or system interactions, to improve the performance of the system. It may also adjust the behavior of other agents based on this feedback.
  • Use Case: Analyzes customer feedback on a chatbot’s performance to improve future interactions.
  • Core Task: Feedback gathering, performance evaluation, real-time adjustments.

7. Error Detector & Corrector

  • Function: Similar to CriticGPT, this agent is dedicated to finding errors in processes or data and either correcting them autonomously or flagging them for human intervention.
  • Use Case: Detects and corrects formatting errors in incoming customer forms.
  • Core Task: Error detection, self-correction, audit.

8. Communication Coordinator

  • Function: Acts as a bridge between different systems, APIs, or even human users. It could handle messaging, translation between different data formats, or simply ensure smooth communication between other agents.
  • Use Case: Manages communication between different microservices in an enterprise architecture.
  • Core Task: Data/message routing, protocol management, system integration.

9. Monitor & Watchdog

  • Function: Continuously monitors systems or processes, ensuring uptime, compliance, or quality control. It raises alerts or takes actions when anomalies or failures occur.
  • Use Case: Monitors server health and triggers maintenance tasks when resource thresholds are crossed.
  • Core Task: Monitoring, alerting, threshold management.

10. Security Enforcer

  • Function: Ensures that data handling and processes comply with security standards, performing tasks like encryption, authentication, or access control enforcement.
  • Use Case: Encrypts sensitive user data before it enters the system and manages authentication protocols.
  • Core Task: Data encryption, access control, compliance.

11. Orchestrator

  • Function: Coordinates the operation of multiple Micro Agents, managing workflows and dependencies between them. This agent ensures that the system operates in a cohesive and efficient manner.
  • Use Case: Manages a customer onboarding process by coordinating between agents handling form submission, validation, and follow-up actions.
  • Core Task: Workflow management, task scheduling, coordination.

12. Memory Keeper

  • Function: Responsible for maintaining and updating the system’s long-term memory. This includes storing historical data, indexing important information, and retrieving relevant data for decision-making processes.
  • Use Case: Maintains a knowledge base of customer interactions to enhance future interactions with personalized responses.
  • Core Task: Data storage, retrieval, indexing.

13. Trainer/Improver

  • Function: Actively retrains and improves other agents by adjusting machine learning models or tuning parameters based on new data or performance metrics.
  • Use Case: Retrains a recommendation engine based on new user behavior or adjusts thresholds for anomaly detection models.
  • Core Task: Model retraining, optimization, learning.

14. Auditor

  • Function: Reviews processes for compliance with regulatory or internal guidelines. It checks data usage, process execution, and overall system behavior to ensure alignment with pre-defined standards.
  • Use Case: Audits financial transactions to ensure compliance with industry regulations.
  • Core Task: Process review, compliance checking, auditing.

These archetypal Micro Agents would form the foundation of any complex AI-driven business system. While the Data Molecules themselves can be hyper-specialized for unique tasks, their general-purpose behaviors and core functionality would often fall into one of these categories.

This structured approach makes it easier to build, scale, and maintain modular AI systems, as each of these agents can interact within a larger framework, with clear and defined roles.

@thinktank I’ll have a deeper look into it once the Python version is finished.

Have started with a migration plan.

Hi Steve,

Thanks for sharing your repo! My TS is a bit rusty, so I asked for some help to understand the nature of sortList. Is the following assessment correct?

Steven is an architect. Low-level leetcode optimizations like that shouldn’t be his job to solve.

Although I get your point. And throwing more hardware on it shouldn’t be the answer either.

Maybe you want to take over [Subtask] Perform performance testing for Python version · Issue #20 · Stevenic/agentm-py · GitHub in a couple of days?

I’m not sure if ChatGPT’s assessment is correct, and I’m not here to criticize another man’s work or put him on blast. On the contrary, I have a great deal of respect for anyone who is willing to put in the time to make OSS. I believe that we’re all here to share ideas, and I cannot tell you how many times I thought I had a really good solution and then gained insights from these forums that made me go back and refactor everything.

Such is the scientific method.

To directly address your comment, @jochenschultz: if ChatGPT is correct and the LLM is called ~81 times in the example, then that is certainly not a “low-level leetcode optimization,” and getting it done in a single inference would require a significant change in the architecture. It can be done, however, and if anyone is interested in seeing the Python code for how to do it, just ask and I’ll be happy to share.

Then I ask. Always curious to learn.

The approach we’ve shifted to significantly reduces the number of API calls by focusing on the model’s strengths: annotating the list with metadata and defining the sorting logic in a single request.

Problem Breakdown:

Rather than calling the model repeatedly for each comparison (as in a traditional sorting algorithm), we now approach the task by:

  1. Annotating the List: The LLM generates the necessary metadata for each item in the list (e.g., release year, rating, etc.).
  2. Defining Sorting Logic: The model outputs a sorting formula that can be dynamically applied using Python’s eval(). This logic can handle multiple criteria, as shown in the current example (sorting by rating first, then by year).

Efficiency Improvements:

In this new version, the model is called once. Here’s how it works:

  1. The LLM is asked to provide metadata and sorting logic for the given list.
  2. The Python code then uses the AI’s output to dynamically sort the list using sorted() and a key function generated from the LLM’s logic.

This approach is efficient because:

  • Single Call: The entire problem is passed to the model in one request, reducing API usage and latency.
  • Dynamic Sorting: Python handles the actual sorting based on the logic provided, allowing flexibility for sorting by any combination of metadata (rating, year, alphabetical order, etc.).

import json
import openai
import tooldantic as td
from typing import Literal

# Initialize OpenAI client (ensure your API key is set in the environment)
client = openai.OpenAI()

# Define the template for the prompt to the AI model
prompt_template = """\
<items>
{items}
</items>

<goal>
{goal}
</goal>
"""


# ItemWithMetadata class to store the item and its dynamically generated metadata (JSON format)
class ItemWithMetadata(td.OpenAiBaseModel):
    """An item with metadata."""

    item: str
    metadata_json: str = td.Field(
        description="The metadata as a JSON object string, including all fields needed for sorting."
    )


# SortList model that includes chain of thoughts, sorting criterion, metadata, and dynamic sorting formula
class SortList(td.OpenAiBaseModel):
    """Use this tool to sort a list of items based on the user's goal."""

    chain_of_thoughts: str
    sorting_criterion: str
    metadata_needed: str = td.Field(
        description="What metadata is required to complete this task?"
    )
    items_with_metadata: list[ItemWithMetadata] = td.Field(
        description="The same list of items from the user input, in the same order, but with the required metadata included."
    )
    sorting_logic: str = td.Field(
        description=td.normalize_prompt(
            """\
            Generate a Python expression that returns a value or tuple of values to be used to sort items based on their `metadata_json`. 
            The expression will be evaluated using `eval()`, so it must be valid Python with no control flow \
            (e.g., loops, conditionals) or variable assignments. Use only operations that can be evaluated directly in `eval()`.

            The variable names in your expression must match the keys in `metadata_json`. Your expression can combine or transform these values \
            to reflect multiple sorting criteria (e.g., primary sorting by one value, secondary sorting by another).

            For multiple criteria, return a tuple where the first element represents the primary sort criterion, the second element represents \
            the secondary sort criterion, and so on.

            Example:
            - For sorting by `year` (primary) and `rating` (secondary): `"(year, -rating)"`
            - For alphabetical sorting by `name` (primary) and `release_date` (secondary): `"(name.lower(), release_date)"`

            Your expression will be evaluated for each item, and sorting direction (ascending/descending) for each criterion will be handled separately.
            """
        )
    )
    sorting_order_reverse: bool


# Function to dynamically generate a sorting key based on the AI-provided logic
def generate_sorting_key(sorting_logic):
    """
    Generate a sorting key function that dynamically evaluates the AI-generated sorting logic.

    The sorting logic should be a Python expression where the fields in the metadata JSON are used.

    :param sorting_logic: The dynamic sorting logic provided by the AI.
    :return: A function to be used as a sorting key.
    """

    def sorting_key(item):
        try:
            # Parse the metadata JSON
            metadata = json.loads(item.metadata_json)

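            # Evaluate the sorting logic using the metadata fields.
            # Note: eval() executes model-generated code; only use it with output you trust.
            return eval(sorting_logic, {}, metadata)
        except (json.JSONDecodeError, KeyError, NameError, SyntaxError, TypeError, ValueError) as e:
            # NameError covers a sort expression that references a field missing from the metadata
            raise ValueError(f"Error processing metadata for item '{item.item}': {e}") from e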

    return sorting_key


# Main function to sort the list of items based on the user's goal
def sort_list(items: list, goal: str):
    """Sort a list of items based on the user's goal and dynamically generated metadata and logic."""
    items_str = json.dumps(items)
    prompt = prompt_template.format(items=items_str, goal=goal)

    # Send the request to the OpenAI model to generate metadata and sorting logic
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        tools=[SortList.model_json_schema()],
        tool_choice="required",
        parallel_tool_calls=False,
    )

    # Extract the AI-generated response
    message = response.choices[0].message
    tool_call = message.tool_calls[0]

    # Validate the response and extract the arguments
    args = SortList.model_validate_json(tool_call.function.arguments)

    # Generate the sorting key function using the dynamic logic provided by the AI
    sorting_key_func = generate_sorting_key(args.sorting_logic)

    # Sort the items using the dynamically generated sorting key
    sorted_items = sorted(args.items_with_metadata, key=sorting_key_func, reverse=args.sorting_order_reverse)

    return args, sorted_items


# Example usage: Rush albums and a complex sorting goal
rush_albums = [
    "Grace Under Pressure",
    "Hemispheres",
    "Permanent Waves",
    "Presto",
    "Clockwork Angels",
    "Roll the Bones",
    "Signals",
    "Rush",
    "Power Windows",
    "Fly by Night",
    "A Farewell to Kings",
    "2112",
    "Snakes & Arrows",
    "Test for Echo",
    "Caress of Steel",
    "Moving Pictures",
    "Counterparts",
    "Vapor Trails",
    "Hold Your Fire",
]

goal = "Sort the list of Rush studio albums by rating (integers 1-5) descending, then by year ascending."


args, sorted_rush_albums = sort_list(rush_albums, goal)

# Output the sorted result
for item in sorted_rush_albums:
    metadata = json.loads(item.metadata_json)
    print(f"{item.item} - Metadata: {metadata}")

# 2112 - Metadata: {'rating': 5, 'year': 1976}
# A Farewell to Kings - Metadata: {'rating': 5, 'year': 1977}
# Hemispheres - Metadata: {'rating': 5, 'year': 1978}
# Permanent Waves - Metadata: {'rating': 5, 'year': 1980}
# Moving Pictures - Metadata: {'rating': 5, 'year': 1981}
# Signals - Metadata: {'rating': 5, 'year': 1982}
# Fly by Night - Metadata: {'rating': 4, 'year': 1975}
# Grace Under Pressure - Metadata: {'rating': 4, 'year': 1984}
# Roll the Bones - Metadata: {'rating': 4, 'year': 1991}
# Counterparts - Metadata: {'rating': 4, 'year': 1993}
# Snakes & Arrows - Metadata: {'rating': 4, 'year': 2007}
# Clockwork Angels - Metadata: {'rating': 4, 'year': 2012}
# Rush - Metadata: {'rating': 3, 'year': 1974}
# Caress of Steel - Metadata: {'rating': 3, 'year': 1975}
# Power Windows - Metadata: {'rating': 3, 'year': 1985}
# Hold Your Fire - Metadata: {'rating': 3, 'year': 1987}
# Presto - Metadata: {'rating': 3, 'year': 1989}
# Test for Echo - Metadata: {'rating': 3, 'year': 1996}
# Vapor Trails - Metadata: {'rating': 3, 'year': 2002}
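The `generate_sorting_key` function above hands model output straight to `eval()`. One hedged way to narrow that risk is to parse the expression with Python's `ast` module and reject anything outside a small whitelist before evaluating it with empty builtins. The whitelist below (`ALLOWED_NODES`, `ALLOWED_METHODS`) is an assumption sized to the two examples in the `sorting_logic` field description; it is a sketch, not a complete sandbox.

```python
import ast

# Node types needed by expressions like "(year, -rating)" or "(name.lower(), release_date)".
ALLOWED_NODES = (
    ast.Expression, ast.Tuple, ast.Name, ast.Load, ast.Constant,
    ast.UnaryOp, ast.USub, ast.Attribute, ast.Call,
)
ALLOWED_METHODS = {"lower"}  # hypothetical: the only method calls we permit


def compile_sort_expression(expression: str):
    """Parse and vet a model-generated sort key before it reaches eval()."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"Disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Attribute) and node.attr not in ALLOWED_METHODS:
            raise ValueError(f"Disallowed attribute: {node.attr}")
        if isinstance(node, ast.Call) and not isinstance(node.func, ast.Attribute):
            raise ValueError("Only whitelisted method calls are allowed")
    return compile(tree, "<sort_logic>", "eval")


def make_sort_key(expression: str):
    code = compile_sort_expression(expression)
    # Empty builtins: the expression can only see the metadata fields.
    return lambda metadata: eval(code, {"__builtins__": {}}, metadata)


key = make_sort_key("(-rating, year)")
rows = [{"rating": 4, "year": 1984}, {"rating": 5, "year": 1981}, {"rating": 5, "year": 1976}]
print(sorted(rows, key=key))
# [{'rating': 5, 'year': 1976}, {'rating': 5, 'year': 1981}, {'rating': 4, 'year': 1984}]
```

An expression such as `__import__('os').system('ls')` raises `ValueError` instead of running.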

The assessment is correct. I may end up dropping the sortList function at some point, but for now I left it in for completeness. In practice you’d never want to use it in a production setting because it’s expensive. Using merge sort at least makes the expense predictable, since the number of model calls is bounded by O(n log n), but that’s still a lot of model calls. There’s also an inherent stability issue: the comparisons may not always come out the same, because this is still an LLM we’re talking about. The more comparison calls you make, the more likely it is that the model will make inconsistent decisions.
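To make the O(n log n) bound concrete, here is a generic merge sort where every comparison goes through a callable, standing in for one LLM request per comparison. This is a textbook merge sort, not AgentM's sortList code; the years are the release years from the metadata output above, in the same order as the album list.

```python
import math
from typing import Callable, List, TypeVar

T = TypeVar("T")


def merge_sort(items: List[T], compare: Callable[[T, T], int]) -> List[T]:
    """Merge sort where each comparison is delegated to `compare`."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], compare)
    right = merge_sort(items[mid:], compare)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if compare(left[i], right[j]) <= 0:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


calls = 0


def counting_compare(a: int, b: int) -> int:
    global calls
    calls += 1  # in an LLM-backed sort, each of these would be a model call
    return (a > b) - (a < b)


years = [1984, 1978, 1980, 1989, 2012, 1991, 1982, 1974, 1985, 1975,
         1977, 1976, 2007, 1996, 1975, 1981, 1993, 2002, 1987]
result = merge_sort(years, counting_compare)
n = len(years)
print(calls, n * math.ceil(math.log2(n)))  # comparisons made vs. the 95-call bound for 19 items
```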

The way you’d want to sort items in production is to first use mapList to project the data to a shape that includes a stable field that you can do an in-memory sort against. Basically flip the order of what I’ve shown and do your map phase first and then do the sort last.
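The map-first, sort-last pattern can be sketched as one annotation call per item followed by an ordinary in-memory sort. `map_then_sort` and the stub annotator are hypothetical; in practice the annotation step would be a mapList-style model call, and the sort costs no model calls at all.

```python
from typing import Callable, Dict, List


def map_then_sort(titles: List[str], annotate: Callable[[str], Dict],
                  sort_field: str) -> List[Dict]:
    """Annotate each item once (n model calls), then sort deterministically."""
    annotated = [annotate(title) for title in titles]  # the "map" phase
    return sorted(annotated, key=lambda row: row[sort_field])  # plain in-memory sort


# Stub annotator standing in for a per-item model call.
KNOWN_YEARS = {"Signals": 1982, "Presto": 1989, "Moving Pictures": 1981}


def fake_annotate(title: str) -> Dict:
    return {"title": title, "year": KNOWN_YEARS[title]}


rows = map_then_sort(["Presto", "Signals", "Moving Pictures"], fake_annotate, "year")
print([row["title"] for row in rows])  # ['Moving Pictures', 'Signals', 'Presto']
```

Because the sort key is a stored field, re-running the sort always gives the same order, which sidesteps the comparison-stability issue.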

So, to recap: sortList probably isn’t useful in practice. It’s more of a curiosity showing that you can use an LLM to sort items in a more human-like way, so I left it in for now.

Yep. That’s a better approach

Hmm, wouldn’t it be better to ask the model for a table with title and date, insert it into a database for later use, and then select with ORDER BY date?

That assumes a) the model can be shown all of the items at once and b) it has enough output token capacity to render the full table.

A couple of advantages to using mapList to “annotate” each list item individually:

  • You can annotate millions of rows and always use a relatively consistent number of tokens per call. You’ll use more input tokens than you would in the convert-everything-in-a-single-shot case, but the output tokens should be roughly the same.
  • You can annotate items in parallel because each one is an independent (atomic) operation.
  • You should end up with better annotations. The model can focus on annotating just a single item, so it’s less likely to mix up facts or forget what it was working on, both of which we know it can do with longer inputs.
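The input-token trade-off in the first bullet is simple arithmetic: per-item calls repeat the instructions on every call, while a single shot pays for them once. The numbers below (200-token instructions, 50-token rows, 1,000 rows) are illustrative assumptions, not measurements, and output tokens are left out because they scale with the rows in both cases.

```python
def input_tokens(n_rows: int, overhead: int, tokens_per_row: int, per_item: bool) -> int:
    """Total prompt tokens for annotating n_rows, either per item or in one shot."""
    if per_item:
        # Every call repeats the instructions (overhead) plus one row.
        return n_rows * (overhead + tokens_per_row)
    # One call: instructions once, all rows concatenated.
    return overhead + n_rows * tokens_per_row


# Illustrative numbers only.
per_item = input_tokens(1000, 200, 50, per_item=True)
one_shot = input_tokens(1000, 200, 50, per_item=False)
print(per_item, one_shot, per_item - one_shot)  # 250000 50200 199800
```

Each per-item prompt stays at 250 tokens no matter how many rows there are, while the one-shot prompt grows with the table.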

There are always trade-offs, so you should weigh the approach you take. If you have a small table with a few rows, then yes, it would be better to just annotate all of the rows at once. But if you have a lot of rows, or the rows are really big (a full research paper), then you’re probably better off doing it one row at a time.
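One way to frame this trade-off is as a batch-size knob: a batch size of 1 is the one-row-at-a-time extreme and a batch size equal to the table length is the single-shot extreme. The `batched` helper below is a hypothetical sketch; Python 3.12 also ships `itertools.batched`, which yields tuples instead of lists.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: List[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


rows = [f"row-{i}" for i in range(7)]
print([len(batch) for batch in batched(rows, 3)])  # [3, 3, 1]
```

Each batch would then become one model call, so the batch size trades per-call prompt overhead against how much the model has to juggle at once.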

Give 500 parallel model calls, e.g., 20 items each, and ask the model to complete the metadata.
Then insert the results into a database…
That would be approximately 1.5 million items per hour.

The database can sort a couple of million items pretty fast…

and re-sorting by a different field would be super fast after that.
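The numbers above imply a per-round time, assuming each round of 500 parallel calls finishes before the next one starts: 500 × 20 = 10,000 items per round, and 1.5 million items per hour is 150 rounds, or 24 seconds per round. The database half of the idea is sketched with Python's built-in `sqlite3` and an in-memory table; the table name and columns are illustrative assumptions.

```python
import sqlite3

# Throughput arithmetic from the post.
parallel_calls = 500
items_per_call = 20
items_per_round = parallel_calls * items_per_call  # 10,000 items
target_per_hour = 1_500_000
seconds_per_round = 3600 / (target_per_hour / items_per_round)
print(items_per_round, seconds_per_round)  # 10000 24.0

# Once annotated rows are stored, re-sorting is a query, not more model calls.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE albums (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO albums VALUES (?, ?)",
    [("Presto", 1989), ("Moving Pictures", 1981),
     ("Signals", 1982), ("Grace Under Pressure", 1984)],
)
by_year = [row[0] for row in conn.execute("SELECT title FROM albums ORDER BY year")]
by_title = [row[0] for row in conn.execute("SELECT title FROM albums ORDER BY title")]
conn.close()
print(by_year)   # ['Moving Pictures', 'Signals', 'Grace Under Pressure', 'Presto']
print(by_title)  # ['Grace Under Pressure', 'Moving Pictures', 'Presto', 'Signals']
```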