AgentM: A library of "Micro Agents" that make it easy to add reliable intelligence to any application

Very true… but you still need to acquire the structured columns to sort on, and I'm simply proposing you'd use mapList for that. If you have a bunch of research papers you're processing, or even just a stack of customer support tickets, you can use mapList to do your extraction.
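Roughly something like this (just a sketch - the exact mapList options may differ a bit from what I show here, the outputShape name is a guess, and completePrompt is set up the same way as in the arxiv example further down):

// Rough sketch only: extract structured columns with mapList, then sort with plain code.
interface PaperFacts {
    title: string;
    releaseYear: number;
}

async function sortPapersByYear(papers: string[]): Promise<PaperFacts[]> {
    const goal = `For each paper, extract its title and its 4 digit release year.`;
    const mapped = await mapList({ goal, list: papers, outputShape: { title: '<title>', releaseYear: '<year>' }, completePrompt });
    if (!mapped.completed) {
        throw new Error(`extraction failed: ${mapped.error}`);
    }

    // Once the structured columns exist, sorting is just ordinary code.
    return (mapped.value! as PaperFacts[]).sort((a, b) => a.releaseYear - b.releaseYear);
}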


The problem is the data model. But that’s solvable.

I'd say the model can produce a data model / structure / entity, which can lead to a nice JSON document, which in turn leads to a schema for evaluation. Maybe the example was just not broad enough.

When you have bacteria data with hundreds of thousands of attributes, this might be difficult.

I see. You made a good point about the size of the data, which I had not considered due to my tunnel vision from the example. I can imagine a big data use case where precision trumps cost. Anyway, thanks again for sharing!

So abstracting this a little bit: essentially, the goal of “sort this dataset by release year” breaks down into the following subgoals:

if dataset is not annotated: # does not have types for its columns
       use llm to annotate (and store the annotation for reuse)
if there is no sort function associated with the specific column:
       use llm to generate a sort function (and store the function for potential reuse)

Use traditional means to sort.
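A rough sketch of that flow (annotateColumns and generateSortFunction here are hypothetical LLM-backed helpers, just to show the shape of it):

// Hypothetical sketch of the subgoals above.
type Row = Record<string, unknown>;
interface Dataset { rows: Row[]; annotations?: Record<string, string>; }

// Hypothetical LLM-backed helpers that don't exist yet.
declare function annotateColumns(dataset: Dataset): Promise<Record<string, string>>;
declare function generateSortFunction(column: string, columnType: string): Promise<(a: Row, b: Row) => number>;

const sortFunctionCache = new Map<string, (a: Row, b: Row) => number>();

async function sortBy(dataset: Dataset, column: string): Promise<Row[]> {
    if (!dataset.annotations) {
        // Use the LLM once to infer column types, then store the annotation for reuse.
        dataset.annotations = await annotateColumns(dataset);
    }

    let compare = sortFunctionCache.get(column);
    if (!compare) {
        // Use the LLM once to generate a comparator for this column, then cache it.
        compare = await generateSortFunction(column, dataset.annotations[column]);
        sortFunctionCache.set(column, compare);
    }

    // Use traditional means to sort.
    return [...dataset.rows].sort(compare);
}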

So now we can plug different data catalogs into the mix, so that as new data sets are discovered, they can also be meaningfully governed.


At the end of the day, your response certainly had some impact and will be considered.
Maybe a parameter “cheapmode=true” would be good.

I've been wondering about how you'd want to implement something like zip, but you'd quickly get into the weeds:

  1. do you want it fast? (i.e. parallel, with instruction duplication)
  2. do you want it token efficient? (i.e. sequential, one shot)
  3. do you want to use completion models only? (i.e. no embeddings)
  4. how robust do you want it to be?
    • against adverse input (i.e. self prompting for anchor/reference optimization)
    • against adversarial input

Can there actually be a “good” first principles implementation, or do you have to benchmark and red-team everything?

Perhaps a cheapmode/velocity/cost/speed setting might make sense for the standard options parameter in the template.

There's the parallel executions parameter, but I don't know if that's all that useful on a per-agent basis - I think you'd want a scheduler to handle request buffering, packaging, and dispatch. :thinking:
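To make that concrete, here's a minimal sketch of the kind of scheduler I mean (purely hypothetical, not part of AgentM): buffer requests from any number of agents and dispatch them under one shared concurrency cap.

// Hypothetical sketch: a shared scheduler that buffers prompt requests and
// dispatches them with a single concurrency limit across all micro agents.
type Job<T> = () => Promise<T>;

class Scheduler {
    private readonly queue: Array<() => void> = [];
    private active = 0;

    constructor(private readonly maxConcurrent: number) {}

    schedule<T>(job: Job<T>): Promise<T> {
        return new Promise<T>((resolve, reject) => {
            this.queue.push(() => {
                this.active++;
                job().then(resolve, reject).finally(() => {
                    this.active--;
                    this.dispatch();
                });
            });
            this.dispatch();
        });
    }

    private dispatch(): void {
        while (this.active < this.maxConcurrent && this.queue.length > 0) {
            this.queue.shift()!();
        }
    }
}

// Usage idea: route every model call through one scheduler instead of tuning
// parallelism per agent, e.g. scheduler.schedule(() => completePrompt(prompt)).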


Edit:

all that said, I’m not trying to dissuade anyone from just slapping something down to get it working from a functional perspective. I think that’s also super valuable.


I sometimes joke with clients that you can choose 2 of the 3…

  1. Get it fast
  2. Get it cheap
  3. Get it done well

You're gonna have to settle for one in this case, I'm thinking :laughing:


If we just look at sort algorithms for a moment, they each have their tradeoffs; some do really well for large sets, some do well for smaller sets, and others do well if the set is mostly sorted.

Most sort algorithms that we actually use, like Timsort, are actually hybrids of other algorithms, and they're designed to make tradeoffs based on the set size. They'll use one algorithm for small sets and a different algorithm for large sets.

I think the same principles apply here. I'm initially focused on building out a raw set of primitives that require you to select which primitive or strategy you want to use based on your specific scenario. I think over time we'll see higher-level abstractions layered on top of those primitives that make those choices for you.
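As a purely hypothetical example, a higher-level wrapper could pick the strategy for you based on list size, the same way Timsort switches algorithms (the threshold and parallelism numbers here are made up; filterList and completePrompt are the same pieces used in the example below):

// Hypothetical sketch: a higher-level wrapper that chooses a strategy for you.
async function smartFilter(goal: string, list: any[]): Promise<any[]> {
    // Small sets: a single cheap sequential pass. Large sets: pay for parallel completions.
    const parallelCompletions = list.length <= 50 ? 1 : 3;
    const result = await filterList({ goal, list, parallelCompletions, completePrompt });
    if (!result.completed) {
        throw new Error(`filter failed: ${result.error}`);
    }
    return result.value!;
}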


The Java strategy, I see :laughing:


Maybe a handful are enough to create abstraction layers by themselves…

Maybe it would be better to spend more time on research toward an evaluation with a kill switch…

I'm getting enough of the pieces of AgentM in place that I'm able to get it to do useful things. I wrote a small program (ok, AgentM wrote part of it) that fetches the last day's worth of research papers from arxiv.org, filters them to the papers related to topics I care about, and then projects those filtered papers to a uniform markdown format for easy scanning:

It uses gpt-4o-mini so it's cost-effective to run, and it took 6 or 7 minutes in total to process 553 papers. Here's the meat of the code:

// Initialize OpenAI 
const apiKey = process.env.OPENAI_API_KEY!;
const model = 'gpt-4o-mini';
const completePrompt = openai({ apiKey, model });


// Define the projections template
const template = 
`# <title> (<pubDate in mm/dd/yyyy format>)
<abstract>
[Read more](<link>)`;

async function main() {
    // Fetch latest papers
    console.log(`\x1b[35;1mFetching latest papers from arxiv.org...\x1b[0m`);
    const data = await fetchUrl(`https://rss.arxiv.org/rss/cs.cl+cs.ai`);
    const feed = parseFeed(data);

    // Identify topics of interest
    const topics = process.argv[2] ?? `new prompting techniques or advances with agents`;
    console.log(`\x1b[35;1mFiltering papers by topics: ${topics}...\x1b[0m`);
    
    // Filter papers by topic
    const parallelCompletions = 3;
    const filterGoal = `Filter the list to only include papers related to ${topics}.`;
    const filtered = await filterList({goal: filterGoal, list: feed, parallelCompletions, completePrompt });
    if (!filtered.completed) {
        console.error(filtered.error);
        return;
    }

    // Generate projections
    console.log(`\x1b[35;1mGenerating projections for ${filtered.value!.length} of ${feed.length} papers...\x1b[0m`);
    const goal = `Map the news item to the template.`;
    const projections = await projectList({goal, list: filtered.value!, template, parallelCompletions, completePrompt });
    if (!projections.completed) {
        console.error(projections.error);
        return;
    }

    // Render papers
    projections.value!.forEach((entry) => console.log(`\x1b[32m${entry.projection}\x1b[0m\n\n${'='.repeat(80)}\n`));    
}

main();

I did another pass over the 81 papers it selected as being on topic and had the model select the top 10 papers for the day using another projection:

Automating Knowledge Discovery from Scientific Literature via LLMs: A Dual-Agent Approach with Progressive Ontology Prompting

Read more
This paper introduces a novel framework combining large language models (LLMs) with a dual-agent system to enhance knowledge extraction from scientific literature, achieving significant improvements in annotation accuracy.

why
The integration of LLMs with a dual-agent system for knowledge extraction is a significant advancement, potentially transforming how scientific literature is analyzed and utilized.


Urban Mobility Assessment Using LLMs

Read more
This work proposes an AI-based approach for synthesizing travel surveys using LLMs, addressing privacy concerns and demonstrating effectiveness across various U.S. metropolitan areas.

why
The application of LLMs in urban mobility assessment offers a novel solution to privacy issues in travel surveys, with implications for urban planning and policy-making.


Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering

Read more
The paper presents a multi-agent framework for interpreting process diagrams, enhancing data privacy and explainability while achieving superior performance in open-domain question answering tasks.

why
This research enhances the understanding of complex engineering schematics, which is crucial for industries relying on process engineering, improving both privacy and explainability.


Classification of Safety Events at Nuclear Sites using Large Language Models

Read more
This research develops an LLM-based classifier to categorize safety records at nuclear power stations, aiming to improve the efficiency and accuracy of safety classification processes.

why
Improving safety classification at nuclear sites is critical for operational safety and regulatory compliance, making this application of LLMs highly impactful.


Non-instructional Fine-tuning: Enabling Instruction-Following Capabilities in Pre-trained Language Models without Instruction-Following Data

Read more
The paper explores a novel approach to fine-tuning LLMs for instruction-following capabilities using non-instructional data, potentially broadening the scope of LLM applications.

why
This approach could significantly expand the versatility of LLMs, allowing them to perform tasks without explicit instruction-following data, which is a major step forward in AI development.


Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis

Read more
This study benchmarks vision-language models’ zero-shot visual reasoning capabilities, revealing insights into their performance and limitations in complex reasoning tasks.

why
Understanding the zero-shot capabilities of vision-language models is crucial for their application in areas requiring complex visual reasoning, such as autonomous vehicles and robotics.


Toward Large Language Models as a Therapeutic Tool: Comparing Prompting Techniques to Improve GPT-Delivered Problem-Solving Therapy

Read more
The research assesses the impact of prompt engineering on LLMs delivering psychotherapy, highlighting the potential of AI in addressing mental health needs.

why
The potential use of LLMs in psychotherapy could revolutionize mental health care, making therapy more accessible and personalized.


HoneyComb: A Flexible LLM-Based Agent System for Materials Science

Read more
This paper introduces HoneyComb, an LLM-based agent system tailored for materials science, significantly improving task performance and accuracy.

why
The application of LLMs in materials science could accelerate research and development in this field, leading to faster innovation and discovery.


Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback

Read more
The study proposes a novel reward modeling method that enhances reinforcement learning from human feedback (RLHF) by utilizing language feedback, improving alignment with human preferences.

why
Enhancing RLHF with language feedback could improve the alignment of AI systems with human values and preferences, which is essential for ethical AI development.


The creative psychometric item generator: a framework for item generation and validation using large language models

Read more
This research develops a framework for generating valid creativity assessments using LLMs, demonstrating their potential in automating creativity testing processes.

why
Automating creativity assessments with LLMs could transform educational and psychological testing, making it more efficient and accessible.


This is pretty awesome! I am tempted to write the underlying library in Python.


I set up a placeholder GitHub project and would happily add you as a contributor.

I was just trying to get to the point where I could write an AgentM program to go through all of the files and do the initial port. I'm just about there :slight_smile:


That would be awesome!

My git user is icdev2dev. Thanks!


I'm working on the program to convert AgentM to Python, and it turns out you can use a micro agent and structured outputs to parse all of your CLI parameters :slight_smile: Nice!

You don’t need to predefine any switches and you can tell the agent what to assign the defaults to if they’re not specified. :slight_smile:


Parsing CLI parameters the AgentM way :slight_smile:

// Define a schema for parsing command line arguments
interface Args {
    sourceFolder: string;
    outputFolder: string;
    error?: string;
}

const argsSchema: ArgumentSchema = {
    sourceFolder: {
        type: 'string',
        description: 'source folder containing the typescript files to convert.',
        required: true
    },
    outputFolder: {
        type: 'string',
        description: 'output folder for where to save the converted python files. should default to [sourceFolder]-py',
        required: true
    },
    error: {
        type: 'string',
        description: 'error message to display when arguments are missing.',
        required: false
    }
};

// Main function to start the conversion process
async function main() {
    // Parse cli arguments
    const parserGoal = `Parse the parameters needed to convert a TypeScript project to Python.`;
    const args = await argumentParser<Args>({ goal: parserGoal, schema: argsSchema, argv: process.argv.slice(2), completePrompt });
    if (!args.completed || args.value?.error) {
        console.error(args.error ?? args.value?.error);
        return;
    }

    console.log(args.value);
}

main();

You can just tell the LLM how it should handle missing parameters :slight_smile:

This is what I was thinking about:


agentm = AgentM().provider('OpenAi').model("gpt-4o-mini").apiKey(os.environ.get("API_KEY"))

with agentm as m: 
    m = m.data(data)\
        .goal("Filter the list to only include papers related to multiple agents")\
        .with_goal_args(template).goal("Map the news items to the template")
    
    result = m.result()