Alpha Wave Agents: better autonomous task completion

I just published the first draft of my new agent framework that’s built on top of Alpha Wave. This new framework enables the creation of semi-autonomous agents capable of completing complex multi-step tasks. Unlike other agent experiments we’ve seen so far, like AutoGPT and BabyAGI, AlphaWave agents should actually complete the task they’re presented with more often than not. I could get into the technical details of why it’s superior to other agent frameworks, but let me just show you some code.

This is the entire TypeScript code for a simple agent that’s an expert at math:

import { Agent, AskCommand, FinalAnswerCommand, MathCommand } from "alphawave-agents";
import { OpenAIClient } from "alphawave";
import { config } from "dotenv";
import * as path from "path";
import * as readline from "readline";

// Read in .env file.
const ENV_FILE = path.join(__dirname, '..', '.env');
config({ path: ENV_FILE });

// Create an OpenAI or AzureOpenAI client
const client = new OpenAIClient({
    apiKey: process.env.OpenAIKey!
});

// Create an agent
const agent = new Agent({
    client,
    prompt: `You are an expert in math. Use the math command to assist users with their math problems.`,
    prompt_options: {
        completion_type: 'chat',
        model: 'gpt-3.5-turbo',
        temperature: 0.2,
        max_input_tokens: 2000,
        max_tokens: 1000,
    },
    initial_thought: {
        "thoughts": {
            "thought": "I need to ask the user for the problem they'd like me to solve",
            "reasoning": "This is the first step of the task and it will allow me to get the input for the math command",
            "plan": "- ask the user for the problem\n- use the math command to compute the answer\n- use the finalAnswer command to present the answer"
        },
        "command": {
            "name": "ask",
            "input": { "question":"Hi! I'm an expert in math. What problem would you like me to solve?" }
        }
    },
    logRepairs: true,
});

// Add commands to the agent
agent.addCommand(new AskCommand());
agent.addCommand(new FinalAnswerCommand());
agent.addCommand(new MathCommand());

// Listen for new thoughts
agent.events.on('newThought', (thought) => {
    console.log(`\x1b[2m[${thought.thoughts.thought}]\x1b[0m`);
});

// Create a readline interface object with the standard input and output streams
const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
});

// Define main chat loop
async function chat(botMessage: string|undefined) {
    // Show the bot's message
    if (botMessage) {
        console.log(`\x1b[32m${botMessage}\x1b[0m`);
    }

    // Prompt the user for input
    rl.question('User: ', async (input: string) => {
        // Check if the user wants to exit the chat
        if (input.toLowerCase() === 'exit') {
            // Close the readline interface and exit the process
            rl.close();
            process.exit();
        } else {
            // Route the user's message to the agent
            const result = await agent.completeTask(input);
            switch (result.status) {
                case 'success':
                case 'input_needed':
                    await chat(result.message);
                    break;
                default:
                    if (result.message) {
                        console.log(`${result.status}: ${result.message}`);
                    } else {
                        console.log(`A result status of '${result.status}' was returned.`);
                    }

                    // Close the readline interface and exit the process
                    rl.close();
                    process.exit();
                    break;
            }
        }
    });
}

// Start chat session
chat(`Hi! I'm an expert in math. What problem would you like me to solve?`);

Half of the code is just managing the command line interface. The part we care about is the construction of the Agent instance. To define an Agent you give it a client, a very simple prompt, some options for which model to use, and, for gpt-3.5-turbo, a super important initial thought. (I use gpt-4 to generate this initial thought as it’s way better at creating a plan that gpt-3.5 will follow.)

You then need to add the commands to the agent that you want it to use… You should always include a FinalAnswerCommand, otherwise the agent doesn’t know how to finish the task, and I recommend adding an AskCommand, as the model will use that if it needs more input from the user to complete its task, or even just when it gets confused. From there you can add any other commands you’d like. This sample uses the MathCommand, which lets the model pass in small snippets of JavaScript for evaluation.

Let’s see a sample run of this code:

I have 4 basic commands implemented so far: AskCommand, FinalAnswerCommand, MathCommand, and PromptCommand, with more on the way.
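
If you want to write your own command, the rough shape is sketched below. I’m simplifying the command interface here, so check the repo for the actual base class and method names:

// Purely illustrative sketch of a custom command. The real alphawave-agents
// base class and method names may differ; the idea is simply: describe the
// command to the model, then implement what it actually does.
interface CommandSketch {
    title: string;                        // the name the model uses to invoke the command
    description: string;                  // tells the model when the command is useful
    execute(input: any): Promise<string>; // runs the command and returns text for the model
}

class CoinFlipCommand implements CommandSketch {
    title = 'coinFlip';
    description = 'flips a coin and returns heads or tails';
    async execute(input: any): Promise<string> {
        return Math.random() < 0.5 ? 'heads' : 'tails';
    }
}

// Registered the same way as the built-in commands:
// agent.addCommand(new CoinFlipCommand());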

I’m all about composition though, so Agents themselves are commands. That means you can define child Agents that complete a task a parent Agent can initiate. What’s cool is that the agents actually interface with each other via natural language. They just talk to each other. I’ll try to cook up a sample that shows this off…
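
In the meantime, the shape of it is roughly the sketch below. The scheduler child agent here is made up purely for illustration, and in practice the child would also need a name and description so the parent knows when to call it (I’m glossing over that detail here):

// Sketch: a child agent registered on a parent agent as a command.
// The constructor options mirror the math agent above; the prompt and
// purpose of this child are made up for illustration.
const schedulerAgent = new Agent({
    client,
    prompt: `You are an expert at scheduling meetings. Find a time that works for everyone.`,
    prompt_options: {
        completion_type: 'chat',
        model: 'gpt-3.5-turbo',
        temperature: 0.2,
        max_input_tokens: 2000,
        max_tokens: 1000,
    },
});
schedulerAgent.addCommand(new AskCommand());
schedulerAgent.addCommand(new FinalAnswerCommand());

// Because agents are themselves commands, the parent just adds the child
// and then delegates to it in natural language.
agent.addCommand(schedulerAgent);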

Speaking of samples, you can find the math agent sample here, and I have several other samples coming: a flight booking agent, a Macbeth agent that uses 16 agents to act out any scene from the play Macbeth, a programmer agent that can write and test simple JavaScript functions, and an Agent Dojo sample, my spin on BabyAGI, which is an agent that can train other agents to become experts on any given subject (machine learning, music theory, etc.). Look for all of those to start dropping over the next week or so.

9 Likes

And for those of you out there outside TypeScript land, it should be in Python in a few days, I hope

6 Likes

You rock @bruce.dambrosio :slight_smile:

1 Like

And for those outside of both TypeScript and Python land, it should be in .NET and maybe even Rust within the next few weeks :slight_smile:

1 Like

I just checked in a new Macbeth Agent sample. Macbeth is an agent that’s capable of performing any scene from the play Macbeth. The top level agent plays the role of narrator and there are 15 separate PromptCommands that play the characters. The narrator sets the scene and facilitates the dialog between the characters by deciding which character should speak next. The narrator is allowed to pass in a little scene direction but it’s not allowed to feed the characters lines. The character prompts see a shared history of the dialog for the current scene, but they’re responsible for coming up with their next line of dialog on their own.
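
Roughly, the wiring looks like the sketch below. The PromptCommand options shown are simplified and approximate rather than the exact signature:

// Simplified sketch of the Macbeth narrator wiring: one PromptCommand per
// main character plus a single shared "extras" command. The PromptCommand
// options here are approximate and for illustration only.
const narrator = new Agent({
    client,
    prompt: `You are the narrator of Macbeth. Set the scene, then decide which character speaks next. You may give brief scene direction but never feed the characters their lines.`,
    prompt_options: {
        completion_type: 'chat',
        model: 'gpt-3.5-turbo',
        temperature: 0.7,
        max_input_tokens: 2500,
        max_tokens: 800,
    },
});

const mainCharacters = ['macbeth', 'ladyMacbeth', 'banquo', 'duncan' /* ...the other main characters... */];
for (const name of mainCharacters) {
    narrator.addCommand(new PromptCommand({
        name,
        prompt: `You are playing ${name} in Macbeth. Using the shared dialog history for the scene, respond with your next line of dialog only.`,
    }));
}

// A single "extras" command covers all of the minor characters.
narrator.addCommand(new PromptCommand({
    name: 'extras',
    prompt: `You play all of the minor characters in Macbeth. Respond with the named minor character's next line of dialog only.`,
}));
narrator.addCommand(new FinalAnswerCommand());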

I originally created this sample for GPT-4, but with a fair amount of work I have GPT-3.5 doing a reasonable job at the task. It’s a tough ask for GPT-3.5 as there’s a lot of coordination going on and it’s only ever allowed to see part of the task. Here’s a sample scene with GPT-3.5 that went fairly well:

Green responses are the narrator speaking, and each line of dialog is essentially 2 model calls: the narrator first has to decide who should speak next and then trigger a call to that character’s prompt. I’ve gotten GPT-4 to play through every scene of Act 1, something like 200 - 300 model calls, and it seems to follow the story reasonably well. GPT-3.5 is likely going to generate a different story, but with familiar characters, every time.

What I like about this sample is that there are some fairly reliable hallucinations the agent has to deal with, and GPT-3.5 ends up hallucinating a lot. So far, Alpha Wave has been able to walk it back from all of those hallucinations. Here’s a very common one:

The narrator has 14 separate commands, one for each of the main characters, but it has a single “extras” command that it has to use for all minor characters. Even GPT-4 invariably hallucinates that there should be a hidden command it can use. Simply telling the model that the command doesn’t exist and that it should try something else is enough to walk all the models back from hallucinations like this. On the second attempt it correctly chooses to use the “extras” command instead…

And here’s some bonus content to show just how off the rails GPT-3.5 can go:

It’s asking me if they can get a room together… I took away its “ask” command because of this, but in fairness the prompt I was passing in at this point was completely broken. It was missing the core prompt guidance, so I won’t judge it too harshly.

2 Likes

And now, if you implement an agent that dynamically chooses an agent based on the user’s goal, you have created an autonomous multi-agent system.

This is what I have done with my multi-agent system. I want to include the SOAR framework, which is much better at solving goals with sub-tasks than GPT, because it is a rule-based system with short-term, long-term, and episodic memory, and with reinforcement learning already included.

My initial thought was to handle everything with GPT, but that was a false assumption, because there are many other very intelligent systems out there that perform the tasks they were written for.

2 Likes

That’s the direction I’m heading in, but the solution isn’t quite that simple… My goal is to create a top-level Agent that can select the agent to solve a particular task from a pool of thousands or even hundreds of thousands of agents. There are some intermediate steps needed to achieve that.

The first step is to get a few agents working rock solid, and AlphaWave is on a path towards that.

1 Like

I solved it with a Delegate-Agent as a middleman; all agents are categorized by their tasks. For example, agents for local tasks like programming, file handling, and folder management are grouped into a category, and categories have sub-categories: for example, all agents that fall under programming are chunked into agents for writing, testing, fixing code, etc.

The Delegate-Agent analyzes the current task and decides which category needs to be used.

There are also agents for decomposing tasks, refining the overall plan, etc.
This makes the system very solid, because separation of concerns is a powerful principle here.
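
Expressed with the AlphaWave agents from earlier in the thread, a minimal sketch of my delegate pattern could look something like this (the category agent, its prompts, and the sub-agent names are just placeholders):

// Sketch of a Delegate-Agent: a top-level agent whose only commands are
// category agents, each of which wraps its own more specialized agents.
// Every name and prompt here is a placeholder for illustration.
const programmingAgent = new Agent({
    client,
    prompt: `You are the programming category agent. Route the task to the right sub-agent: writing, testing, or fixing code.`,
    prompt_options: {
        completion_type: 'chat',
        model: 'gpt-3.5-turbo',
        temperature: 0.2,
        max_input_tokens: 2000,
        max_tokens: 800,
    },
});
// programmingAgent.addCommand(writeCodeAgent), programmingAgent.addCommand(testCodeAgent),
// and so on would be registered here.

const delegateAgent = new Agent({
    client,
    prompt: `You are a delegate. Analyze the user's task, decide which category it belongs to, and hand it to that category agent.`,
    prompt_options: {
        completion_type: 'chat',
        model: 'gpt-3.5-turbo',
        temperature: 0.2,
        max_input_tokens: 2000,
        max_tokens: 800,
    },
});
delegateAgent.addCommand(programmingAgent);   // one command per category
delegateAgent.addCommand(new FinalAnswerCommand());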

But like I mentioned above, some tasks can be delegated to the SOAR architecture, which is specialized in this kind of planning and execution. An excerpt from GPT-4:

Here are a few ideas on how you might integrate SOAR and GPT to create a more effective problem-solving architecture:

  1. Rule-Based Decision Making: GPT could be used to generate a set of potential operators or actions based on the current state. These could then be evaluated using SOAR’s rule-based system to select the most appropriate action. This could allow for more creative and diverse problem-solving strategies.

  2. Natural Language Processing: GPT’s strength lies in its ability to understand and generate human-like text. This could be used to enhance SOAR’s ability to interact with human users. For example, users could describe a problem in natural language, and GPT could translate this into a state that SOAR could understand and act upon.

  3. Learning and Adaptation: GPT could be used to generate new rules for SOAR based on past experiences. This could allow the system to learn and adapt over time, improving its problem-solving capabilities.

  4. Problem Decomposition: GPT could be used to break down complex problems into smaller, more manageable subproblems. SOAR could then solve these subproblems individually, making it easier to tackle complex tasks.

  5. Explanation and Justification: After SOAR has made a decision or taken an action, GPT could be used to generate a human-readable explanation or justification. This could make the system more transparent and understandable to users.

  6. Simulation and Prediction: GPT could be used to simulate potential future states based on the current state and a proposed action. SOAR could then use these simulations to make more informed decisions.

2 Likes

It’s review time for us at Microsoft, so I’m working on a sample agent that can help you write a performance review and thought I’d share a quick screenshot. This is from about 2 turns in for the “Core Priorities” section of my review. This is gpt-3.5-turbo, and I’ve turned off all the feedback reporting, but there are a couple of points in the task where the model tends to hallucinate:

Here’s the entire code for that agent:

As you can see, the Agent made it all the way through the task and finished with a call to completeTask.

This will be structured as a hierarchy of agents that break the task down into smaller and smaller subtasks:

Now to verify that agents can call other agents :slight_smile:

1 Like

Alphawave 0.1.0 and Promptrix 0.1.0 are now available on PyPI!

pip3 install alphawave

should install both, or if you only want promptrix: pip3 install promptrix

Alphawave 0.1.0 only includes core alphawave, no Agents. Look for that shortly.

Also, only the OpenAI client is supported. The Azure client is there, but it’s a raw GPT-4 translation with known errors. If you need it and can’t fix it yourself, let me know.

An OS LLM client with support for the FastChat server API and also a simple JSON/txt socket, as well as user/assistant template support using the FastChat conversation.py class, is coming very soon.

3 Likes

Good work rate! I’ll install the Python version and test it.

1 Like

I have it installed, but how do I use it?

Any README?

@bruce.dambrosio do you want to maybe try to port the base AlphaWave sample from JS to Python?

@jethro.adeniran the readme on the JS side of things may help get you started. I know Bruce was going to try and work up a Jupyter notebook, but this is still a bit of a work in progress:

Yeah, sorry. It is rather inscrutable. That’s high on the list.
@stevenic - I’ll take a look tonight. I really want to get that OSClient out, I think alphawave is really needed there.

2 Likes

@stevenic - which of these will run without the agent code?

This sample requires no agent code… It’s basically a simple ChatGPT-style example. I still owe you an example of base AlphaWave with a validator that walks the model back from hallucinations, but I can tell you that it works. It’s best seen in the agent demos, because they make mistakes all the time, but the core isolation & feedback mechanism AlphaWave employs works.

I was demoing an agent to someone today, and GPT-3.5 first failed to return the JSON thought structure the agent system needs, so AlphaWave gave it feedback. Normally it returns the correct structure on the next call, but this time the model returned a partial JSON object. AlphaWave gave it feedback as to the fields it was missing, and on the next call it returned a complete and valid object. And that was GPT-3.5, which is the worst of these models.

This stuff sounds like snake oil, but I promise it works to improve the overall reliability with which these models return valid structured results.
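
Roughly, base AlphaWave with a validator attached looks like the sketch below; the JSONResponseValidator and repair option names shown are approximate, so check the repo for the exact code:

// Rough sketch: base AlphaWave with a validator attached. When the model
// returns something that doesn't validate, the validator's feedback is fed
// back to the model and the prompt is retried, which is the isolation &
// feedback loop described above. Treat JSONResponseValidator and the
// validator/max_repair_attempts option names as approximate.
import { AlphaWave, OpenAIClient, JSONResponseValidator } from "alphawave";
import { Prompt, SystemMessage, UserMessage } from "promptrix";

const client = new OpenAIClient({ apiKey: process.env.OpenAIKey! });

const wave = new AlphaWave({
    client,
    prompt: new Prompt([
        new SystemMessage(`Answer the user's question as JSON in the form {"answer":"<text>"}.`, 100),
        new UserMessage('{{$input}}', 450)
    ]),
    prompt_options: {
        completion_type: 'chat',
        model: 'gpt-3.5-turbo',
        temperature: 0.2,
        max_input_tokens: 2000,
        max_tokens: 500,
    },
    validator: new JSONResponseValidator(),  // rejects responses that aren't valid JSON
    max_repair_attempts: 3                   // feedback/retry rounds before giving up
});

async function run() {
    // completePrompt() only returns once the response validates or the
    // repair attempts are exhausted.
    const result = await wave.completePrompt('What is 2 + 2?');
    console.log(result.status, result.message);
}
run();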

Ah. I already have that one running, and it’s in the tests dir on GitHub. Will post now

Well, deep apologies; I’m a reasonably experienced developer, but inexperienced in packaging and PyPI.

Do this to get alphawave 0.1.1:

pip3 install --upgrade alphawave

Now save the code below to alphachat.py, then run it:

import os
from pathlib import Path
import readline
from alphawave.AlphaWave import AlphaWave
from alphawave.OpenAIClient import OpenAIClient
import promptrix
from promptrix.Prompt import Prompt
from promptrix.SystemMessage import SystemMessage
from promptrix.ConversationHistory import ConversationHistory
from promptrix.UserMessage import UserMessage
import asyncio

# Read in .env file.
env_path = Path('..') / '.env'
#load_dotenv(dotenv_path=env_path)

# Create an OpenAI or AzureOpenAI client
client = OpenAIClient(apiKey=os.getenv("OPENAI_API_KEY"))

# Create a wave
wave = AlphaWave(
    client=client,
    prompt=Prompt([
        SystemMessage('You are an AI assistant that is friendly, kind, and helpful', 50),
        ConversationHistory('history', 1.0),
        UserMessage('{{$input}}', 450)
    ]),
    prompt_options={
        'completion_type': 'chat',
        'model': 'gpt-3.5-turbo',
        'temperature': 0.9,
        'max_input_tokens': 2000,
        'max_tokens': 1000,
    }
)

# Define main chat loop
async def chat(bot_message=None):
    # Show the bot's message
    if bot_message:
        print(f"\033[32m{bot_message}\033[0m")

    # Prompt the user for input
    user_input = input('User: ')
    # Check if the user wants to exit the chat
    if user_input.lower() == 'exit':
        # Exit the process
        exit()
    else:
        # Route the user's message to the wave
        result = await wave.completePrompt(user_input)
        if result['status'] == 'success':
            print(result)
            await chat(result['message']['content'])
        else:
            if result.get('message'):
                print(f"{result['status']}: {result['message']}")
            else:
                print(f"A result status of '{result['status']}' was returned.")
            # Exit the process
            exit()

# Start chat session
asyncio.run(chat("Hello, how can I help you?"))

I had a lot of issues with dependencies; pyreadline3 is the Windows alternative for readline, and I had to install pyee too.

But I’m still getting different errors atm: “KeyError: ‘Could not automatically map gpt-4 to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.’”

Ah, good to know. I run it with 3.5 for testing. It doesn’t really matter; it’s just using that to get a rough token count. I don’t think I can get a fix out tonight, but maybe; otherwise early tomorrow. With luck I’ll have the core agent code in as well by then…

pyee - ah, thanks, will add it to the dependencies in the pyproject.toml for PyPI. Sorry

ps - will be offline for the night shortly

1 Like