If the goal is to create a generic agent, my suggestion would be to create some kind of control-flow template that can be customized by the runner. I think people who leave it all up to GPT-4 are going to find they just end up paying a bunch for API calls and not getting anything they couldn't already do with a few web queries.

2 Likes

It can get repetitive, but you use the embeddings to choose which outputs are closest to the objective. Also, use GPT to distill and summarize the top content to refine the data (and maybe get even closer to the objective).

So the objective drives everything, and embeddings are how you measure the quality of successive output, and a distillation/summary function is used to combine and improve things even further, and this is all archived in a big database for future use.
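To sketch the ranking step, it's really just cosine similarity over embedding vectors you've already computed for the objective and each candidate output; the function and field names below are only for illustration:

```javascript
// Rank candidate outputs by cosine similarity to the objective embedding,
// then keep the top few for a later distillation/summarization pass.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// `objectiveVec` and each candidate's `vec` are embedding vectors computed elsewhere
// (e.g. from the OpenAI embeddings endpoint).
function topCandidates(objectiveVec, candidates, k = 3) {
  return candidates
    .map(c => ({ ...c, score: cosine(objectiveVec, c.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

The top few results then go into the distill/summarize prompt, and the summary gets archived along with its embedding.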

It’s compelling!

2 Likes

Well, I look forward to a demo! I hope you can post a video.

1 Like

I may just report back on how well it works. I don't have time for videos or GitHub or whatever! :upside_down_face:

But thumbs up or thumbs down so far is leaning to thumbs up! :+1:

I can see how it conceptually closes. It makes sense. The mystery is which prompt permutations you go through to optimize it, or what the optimal way of extracting information from the LLM is.

When I had it generate information on becoming a machine learning expert, I was impressed by all the adjacent things it spit out. So it is growing the information, at least. So yeah, the trick is to keep it growing without going into the weeds. But don't keep it so narrow that it can't grow at a reasonable rate either. Tricky balance.

Theoretically, you can think of it as a chaotic pendulum from chaos theory. You start the journey somewhere, eventually get sucked into a basin of attraction, and then get stuck in a loop of some sort. So, similar to training machine learning models, you reset your initial weights randomly and try again, possibly traveling to another basin of attraction. You keep doing this and, guess what, you've extracted a ton of information from the LLM! Is it good or accurate? Maybe not, but it's usually pretty good and is a basis for a lot of further queries into the data without using the LLM.
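To make the restart idea concrete, here's a rough sketch of the loop; `generate()` and `similarity()` are hypothetical stand-ins for an LLM call and an embedding comparison, not real APIs:

```javascript
// Random-restart exploration: if successive outputs start looking too similar
// (i.e. we're stuck in a "basin of attraction"), reseed and start over.
async function explore(objective, { restarts = 5, steps = 10, loopThreshold = 0.97 } = {}) {
  const archive = [];
  for (let r = 0; r < restarts; r++) {
    let prompt = `${objective}\nSeed: ${Math.random()}`; // random-ish starting point
    let prev = null;
    for (let s = 0; s < steps; s++) {
      const output = await generate(prompt);            // hypothetical LLM call
      if (prev && similarity(prev, output) > loopThreshold) break; // stuck in a loop, restart
      archive.push(output);
      prev = output;
      prompt = `${objective}\nExpand on: ${output}`;
    }
  }
  return archive; // everything extracted across restarts, for later queries without the LLM
}
```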

1 Like

I agree that you'll have more success creating an Agent that's focused on a specific task. Defining a general Agent and giving it a bunch of general-purpose tools/commands is going to result in mixed success. I'm working towards a framework that makes it easy to build these more focused Agents. I have demos for Flight Booking, Macbeth (multi-agent), and a variant of CurtGPT called AgentDojo that are all working great… More coming soon…

3 Likes

Agreed, this could be too general (too broadband) for most models, while the objective(s) the agent tries to achieve might be specific (narrow band). Humans tend to think medium band, so your agent has to figure out this tricky balance. This is where adjacent tasks come in, but if they start to drift too far, you have to reset and bring them back in. Most thoughts don't traverse an entire universe of thought to reach a valid conclusion. And they don't stay narrowly focused the whole time either. They need to drift, but drift within reason.

This is one control system that I'm sure could be highly optimized though. Just takes time :grin:

2 Likes

Architecturally, I think there's promise in having a general-purpose Agent that delegates to more specialized agents to perform specific tasks… You could probably fix your own plumbing, but you'll suck at it compared to an actual plumber…

2 Likes

It sucks feeling like you are so close, yet so far away. Prompt Engineering be damned! :rofl:

3 Likes

It was actually pretty productive in the early stages. I did some cool reasoning with it. These days it just seems like an API-ringing machine. Too bad, I had high hopes.

1 Like

I've been inspired by learning about auto agents, embeddings search, and some previous comments on here about self-improving functions.

Why not create a self-learning command chooser that uses embeddings search?

- Embed the commands and a few usage examples.
- Split a goal into tasks and embed the task string.
- Pick the command whose example embeddings are closest to the task, and run it.
- If some guard or feedback indicates it was the right tool, save the task embedding and associate it with the command.

Going forward, command choice will become more accurate with further use. Since goal and task creation is more of an automated process, it may just take a few hours of learning before you could package it up for all to use. Beware, I've got no experience in machine learning, but this sounds like a logical solution.
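Something like this rough, untested sketch is what I have in mind; `embed()` is a placeholder for whatever embeddings call you use, and the command names are just examples:

```javascript
// Self-learning command chooser: pick the command whose stored example
// embeddings are closest to the task, and remember successful pairings.
const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);
const cosine = (a, b) => dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));

const commandExamples = {
  searchWeb: [],   // arrays of embedding vectors, seeded with a few usage examples
  writeFile: [],
  runCode: [],
};

async function chooseCommand(task) {
  const taskVec = await embed(task); // placeholder for the actual embeddings call
  let best = { name: null, score: -Infinity };
  for (const [name, examples] of Object.entries(commandExamples)) {
    for (const vec of examples) {
      const score = cosine(taskVec, vec);
      if (score > best.score) best = { name, score };
    }
  }
  return { command: best.name, taskVec };
}

// After some guard/feedback says the command was the right one,
// associate the task embedding with it for future lookups.
function recordSuccess(command, taskVec) {
  commandExamples[command].push(taskVec);
}
```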

1 Like

To follow up on my last comment, I have a nagging suspicion that command choice isn't really something that should require a full-on intelligent LLM to make a logical choice. And there's so much overhead in waiting for the LLM to return whole JSON responses when all you really need in this situation is an integer. It could save seconds on each request!

chatGPT made me a cute little snippet that I’ll try today and I’ll report back!

1 Like

Yes!. The whole idea of using chatGPT to write executable code to handle simple recurring tasks rather than using the full overhead of GPT each time is very exciting. I’m using that in a different way, to create agents that can perform simple recurring subtasks of a larger task. One useful idea I’ve found is to also have chatGPT write a simple yaml description of its generated code api, that it can use to understand the capabilities of its little code buddies. (This is for the case where I’m not looking for speedup, but rather for extension of capability, e.g. a code-mate for keeping an explicit state representation when game playing. Even GPT-4 can forget whether it is playing X or O part-way through a game of tic-tac-toe, for example.)
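Purely to illustrate, here's a hypothetical example of what one of those yaml API descriptions might look like, embedded in the prompt so the model knows what its code buddy can do (every name here is invented for the example):

```javascript
// A yaml summary of a generated helper's API, appended to the system prompt
// so the model can "see" its code buddy's capabilities.
const ticTacToeApi = `
name: TicTacToeState
description: Keeps an explicit board state so the model never forgets whose turn it is.
functions:
  - name: applyMove
    args: [player, row, col]
    returns: the updated board as a 3x3 array
  - name: currentPlayer
    args: []
    returns: "X" or "O"
`;

const systemPrompt = `You can call the following helper code:\n${ticTacToeApi}`;
```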

Somewhat related… Here's output from a Programmer Agent I got working this weekend that, given a program spec, can write, test, and fix code:

Here’s the program spec that generated that function:

The Programmer Agent has 3 commands: writeCode, testCode, and fixCode. It uses them to first generate the requested code, then test it, and refine it until it passes the supplied unit tests… The fixCode command is where the real magic is, as that prompt uses one of my INSTRUCT sequences to get the model to first analyze the error, then identify the issue with the code, and finally return the improved function.
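For readers trying to picture the flow, the outer loop could be as simple as the sketch below; this is only a guess at the shape of it, not the actual implementation, and the three command names are the only things taken from the description above:

```javascript
// Write -> test -> fix loop: keep asking the model to repair the function
// until the supplied unit tests pass or the attempt budget runs out.
async function programmerAgent(spec, unitTests, maxAttempts = 5) {
  let code = await writeCode(spec);            // prompt the model for an initial implementation
  for (let i = 0; i < maxAttempts; i++) {
    const result = testCode(code, unitTests);  // run the unit tests locally
    if (result.passed) return code;
    code = await fixCode(code, result.error);  // feed the failure back (see the fixCode prompt below)
  }
  throw new Error("Could not produce passing code within the attempt budget");
}
```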

You can ignore the agent's thoughts in between steps, as it doesn't currently know the changes the fixCode command made, so it's just guessing. I'll work on that…

Future enhancements will be to use the supplied unit tests as inspiration but have it write its own tests.

UPDATE: seeing the model think through each iteration is super interesting. It's spot on at identifying the issue with its regex and then fixing it. This is GPT-4, obviously…

4 Likes

If it's not patently obvious, this sample shows that our days as traditional software engineers who write code are numbered… Instead, we will write program specs and example unit tests that GPT will use to write code at a fraction of the cost of a human…

I already have a number of ideas for how to build this agent into a full software architect that can develop and test whole applications and services…

3 Likes

very very cool.
I assume that you aren't ready to, or can't, share more details about this?
I’m working on something vaguely similar and would love to see how the debugging part works. So far, to my surprise, just:

1. Passing the generated code back to GPT along with the specification it started from (which is also GPT-generated),
2. Asking it to check for completeness, consistency, and correctness, and revise as needed

Seems to create code that works out of the box. Probably that’s because the classes I’m asking it to generate so far are pretty simple state representations.
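In case it helps, that revision step amounts to a single extra completion call, roughly like the sketch below; the prompt wording and the `complete()` helper are placeholders, not my actual code:

```javascript
// Single revision pass: hand the model its own spec and code and ask it
// to check for completeness, consistency, and correctness.
async function reviseClass(spec, generatedCode) {
  const prompt = [
    "Here is a class specification:",
    spec,
    "Here is the generated implementation:",
    generatedCode,
    "Check the implementation for completeness, consistency, and correctness",
    "against the specification, and return a revised version if anything is wrong.",
  ].join("\n\n");
  return await complete(prompt); // placeholder for whatever chat/completions call you use
}
```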

1 Like

I should be able to share actual code shortly… I’m still debugging and in the middle of a re-write of the core engine.

2 Likes

Very cool thought. It's kind of like establishing the start and the finish, and then brute-forcing GPT to uncover the in-between.

I am interested to see how it manages full projects, and not simple pure functions.
Then, I would like to see how a recursive GPT compares to a carefully prompted GPT.

Lastly, it would be nice to have a “Reflective” agent, which looks at the initial failure, compares it to the passing version, and determines where/why it failed. I would wager that in almost all cases, a careful prompt would achieve the same results.

Here’s the fixCode prompt I send that arrives at the final solution:

This javascript function:
function parseAddress(address) {
  // assume address is a string in the format of "address line 1, address line 2 (optional), city, state zip code"
  // use regular expressions to capture the components of the address
  var regex = /^(.+),\s*(.+)?,\s*([A-Za-z\s]+),\s*([A-Z]{2})\s*(\d{5})$/;
  var match = address.match(regex);
  if (match) {
    // return an object with the address components
    return {
      address1: match[1],
      address2: match[2] || "", // use an empty string if address line 2 is missing
      city: match[3],
      state: match[4],
      zip: match[5]
    };
  } else {
    // return null if the address format is invalid
    return null;
  }
}

Failed with this error:
returned value of null doesn't match expected value of {"address1":"One Microsoft Way","address2":"","city":"Redmond","state":"WA","zip":"98052"}

Steps:
- analyze the error and find the issue with the code.
- fix the code and return it using this XML structure: <code>fixed code</code>

State each step and show your work for performing the step.

1. analyze the error and find the issue with the code.

And here's the model's response. I'm not even passing it the test input or a call stack, which would probably help:

Possible steps and work for this step are:

- compare the input address with the expected regex pattern and see where they differ
- notice that the input address has a period after "Way" which is not captured by the regex
- also notice that the regex expects a comma after the address line 1, which is missing in the input address
- conclude that the regex is too strict and does not account for variations in the address format

2. fix the code and return it using this XML structure: <code>fixed code</code>

Possible steps and work for this step are:

- modify the regex to allow optional punctuation and whitespace after the address line 1
- use the \.? and \s* metacharacters to match zero or one period and zero or more spaces
- also modify the regex to make the comma after the address line 1 optional
- use the ,? metacharacter to match zero or one comma
- test the modified regex with the input address and other valid and invalid addresses
- return the fixed code using the XML structure

<code>
function parseAddress(address) {
  // assume address is a string in the format of "address line 1, address line 2 (optional), city, state zip code"
  // use regular expressions to capture the components of the address
  // modified regex to allow optional punctuation and whitespace after address line 1 and optional comma
  var regex = /^(.+)\.?\s*,?\s*(.+)?,\s*([A-Za-z\s]+),\s*([A-Z]{2})\s*(\d{5})$/;
  var match = address.match(regex);
  if (match) {
    // return an object with the address components
    return {
      address1: match[1],
      address2: match[2] || "", // use an empty string if address line 2 is missing
      city: match[3],
      state: match[4],
      zip: match[5]
    };
  } else {
    // return null if the address format is invalid
    return null;
  }
}
</code>

3 Likes

Me too :slight_smile: I wanted to start simple because, to be honest, I wasn't 100% sure I could get it to do this with any reliability, but for this test case I have it working reliably. The key was working out the structure of the program spec, and I had to give it the added hint that it should always use regular expressions. I noticed that it wasn't always landing on a working function unless it decided to try using regular expressions. That hint seems to push it towards a more reliable solution…

Super early and lots of exploration needs to be done but it seems promising…

It does seem promising; it's exciting thinking about the future of this. I really liked the fixCode part. Being able to follow the process is very insightful and valuable.

Interesting that it needed the "push" towards regex. In fact, if I were to tackle this situation, the first thing I would do is split the address by the commas and then deduce the meaning of each item, which I believe is essentially what the regex is doing. But the regex does it all in one operation, so I can't hate.
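For comparison, the split-by-commas approach I'm picturing would look something like this quick sketch (less robust than it looks, since the state/zip piece still needs its own parse):

```javascript
// Alternative to the one-shot regex: split on commas, then pull the
// state and zip off the last piece.
function parseAddressBySplit(address) {
  const parts = address.split(",").map(p => p.trim());
  if (parts.length < 3) return null;               // need at least line 1, city, "state zip"
  const [state, zip] = parts[parts.length - 1].split(/\s+/);
  return {
    address1: parts[0],
    address2: parts.length === 4 ? parts[1] : "",  // optional second line
    city: parts[parts.length - 2],
    state,
    zip,
  };
}

// e.g. parseAddressBySplit("One Microsoft Way, Redmond, WA 98052")
// -> { address1: "One Microsoft Way", address2: "", city: "Redmond", state: "WA", zip: "98052" }
```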