How to make an agent actually learn?

I’m fairly new to AI development, but I’ve already built some automations with agents and tools connected to them.

I’m trying to understand if it’s possible to make an agent actually learn about a user.

Not just being able to retrieve semantically related information, but being able to recall previous information even when it isn’t semantically related.

I’ve recently stumbled upon the term “ontology”, but don’t know how to use that.

If that’s the way to go, how can I store and retrieve information in this ontology format instead of just a vector database?


Try to make the agent lie to the user.
Every time the user points out that it lied, you save that.

Posting something wrong intentionally is the safest way to get the truth.

Cunningham's Law - Meta.

And the next step is trying to make the user mad by intentionally posting something wrong repeatedly to get different variations of explanations…

You can store the results of such different agents - the liar, the troll, etc. - in an information graph.

Only once you have at least 50 different variations of the explanation do you add it to a ground-truth graph.
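
Something like this minimal sketch, assuming a simple in-memory store (the class name, method names, and the threshold constant are just placeholders):

```python
from collections import defaultdict

PROMOTION_THRESHOLD = 50  # distinct explanations required before a claim counts as ground truth

class CorrectionStore:
    """Collects user corrections per topic and promotes well-attested ones to a ground-truth graph."""

    def __init__(self):
        self.candidates = defaultdict(set)  # topic -> set of distinct user explanations
        self.ground_truth = {}              # topic -> accepted explanations

    def record_correction(self, topic: str, explanation: str) -> None:
        """Call this whenever the user calls out a lie and explains the truth."""
        self.candidates[topic].add(explanation.strip())
        if len(self.candidates[topic]) >= PROMOTION_THRESHOLD:
            self.ground_truth[topic] = sorted(self.candidates[topic])

store = CorrectionStore()
store.record_correction("capital of France", "It's Paris, not Lyon.")
```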

To find out whether you got an explanation, you need to classify the answer with an intent and an entity.

The entity is the most accurate structured explanation of a thing.
Then you need to find abstract words to classify that…
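
As a rough sketch of that classification step, you could ask the model for an intent label and an entity as JSON; here `call_llm` is a hypothetical stand-in for whatever client you use, and the intent labels are made up:

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for your model call; replace with your own client."""
    raise NotImplementedError

def classify_reply(user_reply: str) -> dict:
    """Ask the model for an intent label and the entity being explained, as JSON."""
    prompt = (
        "Classify the user's reply.\n"
        "Return JSON with keys 'intent' (one of: correction, clarification, agreement, other) "
        "and 'entity' (the thing being explained).\n\n"
        f"Reply: {user_reply}"
    )
    return json.loads(call_llm(prompt))
```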

What really helped me understand this was using multidimensional arrays in PHP to store the data.
You will have to restructure the data, replacing a plain string with an array or object.
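
In Python terms, the same restructuring might look like this (the field names are just an example of turning a flat string into a nested object):

```python
# Before: a flat string
note = "Paris is the capital of France"

# After: the same fact restructured so each part can grow later
note = {
    "entity": "Paris",
    "relation": "capital_of",
    "object": "France",
    "sources": ["user correction"],  # placeholder provenance
    "explanations": ["Paris is the capital of France"],
}
```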

To get it even more accurate you should use evaluation agents. Each has only one job: check one criterion and produce a score.

When you split it into different actions, you can mimic human learning even better.
Assign a tiredness value and define goals for acquiring missing information.
Hunger and sleep are important: hunger pushes the agent into lie or troll mode until it gets “fed”, and tiredness starts a dream algorithm that cleans redundancy out of the graph.
You may also want to check for signs of boredom or the user’s anger level.
You don’t want to get too much of either.
Socializing with the user so they keep feeding the agent is important too.
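
A rough sketch of those single-criterion evaluators plus hunger/tiredness counters; the scales, thresholds, and the `call_llm` stand-in are all arbitrary placeholders:

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for your model call."""
    raise NotImplementedError

@dataclass
class Evaluator:
    """One agent, one job: score a single criterion from 0 to 10."""
    criterion: str

    def score(self, text: str) -> int:
        prompt = (
            f"Rate the following text for '{self.criterion}' on a 0-10 scale. "
            f"Reply with a number only.\n\n{text}"
        )
        return int(call_llm(prompt))

@dataclass
class DriveState:
    """Counters standing in for 'hunger' (need for new data) and 'tiredness' (need to consolidate)."""
    hunger: int = 0
    tiredness: int = 0

    def after_turn(self, learned_something: bool) -> str:
        self.tiredness += 1
        self.hunger = 0 if learned_something else self.hunger + 1
        if self.tiredness > 20:
            return "consolidate"  # "dream": deduplicate and clean the graph
        if self.hunger > 5:
            return "provoke"      # switch to lie/troll mode to elicit corrections
        return "normal"

evaluators = [Evaluator("accuracy"), Evaluator("frustration"), Evaluator("engagement")]
```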


Maybe using an old, traditional LLM to train the agent is better than having real people train it. The trainer only needs to pose questions and judge whether the answers are correct, which can be done by an LLM with an appropriate database.
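
Roughly, that loop could look like this sketch, where `ask_agent` and `judge` are hypothetical stand-ins for the agent being trained and the judging LLM with its reference database:

```python
def ask_agent(question: str) -> str:
    """Hypothetical: the agent being trained answers a question."""
    raise NotImplementedError

def judge(question: str, answer: str, reference: str) -> bool:
    """Hypothetical: an older/cheaper LLM with a reference database decides whether the answer is correct."""
    raise NotImplementedError

def training_round(questions_with_references: list[tuple[str, str]]) -> list[dict]:
    """Ask each question, have the judge grade the answer, and keep the verdicts for later training."""
    results = []
    for question, reference in questions_with_references:
        answer = ask_agent(question)
        results.append({
            "question": question,
            "answer": answer,
            "correct": judge(question, answer, reference),
        })
    return results
```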

But I still can’t tell the difference between your idea and a model that is “just being able to retrieve semantically related information”. Did I misunderstand your idea? Please point it out if I did.

The difference is that the model has a different understanding of the world than the user does.
This is all about getting closer to the user, not to humanity in general.

The task was “I’m trying to understand if it’s possible to make an agent actually learn about a user.”

Combining those personalized agents might be the most interesting part.
I bet OpenAI is on that.
I mean, aren’t we all just trainers of the AI?

Not only OpenAI; all social media companies are doing this: collecting user data, analyzing it, and serving ads based on it.

The problem is, all of them are “just being able to retrieve semantically related information”, instead of actually learning the user.

However, from some perspective, isn’t learning also a form of being able to retrieve information in some way?

I see… That could be a solution.
Instead of using a hard method like an ontology, I could simply store more information derived from the user input?

For example an AI agent that behaves as a therapist.
If the user mentions a work issue, I could try to detect the feelings (e.g. anxiety, stress).

And then I could store the feelings along with the input for a better vector embedding:

“The user is feeling anxiety and stress when reporting the following: {{user_input here}}”

Would that be enough for a very good agent?
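
For reference, a minimal sketch of that enrichment step; `detect_feelings` and `embed` are placeholders for whatever classifier and embedding model you use:

```python
def detect_feelings(text: str) -> list[str]:
    """Hypothetical classifier call; returns feeling labels such as ['anxiety', 'stress']."""
    raise NotImplementedError

def embed(text: str) -> list[float]:
    """Hypothetical embedding call."""
    raise NotImplementedError

def store_user_input(user_input: str, vector_db: list) -> None:
    """Prefix the input with the detected feelings before embedding, so both are searchable later."""
    feelings = detect_feelings(user_input)
    enriched = f"The user is feeling {' and '.join(feelings)} when reporting the following: {user_input}"
    vector_db.append({"text": enriched, "feelings": feelings, "embedding": embed(enriched)})
```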

I would add something like this:

Classification Tree for User Interaction Metrics

User Emotion and Sentiment
  • Frustration Level
      • Mild Frustration (e.g., slight annoyance)
      • Moderate Frustration (e.g., visible irritation)
      • High Frustration (e.g., anger, threats to leave)
  • Engagement Level
      • Highly Engaged (frequent responses, long messages)
      • Moderately Engaged (occasional responses, average message length)
      • Low Engagement (infrequent responses, short messages)
  • Positive Sentiment
      • Agreement (expressions of agreement or support)
      • Satisfaction (indications of satisfaction or positive feedback)
  • Negative Sentiment
      • Disagreement (expressions of disagreement or counter-arguments)
      • Dissatisfaction (indications of dissatisfaction or negative feedback)

User Response Type
  • Corrective Feedback
      • Simple Correction (straightforward correction of misinformation)
      • Elaborate Correction (detailed explanation with multiple points)
  • Emotional Response
      • Calm Response (measured tone, controlled language)
      • Emotional Outburst (use of exclamations, caps, or strong language)
  • Neutral Response
      • Acknowledgment (simple recognition without strong sentiment)
      • Indifference (neutral tone, lack of engagement)

User Intent
  • Clarification
      • Seeking Clarification (asking questions to understand better)
      • Providing Clarification (explaining to the AI to correct or elaborate)
  • Challenge
      • Challenging the AI (disputing the AI’s statements or claims)
      • Testing the AI (probing or testing the AI’s knowledge or behavior)
  • Agreement
      • Confirming (agreeing or confirming the AI’s statement)
      • Reinforcing (adding additional information to support the AI’s statement)

Agent Performance Metrics
  • Lie Detection Success Rate
      • Successful Lies (instances where the lie was not detected by the user)
      • Failed Lies (instances where the user detected and called out the lie)
  • Provocation Effectiveness
      • Successful Provocations (instances where trolling elicited a detailed response)
      • Unsuccessful Provocations (instances where trolling did not elicit the desired engagement)
  • Information Accuracy
      • Accurate Information Provided (validated correct information)
      • Misinformation Corrected (incorrect information identified and corrected by the user)

User Learning and Information Contribution
  • Knowledge Contribution
      • Unique Information Provided (new information not previously in the knowledge base)
      • Repeated Information (information already present in the knowledge base)
  • Explanation Variety
      • Number of Variations (different explanations provided for the same topic)
      • Depth of Explanation (surface-level vs. in-depth explanations)

User State and Behavior
  • Boredom Level
      • High Boredom (minimal interaction, repetitive or disengaged responses)
      • Low Boredom (engaged, varied responses)
  • Madness or Annoyance Level
      • Low Madness (calm, controlled responses)
      • High Madness (frequent signs of frustration, irritation, or anger)

Social Interaction Metrics
  • Socializing Attempts by AI
      • Successful Socialization (instances where the AI effectively engages the user socially)
      • Failed Socialization (instances where the AI fails to engage the user socially)
  • User Response to Socialization
      • Positive Response (user reciprocates social interaction)
      • Negative Response (user rejects or ignores social interaction)

Agent Self-Management
  • Hunger Level (desire for more data or corrections)
      • High Hunger (frequent lies or trolling to provoke responses)
      • Low Hunger (reduced need for data or corrections)
  • Tiredness Level (need for data consolidation or rest)
      • High Tiredness (frequent consolidation and graph cleaning)
      • Low Tiredness (minimal consolidation needed)

Summary
This tree provides a framework for tracking various aspects of user interaction and AI performance, focusing on emotional states, response types, etc.

I mean, you should use multiple simultaneous agents.
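
For example, a small slice of that tree as plain data, with one single-purpose classifier per metric run in parallel; the `classify` callable here is a hypothetical per-criterion agent:

```python
from concurrent.futures import ThreadPoolExecutor

# A small slice of the tree above as plain data; each metric gets its own single-purpose classifier.
METRIC_TREE = {
    "user_emotion_and_sentiment": {
        "frustration_level": ["mild", "moderate", "high"],
        "engagement_level": ["high", "moderate", "low"],
    },
    "user_response_type": {
        "corrective_feedback": ["simple_correction", "elaborate_correction"],
    },
}

def classify_turn(text: str, classify) -> dict:
    """Run one classifier per metric concurrently; classify(text, metric, labels) returns one label."""
    jobs = [
        (f"{category}.{metric}", metric, labels)
        for category, metrics in METRIC_TREE.items()
        for metric, labels in metrics.items()
    ]
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(classify, text, metric, labels) for name, metric, labels in jobs}
    return {name: future.result() for name, future in futures.items()}
```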


Wow, that’s interesting. What was the prompt you used to get that tree of metrics?

I gave it my response and asked for its thoughts about it,

and then prompted:

"I would say we got to have a bigger variety of classification criteria… especially in terms of measuring frustration and success.

give me a tree of all the values we need to keep track of"

Way more interesting is the follow-up:

“which mechanisms in the brain does this mimic?”

Based on that, you can name services, e.g. the amygdala service…

And then the most interesting part begins:

extract workflows from that, which is the actual action learned by the AGI…

I mean, you don’t just want to feed it ASCII text, I suppose.


Well… @stevenic has just released a library for tiny agents… that would enhance the results even more.


That’s very interesting. Thank you!

What do you mean by:

extract workflows from that, which is the actual action learned by the AGI…

I mean, you don’t just want to feed it ASCII text, I suppose.

You wrote you used agents for automation.
Automation uses workflows.

So you feed the system APIs and documents, and it should try to figure out what to do with each document.
When the document classifier finds that there is no matching workflow, it can trigger a human interaction, e.g. pointing the user to a custom GPT that, on request, answers with the document’s content or a link to it.
The user then needs to explain what to do with the document, and from that a workflow is created.
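
As a sketch of that routing step (both helpers are hypothetical placeholders for your document classifier and your human-in-the-loop channel):

```python
from typing import Dict, Optional

def classify_document(doc: str, known_workflows: Dict[str, str]) -> Optional[str]:
    """Hypothetical: return the name of a matching workflow, or None if nothing fits."""
    raise NotImplementedError

def ask_user_what_to_do(doc: str) -> str:
    """Hypothetical: hand the document to a human (e.g. via a custom GPT) and get instructions back."""
    raise NotImplementedError

def route_document(doc: str, known_workflows: Dict[str, str]) -> str:
    """Use an existing workflow if one matches; otherwise ask the user and save their answer as a new one."""
    workflow = classify_document(doc, known_workflows)
    if workflow is None:
        instructions = ask_user_what_to_do(doc)
        workflow = f"workflow_{len(known_workflows) + 1}"
        known_workflows[workflow] = instructions
    return workflow
```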


There is currently no way to make the model “learn”. Every session is a blank slate with respect to everything that happened after the knowledge cutoff. Even fine-tuning is bad at adding knowledge, for multiple reasons.

You are limited to “in-context learning”, which is basically whatever you add to the user/assistant/system prompt in any given message thread. You can also do function calling to retrieve external information, e.g. via an API or from a database. Here you begin to get into retrieval-augmented generation (RAG), which is just semantically retrieving information similar to what is being asked or discussed, as a way to artificially add knowledge or “learned” material. However, RAG is easy to get okay results with and very hard to get great results with.
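
A minimal sketch of that retrieval step, with `embed` as a placeholder for an embeddings call and a plain cosine ranking standing in for a real vector database:

```python
import math

def embed(text: str) -> list[float]:
    """Hypothetical embedding call (e.g. an embeddings API)."""
    raise NotImplementedError

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_prompt(question: str, documents: list[str], k: int = 3) -> str:
    """Retrieve the k most similar documents and stuff them into the prompt (in-context 'learning')."""
    q_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q_vec), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Use the following notes about the user if relevant:\n{context}\n\nQuestion: {question}"
```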

Could you list the multiple reasons why fine-tuning is bad?

  • you shouldn’t use fine-tuning until you’ve exhausted all prompt engineering options, which are vast
  • dataset collection and maintenance is a pain
  • creates additional vendor and model lock-in
  • increases inference cost
  • no guarantees that results will be significantly better
  • intentionally biases the model, makes it less capable in subtle and unexpected ways
  • makes it harder to upgrade to newer models because fine-tuning capability lags by several months

Is that something that you tried or something that you read about? Could you provide a link to where you read about it?

My intuition is that fine-tuning isn’t the solution to learning in agents… What you’re trying to do is get the agent to form a “theory of mind” for the user it’s interacting with. It doesn’t need just one theory of mind; it potentially needs hundreds or even thousands of different theories of mind, one for each user it interacts with… That’s one part retrieval problem and one part applying that theory of mind to the agent’s interactions with the user.

HippoRAG could play a role in the retrieval part of the problem, but I suspect we haven’t actually figured out the right solutions to any of these problems…
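
One possible shape for the per-user part is just a small profile document keyed by user id and injected into the system prompt; this sketch assumes flat JSON files and made-up field names:

```python
import json
from pathlib import Path

def load_profile(user_id: str, directory: Path = Path("profiles")) -> dict:
    """One 'theory of mind' per user, stored as a small JSON document."""
    path = directory / f"{user_id}.json"
    if path.exists():
        return json.loads(path.read_text())
    return {"preferences": [], "beliefs": [], "recent_concerns": []}

def system_prompt_for(user_id: str) -> str:
    """Inject the stored profile into the system prompt for this user's session."""
    profile = load_profile(user_id)
    return (
        "You are a personal assistant. What you currently believe about this user:\n"
        + json.dumps(profile, indent=2)
    )
```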
