How to make an agent actually learn?

Maybe using an older, traditional LLM to train the agent would work better than having real people train it. The trainer only needs to pose questions and judge whether the answers are correct, which an LLM with an appropriate database can do.
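To make the trainer idea concrete, here is a rough Python sketch of the question-and-judge loop. Everything in it is an assumption for illustration: `REFERENCE_DB`, `judge`, `run_training_round`, and `toy_agent` are made-up names, and the judge is a plain normalized string comparison standing in for the trainer LLM, not a real model call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical reference database: question -> accepted answer.
REFERENCE_DB: Dict[str, str] = {
    "What is the capital of France?": "Paris",
    "How many legs does a spider have?": "8",
}


def judge(question: str, answer: str, db: Dict[str, str]) -> bool:
    """Stand-in for the trainer LLM: case-insensitive match against the database."""
    expected = db[question]
    return answer.strip().lower() == expected.strip().lower()


def run_training_round(
    agent: Callable[[str], str], db: Dict[str, str]
) -> List[Tuple[str, str, bool]]:
    """Pose every question to the agent and record the judge's verdict."""
    results = []
    for question in db:
        answer = agent(question)
        results.append((question, answer, judge(question, answer, db)))
    return results


def toy_agent(question: str) -> str:
    # Hypothetical agent that only knows one fact.
    return "paris" if "France" in question else "I don't know"


feedback = run_training_round(toy_agent, REFERENCE_DB)
accuracy = sum(ok for _, _, ok in feedback) / len(feedback)
```

The `feedback` list (question, answer, correct or not) is the signal the agent would learn from. How that signal actually updates the agent is left open here, since that is the part still under discussion.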

But I still can’t tell the difference between your idea and a model that amounts to “just being able to retrieve information semantically related”. Did I misunderstand your idea? Please point it out if I did.