Hi there,
I created an online game that uses GPT-3.5 and GPT-4 as the game master. I've equipped the game master with several functions (e.g. move to a location, talk to an NPC, and so on) as well as corresponding system instructions on when and how to call them.
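For context, the setup looks roughly like this; the exact function names, parameters, and prompts below are simplified placeholders, not my real definitions:

```python
# Minimal sketch of the game-master setup (names and prompts are illustrative).
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "move_location",
            "description": "Move the player to a new game location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "destination": {
                        "type": "string",
                        "description": "Name of the target location, e.g. 'tavern'.",
                    }
                },
                "required": ["destination"],
            },
        },
    },
    # talk_to_npc and the other game-master functions follow the same pattern
]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": "You are the game master. Call move_location whenever the player wants to change locations."},
        {"role": "user", "content": "I want to move to the tavern"},
    ],
    tools=tools,
    tool_choice="auto",  # the model decides on its own whether to call a function
)
print(response.choices[0].message.tool_calls)
```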
This all works fine, but what my players noticed during playtesting is that after around 5,000 tokens generated in the conversation, both GPT-3.5 and GPT-4 start becoming lazy. Meaning: they just won't call the appropriate functions on their own anymore, whereas in the beginning it works perfectly. So a player might say:
“I want to move to the tavern”.
And the game master answers with a plain text response describing how the player moves into the tavern, instead of calling move_location with the correct parameters.
However, if I ask the game master this:
What is your instruction for changing game locations?
He tells me that he will call the appropriate function to change the game location with the correct parameters.
So he can remember the function and its corresponding instruction (and the conversation is only 5k tokens into the 128k context of, e.g., GPT-4 Turbo), but he refuses, or is simply too lazy, to call any functions anymore.
Any idea why this happens or how to solve it? I feel like this issue weakens the whole concept of a functional AI agent.
Everything works perfectly for the first ~5k tokens and then suddenly stops. Why is that? Is this an OpenAI-specific limitation or an LLM limitation in general? Do other models behave better in that regard?

The workaround I have in mind is to preprocess the player input, categorize it, and then use that category to force the matching function call (rough sketch below), although a dynamic system would be cooler.
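Concretely, the preprocessing idea would look something like this; classify_intent is a hypothetical placeholder for whatever categorizer ends up doing the job, and it continues from the client/tools definitions sketched above:

```python
# Sketch of the workaround: a pre-classifier decides the intent, and if it looks
# like a movement request, tool_choice forces the model to call move_location.
# classify_intent is a hypothetical placeholder.
def classify_intent(player_input: str) -> str:
    # e.g. a cheap keyword check, or a separate small classification call
    return "move" if "move to" in player_input.lower() else "other"

player_input = "I want to move to the tavern"

# Default: let the model decide. If the classifier says "move", force the call.
tool_choice = "auto"
if classify_intent(player_input) == "move":
    tool_choice = {"type": "function", "function": {"name": "move_location"}}

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": "You are the game master."},
        {"role": "user", "content": player_input},
    ],
    tools=tools,  # same tool definitions as above
    tool_choice=tool_choice,
)

# When forced, the tool call arrives here with the arguments filled in by the model.
print(response.choices[0].message.tool_calls)
```

That works, but it moves the decision back into my own code instead of letting the agent handle it dynamically, which is why I'd rather understand the underlying behaviour.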