GPT 3.5 and 4 being lazy / refusing function calls after 5k tokens

Hi there,

I created an online game that uses GPT 3.5 and 4 as the game master. I’ve equipped the game master with several functions - e.g. move to a location, talk to an NPC, etc. - as well as corresponding system instructions on when and how to call them.

This all works fine, but what my players noticed during playtesting is that after around 5,000 tokens generated in the conversation, both GPT 3.5 and GPT 4 start becoming lazy. Meaning: they just won’t call the appropriate functions by themselves anymore, whereas in the beginning it works perfectly. So the player might say:

“I want to move to the tavern”.

And the game master responds with plain text, describing how the player moves into the tavern, instead of calling move_location with the correct parameters.

However, if I ask the game master this:

What is your instruction for changing game locations?

He tells me that he will call the appropriate function to change the game location with the correct parameters.

So he can still remember the function and its corresponding instruction - and it’s also only 5k tokens into the 120k context of e.g. GPT 4 Turbo - but he refuses, or becomes too lazy, to call any functions anymore.

Any idea why this happens or how to solve it? I feel like this issue undermines the concept of a functional AI agent.

The workaround I had in mind is to preprocess the player input and categorize it, then use the categorization to force a function call. But a dynamic system would be cooler. It works perfectly for the first 5k tokens and then suddenly stops - why is that? Is this an OpenAI-specific limitation or an LLM limitation in general? Do other models work better in that regard?
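
For illustration, this is roughly what I mean - just a sketch with the OpenAI Python SDK, where the destination parameter and the simplified schema are placeholders for my real setup:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical, stripped-down version of my function schema.
tools = [{
    "type": "function",
    "function": {
        "name": "move_location",
        "description": "Move the player to a new location",
        "parameters": {
            "type": "object",
            "properties": {
                "destination": {"type": "string", "description": "Name of the target location"}
            },
            "required": ["destination"],
        },
    },
}]

# The existing chat history with the game-master system prompt would go here.
conversation = [{"role": "system", "content": "You are the game master of ..."}]

# Once my preprocessing step has categorized the input as a movement request,
# force the model to call move_location instead of letting it choose.
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=conversation + [{"role": "user", "content": "I want to move to the tavern"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "move_location"}},
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
```

With tool_choice pinned to a specific function, the model can’t drift back into prose for that turn - but as said, it needs the extra categorization step up front.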

I see every problem as a nail I want to swing my polychromatic hammer at. Your high-frequency signal needs to be underpinned by a strong low-frequency signal to maintain coherence over a longer period.

One issue is that the model simply ‘forgets’ to call the function - the instruction to call it just doesn’t seem relevant anymore.

You want it to follow a checklist of things to do, but it gets so carried away writing prose that it spits out the end-of-text token, proud of its creation, without thinking much about what to do next.

One way, as you noted, is to simply make a second call to decide what to do.
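
For example, a small routing call that only decides which function applies, kept separate from the long game-master conversation (sketch only; the labels and model choice are placeholders):

```python
from openai import OpenAI

client = OpenAI()

ROUTER_PROMPT = (
    "Classify the player's message into exactly one of these labels: "
    "move_location, talk_to_npc, none. Reply with the label only."
)

def route(player_message: str) -> str:
    # The growing story context is deliberately left out, so the routing
    # instruction can't get drowned out after a few thousand tokens.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system", "content": ROUTER_PROMPT},
            {"role": "user", "content": player_message},
        ],
    )
    return response.choices[0].message.content.strip()

# route("I want to move to the tavern") -> "move_location", which you can
# then turn into a forced tool call in the main game-master request.
```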

Another way to make it “take another look at the todos” is to use a schema: every time the model closes a tag or a bracket, it has to look at the schema region again and decide what to do next.

Of course you’d have to parse that, and either option would require more programming, unfortunately.
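
To make the schema idea concrete, here is one way it could look - a sketch using JSON mode, where the field names and action labels are invented for the example:

```python
import json
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are the game master. Always answer with a single JSON object:
{
  "narration": "<prose shown to the player>",
  "action": {"name": "<move_location | talk_to_npc | none>", "arguments": {}}
}
Fill in "action" whenever the player's request matches one of the game functions."""

response = client.chat.completions.create(
    model="gpt-4-turbo",
    response_format={"type": "json_object"},  # JSON mode; the prompt itself must mention JSON
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "I want to move to the tavern"},
    ],
)

reply = json.loads(response.choices[0].message.content)
if reply["action"]["name"] != "none":
    pass  # dispatch reply["action"] to your game logic here
print(reply["narration"])
```

Because the model has to close the action object before it can stop, it is nudged into deciding on a function every single turn instead of drifting off into pure prose.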