GPT-4o-mini stops following instructions after a few turns

Hello,

Do you refer to the conversation context, i.e. the user messages here?

Care to explain this in a little more detail? How would citing a file vs. using the system prompt make a difference? And what do you mean by "force operating parameters"?

I get what you’re saying, and it may well be part of the issue. Just to be clear: the model does call file search again in later turns, and most of the time it also calls my function search_catalog (which is what actually gives the model the material to show the user). What it stops doing after a few turns is integrating those resources into its message (markdown images in the text, or calling show_resource). How would you reset that?

I am not using GPT-5. Currently, I’m experimenting with 4.1 mini (I moved away from the initial 4o mini because it really couldn’t follow instructions if its life depended on it). Does this advice still apply? By “anchoring in chat,” do you mean sending a developer/system role message with a reminder of the rule?

Also

What would be a concrete example of this?

Something I thought of could be this:

  • when I receive a user message, I classify its intent first (LLM-in-the-loop, for example sending it to 4.1 nano and returning a value such as “new_topic”, “confirmation”, etc.)
  • if the user message intent is classified as “new_topic” (or something equivalent), I also send the “main” chatbot a developer/system message telling it explicitly to call show_resource/integrate an image in the response
  • after the response has been generated, I check whether the model followed the instructions by verifying that show_resource was called or that the text contains ![...](...). If neither happened, I send a developer/system message asking for a follow-up and explicitly requiring it to integrate resources in the response.
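The verify-and-retry step above could be sketched roughly like this. This is only an illustration of the idea, not working API code: `call_model` is a hypothetical stand-in for an actual Chat Completions request with my tools attached, and the developer-message wording is made up.

```python
import re

# Matches a markdown image such as ![cover](cover.png)
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]+\)")


def response_integrates_resources(message_text, tool_calls):
    """True if the reply called show_resource or embedded a markdown image."""
    if any(call["name"] == "show_resource" for call in tool_calls):
        return True
    return bool(MARKDOWN_IMAGE.search(message_text))


def respond_with_retry(messages, call_model, max_retries=1):
    """Generate a reply; if it fails the resource check, nudge once and retry.

    `call_model` is a placeholder for the real model call and is assumed to
    return (message_text, tool_calls) for a given message list.
    """
    text, tool_calls = call_model(messages)
    retries = 0
    while not response_integrates_resources(text, tool_calls) and retries < max_retries:
        # Hypothetical corrective developer message, appended before retrying.
        messages = messages + [{
            "role": "developer",
            "content": "Your last reply omitted resources. Reply again and "
                       "either call show_resource or embed a markdown image.",
        }]
        text, tool_calls = call_model(messages)
        retries += 1
    return text, tool_calls
```

Capping retries (here at one) keeps the worst-case cost at two model calls per user turn, which is where my latency worry comes from.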

Is such an approach close to what you were referring to? My worry with this approach would be increased cost and latency, as well as possible cases of misclassifying user message intent.