I had a prompt that was working very well for more than a month, but today the responses are no longer stable. Mainly I’m referring to the function tools I defined: the LLM is not using them correctly.
OpenAI continues to call model names like gpt-4.1-mini-2025-04-14 “snapshots”, denying that their performance is altered…
So here is a specific example, one of many over and over, of applications breaking because OpenAI changes the models in production.
If the performance seems like a random lottery of whether functions are employed correctly, you can reduce the top_p parameter to below 0.5, which should give you function calling that is based on the best prediction rather than a lottery.
If the AI generation is already inside a function call, and the wrong function is utilized or parameter values are not being filled in properly, about all you can do is overwhelm the inattentive AI with unmistakable description fields for each property, and use enums in the schema for any values that have a particular set of allowed options.
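A minimal sketch of what I mean, assuming the standard OpenAI Python SDK and a made-up get_weather tool; the relevant parts are the lowered top_p, the verbose description fields, and the enum constraint:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical tool: the verbose descriptions and the enum are what keep
# the model from improvising parameter values.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": (
                "Look up current weather. Call this whenever the user asks "
                "about weather conditions in a named city."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name exactly as the user wrote it, e.g. 'Paris'.",
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units; only these two values are allowed.",
                    },
                },
                "required": ["city", "units"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
    top_p=0.3,  # below 0.5, so the tool call comes from the top of the distribution
)

print(response.choices[0].message.tool_calls)
```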
Are you using o3? They made changes to it and there is suspicion that it performs worse.
As general guidance, I would make sure the model is given clear information on what to do, when to do it, and how its tools should be used. The models are known to avoid calling functions if there are any issues or contradictions in the instructions.
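For illustration, a hedged sketch with a hypothetical get_order_status tool; the system message spells out exactly when the tool must and must not be called, so it cannot contradict the tool’s own description:

```python
# Hypothetical instructions for a single order-lookup tool: state when to call it,
# when not to, and leave no room for contradictions.
system_prompt = (
    "You are a support assistant for an online shop.\n"
    "- When the user asks about an order's status AND provides an order ID, "
    "call get_order_status with that ID.\n"
    "- If no order ID is given, ask for it first; never call the tool with a guessed ID.\n"
    "- For any other question, answer directly without calling tools."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Where is my order #A1234?"},
]
```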
The quote is indeed confusing. If model behavior isn’t changed, then why make any snapshots at all?
I think the suggestion that the underlying model is the same could be taken to mean that they are using LoRA fine-tuning to make their snapshots. It would also explain why new snapshots sometimes feel more overfitted and dumber than the ones before them. That’s a side effect of SFT.
I feel like reducing top_p and temperature tends to induce lazier responses. I think I could justify doing this in something simpler like a classifier, though.
No, I’m using gpt-4.1-mini.
The model behavior for a given snapshot doesn’t change, but it can change between snapshots. That is why we say that if you want to make sure behavior stays the same, you should use specific snapshot names instead of aliases, where the underlying snapshot can change.
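As an illustration (a minimal sketch with the Python SDK), pinning the dated snapshot rather than the alias looks like this:

```python
from openai import OpenAI

client = OpenAI()

# Alias: "gpt-4.1-mini" may be repointed to a newer snapshot over time.
# Pinned: "gpt-4.1-mini-2025-04-14" keeps resolving to the same snapshot.
response = client.chat.completions.create(
    model="gpt-4.1-mini-2025-04-14",
    messages=[{"role": "user", "content": "Hello"}],
)
```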
We don’t have a new snapshot for 4.1-mini, so if the behavior changed, it’s an unexpected issue. Could you please share example prompts you were using and example outputs illustrating the change in behavior?
Thank you!
FWIW, I find 4.1 is particularly intolerant to quantisation with respect to function discrimination, so consider using the full 4.1 model instead, or go back to 4o-mini.