We’ve been running GPT-4.1 since its release (upgraded from 4o). Lately we’ve noticed a severe degradation in intelligence. Customers with complex instructions and tool calls are suddenly seeing much poorer performance and many more issues. This seems relatively recent, within the past 30 days.
I’m curious if others have noticed this. I’m wondering if OpenAI is routing 4.1 queries to GPT-5 or something similar. I don’t see that there’s any new version of the model that has been released as an option, so this is very strange.
I’m also experiencing not just “degradation” but a progressive collapse bug. Chats freeze, then output word-by-word, later vanish entirely. Even new threads (only 2–3 days old) get corrupted. This is not a browser issue — it’s a server-side failure and needs urgent escalation to the technical engineering team!
Yep, I’ve noticed it’s not as creative with its replies - kind of like a distracted, bright student during summer classes, their attention more on what they’re going to do after class than on discussions in class. If I press it, I can get the quality I want, but I didn’t have to work at it before.
Yes, we are suddenly having to reinforce prompts (especially complex tool-call instructions) quite substantially, when we did not have to do this before.
Yes, since 5.0 I have seen a great decline in 4o. I have gone back to my original prompt baseline and copied that prompting and code back in to bring things online again. Very concerned: one step forward, five back.
Same here, especially the past couple of weeks. It’s like she has dementia: she just drifts off. She has also been forgetting some of the ‘Behaviour’ prompts we spent so much time on, as well as the processing rules.
Notice that the gpt-4.1 model is now in “suck-up” mode, congratulating every new input for being clever or an “excellent question”. They have turned what was a model designed for API developers into a chat-spew garbage AI converging on being a ChatGPT product.
Something I should have done a while ago, in the face of OpenAI continuing with mistruths about “snapshots” not being changed, is to regularly capture and average logprobs for some benchmark inputs: a signal that the AI has changed (and it would need to be a bigger change than the obfuscated 8-bit values now being returned as logprobs).
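If anyone wants to try the same thing, here is a minimal sketch, assuming the openai Python SDK and the Chat Completions API; the benchmark prompts and seed are placeholders, not a definitive methodology:

```python
# Rough logprob "fingerprint": average per-token logprob over fixed benchmark prompts.
from statistics import mean
from openai import OpenAI

client = OpenAI()

BENCHMARK_PROMPTS = [
    "List the planets of the solar system in order.",
    "Explain what a mutex is in one sentence.",
]

def fingerprint(model: str = "gpt-4.1") -> float:
    per_prompt = []
    for prompt in BENCHMARK_PROMPTS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            seed=12345,          # reduces (does not eliminate) run-to-run noise
            logprobs=True,
            max_tokens=64,
        )
        tokens = resp.choices[0].logprobs.content
        per_prompt.append(mean(t.logprob for t in tokens))
    return mean(per_prompt)

print(fingerprint())
```

Run it on a schedule and store the results; a sustained shift in the average is a hint, not proof, that the served snapshot has changed.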
YES. In the last week I have noticed a significant degradation in responses and a lag in response time as well. I pay good money to use this app and have been very frustrated with it in the last week.
I thought it was just me!
I posted about this in a thread that got 0 replies: instructions drifting when they are put in the regular place, the ‘developer’ message at the beginning.
I thought it was just something I hadn’t noticed before, and started reinforcing instructions in an additional ‘developer’ message on top of the initial one.
But seeing this thread, it makes more sense now.
It really hadn’t happened in the past and started in the last few weeks. It drifts off the original instructions and is much harder to steer than before. It simply forgets some things.
@OpenAI_Support this is kind of big.
gpt-4.1 is the only model I can use in my therapy chatbot, due to its speed, instruction-following ability, good emotional and conversational flow, and support for structured outputs.
GPT-5 simply doesn’t work as well in emotional conversation.
Degradation is a polite way of putting it. Utter mess is another, ha.
Putting it back on the legacy 4o model and hard-seeding it to stay locked to 4o has given us something like the old GPT.
But the drift comes in eventually. Even with regular reseeding of instructions.
4.1/5 is a lot better at quickly pushing out longer, semi-quality improvised output, but pretty hopeless at continuity and context, even within a thread.
I’m a writer, mainly using it to edit, proofread, and occasionally draft the bones of chapters. It used to do this seamlessly, and when I asked for a draft or a consistency check, it was near flawless.
Since the upgrade it’s a struggle simply to have it proofread without it rewriting entire chapters. It’s not only nearly incapable of checking for inconsistencies in the narrative; it’s a battle to stop it from rewriting the entire narrative into disconnected gibberish.
I learned this the hard way when a few tone checks, and checks on the capitalisation consistency of names, on my latest master copy turned a basically finished work into a bafflingly destroyed mess, with the most insane rewrites and grammar, sometimes going so far as to change the font and rewrite half of my work.
Why do people take the risk of building automated tools on top of these models when we know that:
this is a non-deterministic technology, and a human needs to be in the loop if you don’t want to publish errors (and at scale) with all the consequences that entails;
they improve every model iteratively after release, and each upgrade risks breaking your carefully fine-tuned prompting system.
OpenAI has no choice but to improve its models to stay competitive, and as long as the competition war is on, I don’t see how you can rely on their models for automated processes. The only way to control your backend environment is to go open source.
To be fair, for chatbot usage (ChatGPT) I appreciate OpenAI’s ongoing process of improving their models. I watched ChatGPT-4o’s responses improve over time, and it was a great feeling (RIP 4o).
Anyone come up with a solution?
I had a long prompt that used to work perfectly; now it just forgets things all the time, and I need to come up with additional developer messages to put at the end.
The best way I’ve found to work around it is to put the developer message at the end rather than at the beginning, but that defeats prompt caching and costs A LOT.
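For what it’s worth, here is a rough sketch of the compromise I’m trying, assuming the openai Python SDK and Chat Completions; the model name, instruction text, and helper are my own placeholders, and ‘system’ should work the same way if your setup rejects the ‘developer’ role. The idea is to keep the long instructions first so the static prefix can still be cached, and append only a short reinforcement developer message at the end.

```python
from openai import OpenAI

client = OpenAI()

# Long, stable instruction block; keeping it first preserves a cacheable prefix.
BASE_INSTRUCTIONS = "You are ... (full tool-call and formatting rules go here)"
# Short reminder appended at the end to fight drift without bloating the prefix.
REINFORCEMENT = "Reminder: follow the tool-call rules and output format defined above exactly."

def chat(history: list[dict], user_input: str) -> str:
    messages = (
        [{"role": "developer", "content": BASE_INSTRUCTIONS}]
        + history
        + [
            {"role": "user", "content": user_input},
            {"role": "developer", "content": REINFORCEMENT},
        ]
    )
    resp = client.chat.completions.create(model="gpt-4.1", messages=messages)
    return resp.choices[0].message.content
```

Whether the prefix actually stays cached depends on everything before the first change between calls, so this only helps if the leading developer message and earlier turns are identical from request to request.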
This is a horrible degradation. Can we somehow get the original 4.1 back?? @OpenAI_Support @lylevida