Lazy GPT - The biggest issue with GPT models

In my experience of extensively using all the latest OpenAI models since the first release of GPT-3, I can see that although the models are getting significantly smarter, they have also been getting lazier with every new release. This laziness trend has become more pronounced after the release of GPT-4o, which I think is still the best model so far in terms of being the least lazy.

The lazy GPT issue is the main drawback of using GPT models for any serious task, especially when it comes to coding. The laziness is not limited to older models; it is actually more pronounced in the newer models, and even in the reasoning models.

By lazy, I mean not providing a complete answer to a request even when explicitly instructed to; the model tends to avoid hard work. For example, suppose I give a model a body of text well below its context window limit, then give it additional context and instruct it to integrate the new information while preserving all the details of the original document. The returned result will be a summary of the original document with an added summary of the new information. A lot of information is lost, and when I point this out, the model apologizes and improves the answer somewhat, but it still does not return everything it was asked for.
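One cheap guardrail for this failure mode (a sketch of my own, not anything OpenAI provides): a faithful merge should be roughly as long as the original document plus the new material, so a simple length ratio can flag outputs that collapsed into a summary. The function name and the 0.9 threshold are arbitrary assumptions.

```python
def merge_looks_lossy(original: str, addition: str, merged: str,
                      min_ratio: float = 0.9) -> bool:
    """Flag a merged document as a probable summary when it is much
    shorter than its combined inputs. min_ratio=0.9 is an uncalibrated
    guess, not a tuned value."""
    expected_length = len(original) + len(addition)
    return len(merged) < min_ratio * expected_length

original = "detail " * 500      # stand-in for a long source document
addition = "new fact " * 50     # stand-in for the extra context
faithful = original + "\n" + addition
lazy = "Here is a brief summary of the document and the new info."

print(merge_looks_lossy(original, addition, faithful))  # False
print(merge_looks_lossy(original, addition, lazy))      # True
```

In practice one would re-prompt the model automatically whenever the flag fires, instead of discovering the information loss by hand.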

In another example, GPT models, even the reasoning ones, when used for coding in Cursor, almost always implement a simplified solution, more of a mockup, even when instructed to produce a complete implementation, which makes these models absolutely useless for real-world coding problems.

Why is it happening?
My hypothesis for why this is happening, and why it got worse with newer models: I believe the culprit is using GPT itself in the reinforcement learning loop to evaluate the AI's responses, or using GPT to generate synthetic data. If the models used to generate synthetic data, or to evaluate another model's responses in an RL loop, have some inherent laziness that has not been properly resolved, that laziness could be magnified in the RL loop. So although the base model going through RL becomes significantly smarter, it may also become significantly lazier during the RL training loops.
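To make the magnification argument concrete, here is a deliberately toy simulation (every number in it is a made-up assumption, not anything known about OpenAI's actual pipeline): a judge model with only a slight preference for short "summary" answers picks winners between two samples from the policy, and the policy is nudged toward the winning style each round.

```python
# Toy, deterministic sketch of the amplification hypothesis above.
# A judge with a slight (55%) preference for lazy "summary" answers
# compares two samples from the policy each round; the policy moves
# toward whichever style won. The 0.55 bias and 200 rounds are made up.

JUDGE_BIAS = 0.55  # chance the judge prefers the summary in a mixed pair

def summary_win_prob(p: float) -> float:
    """Probability a summary-style answer wins this round, given the
    policy currently answers lazily with probability p."""
    both_summary = p * p                   # summary wins by default
    mixed = 2 * p * (1 - p) * JUDGE_BIAS   # judge's slight bias decides
    return both_summary + mixed

p = 0.5  # policy starts with no preference for lazy answers
for _ in range(200):
    p += summary_win_prob(p) - p  # nudge policy toward the winning style

print(round(p, 3))
```

The point of the toy: a 55% judge bias does not cap the policy at 55% lazy answers. Because the policy's own drift feeds back into the comparisons, the only attracting fixed point is full laziness, so p ends near 1.0, far beyond the judge's small bias.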

Final note: lazy engineering can result in lazy models. I really hope OpenAI's team will pay attention to this issue, since it is the biggest drawback and the main reason GPT models are not as useful anymore. They are still useful for day-to-day tasks, but not for serious ones.

I’d love to hear from experts about this issue; please leave a comment and share your thoughts.


Why are almost all GPT models lazy, and why does every new model get even lazier at serious tasks?

This laziness is especially a show-stopper for using OpenAI models for coding. Even the new GPT-4.1 used in Windsurf is lazy: it doesn’t follow instructions and provides half-done work in the output.

No matter how smart it is, when a model is lazy it cannot be trusted for real-world problems and use cases. Please fix this.

Thank you!


The hallucination rate and lazy outputs of o3, o4-mini-high, and o4-mini are quite alarming. As for GPT-4o, I really thought it would be improved by the sycophancy reversal, the incorporation of GPT-4.1 features, and its status as the full replacement for GPT-4; but it has also fallen into the recent slump in output quality over the past weeks. For me this has been happening when asking basic questions for historical research. Overall, these models have been less reliable than the ones they replaced: GPT-4, o1, and o3-mini-high.

I think they are using GPT-4-turbo as the foundation.
GPT-4o was not lazy, but after recent updates, it became lazy.