Thanks to estudante_picasso, many of us have been spared from wasting money. We’re grateful to him.
OpenAI is surely aware of this issue—but why do they remain silent?
Over the past few days, I’ve watched numerous YouTube videos praising the latest update, featuring attractive charts that highlight impressive progress across new models, including the o4-mini-high model. Not one of these videos addresses this particular issue. Only occasionally, buried in the comments, can you find rare voices of dissent.
It seems that, apart from coding, ChatGPT’s other capabilities haven’t suffered much.
At least ChatGPT itself managed to find several active discussions about this issue across different platforms, including this forum. Moreover, it provided a fairly professional summary of these discussions, along with accurate conclusions and recommendations.
Honestly, ever since image generation dropped, it feels like OpenAI’s servers have been gasping for air under GPU overload.
They probably didn’t want to delay releasing new models, so they just went all in on aggressive lightweighting to survive.
Maybe things will get better if they secure more GPUs, but right now?
o3 and o4-mini are as lazy as GPT-4.
Thanks for the closure, bro. (Feel like I’m going crazy scouring the internet for OpenAI to explain their downgrade, or at least for people to agree openly.)
We’re all wondering the same thing… i.e. will they come out and confess what they’ve done? Will they bring back o3-mini-high with limited queries? Or continue gaslighting?
They have yet to suggest any fixes or admit what they’ve done (guess they don’t understand that communication is the foundation of a relationship lol).
In the meantime, I’m off to DeepSeek for the most part.
It’s not just coding: the overall computational power, and the actual drive to not be lazy, is lower. When I give it long messages telling it to produce translation tasks for me in a certain format and at a certain difficulty, it just ignores them because it’s too lazy…
Bear in mind I copied and pasted the exact same prompt that, a mere few days ago, I was using on o3-mini-high, and it was obeying all my instructions (and cooking). o4-mini-high, of course, ignores the format and difficulty I asked for – useless. Now with o4-mini-high I genuinely can’t challenge myself in learning this language like I could before.
Edit: at this point it’s basically GPT-4o-mini that I’m using on my phone, which defeats the purpose of having a subscription too. I’ll give them maybe another 4-7 days of grace period, then my subscription is getting cancelled.
Adding my voice to the growing chorus of frustrated users here. I’ve been an avid user of ChatGPT, pretty much since the very beginning – going on almost 2.5 years now. I’ve relied heavily on the GPT-4 models for complex tasks, especially coding, and was always impressed by its capabilities.
However, the recent performance decline is honestly shocking and deeply disappointing. Like many others in threads like the one discussing “O3 and o4-mini are lazy,” I’m finding the current models almost unusable for serious work. It genuinely feels like I’ve been downgraded – not just slightly, but drastically. Frankly, the output often reminds me of GPT-3.5, and sometimes maybe even worse.
The laziness described is exactly what I’m experiencing:
- Refusing to generate complete code blocks.
- Stopping mid-thought or saying things like “Paste the rest here.”
- Struggling with complexity it used to handle with ease.
- Requiring endless prompts to get even close to a usable result, often full of errors.
This isn’t just an inconvenience; it’s actively hindering productivity. I’ve spent hours trying to get code generated that previous models handled perfectly, only to end up with incomplete garbage – just like user BN-21, who described wasting 15 hours implementing wrong code.
The situation became so untenable for my coding needs that I’ve recently switched over to using Google’s Gemini, and the difference is night and day. Just yesterday, Gemini successfully generated a complex script close to 1000 lines long without breaking a sweat – a task that seems utterly impossible for the current GPT models I have access to (likely the o3/o4 variants discussed).
It’s incredibly frustrating to feel like we’re paying for a premium service that’s delivering results far below its established potential and worse than free alternatives in some key areas now.
As user estudante_picasso asked: is this intentional nerfing? Cost-cutting? Whatever the reason, the current state is unacceptable. Reading these forums shows this isn’t an isolated experience.
I really hope OpenAI is listening to this widespread feedback and plans to address this regression soon. Otherwise, loyal users like myself will increasingly have no choice but to rely on competitors that can actually deliver on complex tasks reliably.
Is anyone else finding alternatives like Gemini are now significantly outperforming GPT for coding?
Thanks for reading.
Thanks a lot for your detailed message explaining the current situation, and even how other AI systems have evolved (I was honestly impressed you mentioned Google Gemini).
Personally, I’ve always used Claude as a second option, but it still has major limitations in terms of available information—though it does deliver solid results overall.
To add a bit more context: I ran tests using large inputs. Models like o4-mini, o4-mini-high, and o3 weren’t able to deliver accurate responses when it came to understanding the full context. Meanwhile, the standard ChatGPT-4o handled everything correctly and gave precise, on-point answers.
This makes it pretty clear that the issue is with the new lightweight models—and yeah, it seems to be directly related to how much information they can process (possibly some kind of nerf).
Try this yourself: feed it a big chunk of data and ask for a specific response that requires layered reasoning. What happens? Around 10 seconds of “thinking” and then a horrible answer.
But if you feed the same model a very small or even meaningless input—or ask a hard question—it’ll “reflect” for a whole minute or more.
It’s obvious that once the input size hits a certain level (which the standard ChatGPT-4o can still handle fine), the others cap out quickly and just give weak results.
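If anyone wants to reproduce this against the API, here’s a minimal sketch of that large-vs-small input comparison. It assumes the official openai Python SDK; the model IDs, the input file, and the question are placeholders, not my exact prompt:

```python
# Rough sketch of the "same question, different input size" test described above.
# Assumes the official `openai` Python SDK and that the model IDs below are
# available on your account; swap in whatever models you actually have access to.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def run_test(model: str, context: str, question: str) -> None:
    """Send the same layered-reasoning question with a given context blob."""
    start = time.time()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{context}\n\n{question}"}],
    )
    elapsed = time.time() - start
    answer = resp.choices[0].message.content or ""
    print(f"{model}: {elapsed:.1f}s, {len(answer)} chars returned")

big_context = open("large_input.txt").read()  # a big chunk of data (placeholder file)
question = "Using only the data above, walk through the multi-step reasoning needed to answer X."

for model in ("gpt-4o", "o4-mini"):               # standard model vs. new lightweight model
    run_test(model, big_context, question)        # large input
    run_test(model, "irrelevant filler", question)  # tiny / meaningless input
```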
From these tests, it’s clear something changed. And it’s directly tied to how much info the models can handle.
What really doesn’t make sense is why this is affecting PRO users. I mean, we’re talking $100/month here.
That tells me this might’ve been a targeted update—or a nerf—for specific scenarios. But if so, it was poorly executed.
What I’m sure of is this: it won’t stay like this. No way. The company will notice and fix it soon.
Dude, I ran some tests—seriously, check this out:
First, I tested with ChatGPT on a 532-line script.
Twice with o4-mini-high:
Results:
- 529 lines – part of the code and logic was removed
- 550 lines – new logic, but incorrect
Tested twice with o3:
- 477 lines – restricted
- 256 lines – completely wrong
With DeepSeek:
I sent it once, and with a single response from DeepThink (R1), after 90 seconds of thinking,
I got the code back with 552 lines, WORKING, on the first try!!!
Unbelievable! UNACCEPTABLE!
CHATGPT LINK:
I can’t share the DeepSeek chat with you, but I’ll upload the new file as a .txt if you want to check it out.
DeepSeek-generated code:
Note: I’ve been a ChatGPT customer for years. I support, approve of, and will always stand by you—but this is honestly a disgrace. I had to share it with you. It’s 100% true.
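For anyone who wants to sanity-check outputs the same way, this is roughly how I compared line counts – just a sketch; the file names are placeholders for wherever you save each model’s output:

```python
# Quick comparison of the original script's size against what each model returned.
# File names are placeholders; point them at your own saved outputs.
from pathlib import Path

def count_lines(path: str) -> int:
    """Count non-blank lines so trailing whitespace doesn't skew the comparison."""
    return sum(1 for line in Path(path).read_text().splitlines() if line.strip())

original = count_lines("original_script.py")
for name in ("o4-mini-high_run1.py", "o4-mini-high_run2.py",
             "o3_run1.py", "o3_run2.py", "deepseek_r1.py"):
    got = count_lines(name)
    print(f"{name}: {got} lines ({got - original:+d} vs. the {original}-line original)")
```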
Completely agree, they are a HUGE step down from previous models – not a minor one.
They don’t obey no matter how you tune the prompt. They don’t spit out all the files, and even when they do, they put “rest of file continues here” – something models since o1 never did.
Guys, any update? Did they correct it?
No, from personal experience - nothing at all
I remember how good GPT4 was in the first days and months.
In my experience, the first versions of GPT-4 were actually better than what we’re being spoonfed now. They’ve also made a habit of dumbing down older models, maybe transferring GPUs to new models and nerfing flagship models like o1-pro as they roll out newer tool-using models that clutter the context window, arguing that they are “better” than o1 or GPT-4. They are not.
It seems GPT-4 was the best OpenAI could ever achieve, and from here on it’s downhill for them.
I think GPT-4 was them putting in an honest effort to attract people. Now they feel they don’t need to, and people will praise them regardless. o1 wasn’t great, but it was head and shoulders above its replacements. o4-mini has some utility for image making since it does seem to follow instructions a bit better, but that’s it. I’ve actually had it literally complain about my prompts.
ChatGPT has become much less lazy.
Thank you all for participating in this conversation.
It used to be better not long ago; what happened?!
That’s exactly my problem too!