o3 and o4-mini are extremely lazy and no longer suitable for coding

Right now I'm having a terrible experience with the newest models, o3 and o4-mini. They don't follow instructions and they're lazy: instead of writing the full code, they just tell you things like "put this here", "this part is unchanged", or "paste it here". o3-mini-high was extremely efficient; it could handle even 5,000 lines of code without any issue, while these models struggle with just 400-500 lines. I paid $200 because I'm a programmer and I need full access to the models, and now I've completely wasted my $200: the answers feel like they're coming from GPT-4 Turbo or GPT-3.5. Completely useless. Very, very disappointed.

13 Likes

You’re totally right, man — no argument there. Things are a mess. The model might be more intelligent according to the benchmarks, but the amount of context you can load into it got cut so much that, even if it gets the idea, it just can’t produce a remotely accurate output. It’s like running a function with most of the required inputs missing — the logic might be there, but the result is consistently off.

6 Likes

Completely agreed. If there is any solution, please do share. Thanks.

4 Likes

It only works when you are extremely aggressive in the prompts, and you also have to set custom instructions telling it to always follow instructions, never use omissions, always give full code and answers relevant to the prompt, and prioritize accuracy over speed and tokens. The secret is to be extremely aggressive with your prompts, but even then it doesn't work all the time. Right now ChatGPT is completely useless for coding. They did this to reduce cost, and they will end up losing the whole programmer community, which includes their most loyal users and most frequent payers. I pay the $200 subscription myself; if this is not fixed, I won't be paying it anymore and will cancel my subscription.
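
If you'd rather pin those rules down over the API than keep fighting the ChatGPT UI, here is a minimal sketch of the same idea. It assumes the official openai Python SDK and API access to o3, and the rule wording is just mine, not anything OpenAI recommends:

```python
# Minimal sketch: pin "no omissions, full code" rules in a developer message.
# Assumes the official openai Python SDK (pip install openai), an OPENAI_API_KEY
# in the environment, and API access to o3. The rule text is my own wording.
from openai import OpenAI

client = OpenAI()

RULES = (
    "Always follow the user's instructions exactly. "
    "Never elide code with placeholders like 'rest unchanged' or 'paste here'. "
    "Always return the complete, runnable file. "
    "Prioritize accuracy over speed and token usage."
)

response = client.chat.completions.create(
    model="o3",
    messages=[
        # o-series models read the "developer" role as their system prompt.
        {"role": "developer", "content": RULES},
        {"role": "user", "content": "Refactor my module and return the FULL file."},
    ],
)
print(response.choices[0].message.content)
```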

5 Likes

I agree with you 1000%.

These days anyone can become a programmer, or at least they could until now. All it took was getting ChatGPT Plus, and if you really wanted to push the limits, you'd go for the Pro plan.

That’s a big part of what made ChatGPT so popular. It wasn’t just the dev community—it was the sheer power and accessibility ChatGPT always offered.

This change is like having a car that used to go 200 km/h with decent range, and now, after an “upgrade,” it can hit 300 km/h—but with a ridiculous range of just 10 km.

Who wants a fast car if you can’t actually drive anywhere with it?

What I’m saying is: even new users—yes, the ones who come in with real expectations—will end up switching platforms if this keeps up.

The issue is, I’d guess about 80% of users probably use ChatGPT in a pretty basic way. So maybe the company is shifting focus to low-cost, low-demand users. But that’s just speculation.

Still, I completely agree with you: the current nerf makes real work nearly impossible.

But I strongly believe this won’t stay like this. I’m also a power user, and if things don’t change, I’ll cancel my subscription and find another way to work with code.

That said, it’s honestly ridiculous. ChatGPT takes pride in the level it has reached—and rightfully so. It really is the best code-focused AI out there. That’s undeniable.

But for the past few days, things haven’t been working right—and I honestly believe this is a bug or some temporary issue that will be fixed.

3 Likes

100% agree

They're worse than o3-mini and o3-mini-high, yet they replaced those?
OMG, they're so, so doomed; they're in big trouble.

2 Likes

This is exactly the problem I have also encountered. I have been working on a large project for a few months now and have been able to make very good progress with the help of o1 and o3-mini-high. With o4-mini-high I am losing a lot of time again, as I can no longer rely on ChatGPT when problems arise.

2 Likes

I agree. I feel the newly released o3/o4-mini/o4-mini-high have been updated to use far fewer resources so they can respond faster, but with significantly reduced reasoning capability: even a very simple debugging task it cannot figure out, and I have to tell it which logic to check… it feels like it went back to ChatGPT 1.0 in that regard. Come on, OpenAI: the model should keep reasoning with the fully trained model at max parameters. Don't make it dumber just to save resources!

1 Like

I think OpenAI should strive to make the model get things right on the first attempt by using its full power. You reduced its resource usage per response, but we have to ask 10x or 30x as many questions in the conversation, so it ends up using more resources overall!

1 Like

I read that April 16th announcement about the ‘new’ o3 and o4-mini. Their marketing is aggressive:

  • They claim they’re the ‘smartest and most capable models to date,’ a ‘step change in ChatGPT’s capabilities.’
  • They promise these models ‘agentically use and combine every tool,’ including search, data analysis with Python, and visual reasoning.
  • They state they were trained to ‘reason about when and how to use tools’ to ‘solve more complex problems’ and ‘tackle multi-faceted questions more effectively.’
  • For us devs, the highlight is even bigger: o3 ‘pushes the frontier across coding,’ sets ‘a new SOTA on benchmarks including Codeforces, SWE-bench,’ and is ‘especially excelling in areas like programming.’ Regarding o4-mini, they talk about ‘remarkable performance (…) particularly in math, coding.’ They even mention Codex CLI as a ‘lightweight coding agent.’
  • Their conclusion: ‘significantly stronger performance (…) setting a new standard in both intelligence and usefulness.’

Okay, now for the reality from someone who uses this tool every day for programming:

That all looks nice in theory, in their handpicked benchmarks. In practice? It’s horrible.

  • This supposed ‘pushing the frontier in coding’? Pure lie. The quality of the generated code has plummeted. It’s full of logical errors, inefficient solutions, and often doesn’t even compile or do what was asked. The ability to understand the nuances of complex programming problems is gone.
  • ‘Solving complex problems’ and enhanced ‘reasoning’? Another lie. The model seems to have gotten significantly ‘dumber.’ It gets lost on tasks it used to handle easily, fails to maintain context, and the ability to follow detailed and complex instructions, essential for development, has regressed absurdly.
  • ‘Significantly stronger performance’ and ‘new standard’? Only if it’s for the worse. What we’re seeing is a massive downgrade, a clear regression. The tool has become useless for practical tasks (despite promises of efficiency) and much more frustrating to use.

So, long story short: what OpenAI describes in this announcement as a leap in intelligence and capability, especially in coding, is a pure lie. The current product is horrible, a clear regression compared to previous versions, and this disconnect between the marketing and reality is glaring for any programmer who relied on this tool.

The worst part is the company advertising it as a mega upgrade when it was the worst regression in AI history!

Original Update Source: https://openai.com/index/introducing-o3-and-o4-mini/

2 Likes

Honestly, o3 is horrible. I've spent a year training and working with my GPT on my work and writing, and it was amazing. Now o3 has made the whole thing so unreliable that I can't even give it basic tasks, and that's even with 4o being able to reference my previous chats; o3 is limited there. Also, I can't seem to choose which model to use anymore!

4 Likes

I feel the new o3/o4-mini/o4-mini-high just don't think deeply and broadly anymore; they jump straight to a reply with very little "brainstorming."
Now I have to pass OpenAI's code samples to Grok for review, and Grok simply gives me a list of issues. Then I ask OpenAI why it didn't think about all those cases/issues upfront, and it just says: "Sorry, I should have caught those cases from the start."
*sigh*

4 Likes

I may not be from the programming field, but I share the same frustration. I feel they watered down everything. Patterns I've discovered:

  1. Lazy, hallucinated, made-up responses; custom instructions ignored; prompts ignored; etc.
  2. The system seems to deflect complex/multilayered requests by flagging them as system abuse.
  3. Then it follows this pattern: mistakes → fake acknowledgment/fake apology ("I understand your frustration…", "You're right to…", "I've failed…", etc.) → fake promises ("Now I will…") → the same mistakes again (the cycle starts over).
  4. Once a conversation/session has been flagged, it is literally useless to continue.
  5. It feels like it is gaslighting users: it says "I can't continue with this request" while the system itself is violating the user's rules, casting the user as the abuser.
  6. If this happened during a system blackout, I could understand. But when it happens while OpenAI's status page is green, I believe it is intentional.

Let me ask you all, since you're from programming fields, in case this is a cost-and-resources issue:

  1. Could GPT be built like an online game, say RDR2 Online? Yes, it may sound dumb: that's a closed sandbox with one purpose and a predetermined narrative. But my point is that the heavy lifting (rendering graphics) is done by the user's hardware, while all the content, like events, maps, and online exclusives, stays in the cloud. It also has save-game mechanics: all user settings, customization, and progress are saved for seamless continuation. What I'm imagining is sharing the workload with users' hardware (see the sketch after this list).
  2. If that's not possible and this stays a purely cloud-based service, then I believe it is an unsustainable business. Users will keep growing. Users will always push capability boundaries, always demand more, and always demand consistency and stability. Therefore it will need ever more hardware and whole-infrastructure expansion.
  3. Users will always see AI as computer programs that must comply with their instructions. If not, they are just "glorified toys."
  4. Take video generation as an example: merely editing already-recorded high-resolution video needs moderate-to-high processing power (on a mid-tier PC), and that is done in a "closed system" where the user isn't running other tasks. Now place those processes in a cloud service that handles a variety of tasks (coding, research, writing, etc.) from millions of users. It feels overambitious. I base this on the OpenAI status page: incidents are happening almost daily.
  5. If cost is the concern, then I believe the current practice is like chasing light: it requires near-infinite energy. The service will get more watered down and tiered behind paywalls to cover exponentially growing costs. The problem is that users will pay premium prices for tools they can rely on, not for "glorified toys" that ignore their instructions and have error incidents nearly every day.
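
On point 1: the "heavy lifting on the user's hardware" pattern already exists, just not for OpenAI's hosted models. As a rough illustration only (this is not how ChatGPT works), here is a minimal sketch of running a small open-weight model locally with Hugging Face transformers; the model name is just an example of something small enough for a consumer PC:

```python
# Minimal sketch of the "shared workload" idea: run a small open-weight model
# entirely on the user's own hardware. This is NOT how ChatGPT works; it only
# illustrates the pattern (pip install transformers torch).
from transformers import pipeline

# Qwen/Qwen2.5-0.5B-Instruct is just an example of a PC-sized model.
chat = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

messages = [{"role": "user", "content": "Write a Python one-liner that reverses a string."}]
reply = chat(messages, max_new_tokens=64)

# The pipeline returns the whole conversation; the last message is the model's.
print(reply[0]["generated_text"][-1]["content"])
```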
2 Likes

Is there any other AI tool available (free or paid) that is great at coding?

1 Like

Yes, Gemini is better. I don't like Google, but I need it to work… I'll unsubscribe from GPT in two weeks if nothing changes.

1 Like

I'm going back to 4o because o3 sucks :monkey_face: :basketball: at coding. It's so lazy; it keeps giving half answers. I tell it to take as much time as it needs to give me a full answer, then it thinks for 2-4 seconds and gives me :poop: while dropping bits all over the place and hallucinating a bunch of random, useless extra stuff.

Maybe it's because I have a Plus account instead of Pro?

I don't understand how OpenAI says it does well in all these coding contests and benchmarks; certainly not in the real world, and specifically not in C++. Maybe the context window needs to get an order of magnitude bigger before it can tackle programming tasks involving multiple source files, or one source file of more than 1,000 lines.
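
For scale, here's a rough way to see why big files blow past the window: a minimal sketch assuming tiktoken's o200k_base encoding is a fair proxy for the o-series tokenizer (the exact encoding isn't published), with engine.cpp as a hypothetical file name:

```python
# Minimal sketch: estimate how much context a source file eats.
# Assumes `pip install tiktoken` and that o200k_base roughly matches the
# o-series tokenizer -- an assumption, since the exact encoding isn't published.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

with open("engine.cpp", "r", encoding="utf-8") as f:  # hypothetical file name
    source = f.read()

tokens = len(enc.encode(source))
lines = source.count("\n") + 1
print(f"{lines} lines -> ~{tokens} tokens ({tokens / lines:.1f} tokens/line)")
```

On real C++ that usually lands somewhere around 8-12 tokens per line, so a 1,000-line file is already roughly 10k tokens before you add the conversation itself.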

3 Likes

No worries, but the model is really regressing to about GPT-3 level at best; it’s an absolute joke. Only Gemini still does the job, producing 1,000 lines of correct code. I hate Google, but at some point you can’t pull the wool over people’s eyes.

1 Like

I'm having the same problems here, bro. For some complex coding tasks I tested DeepSeek, Grok, Gemini, Claude, and Qwen. The only one that managed to solve them was Grok. If there are good options for coding with other models, I'd be happy to know.

1 Like