Right now I'm having a terrible experience with the newest models, o3 and o4. They don't follow instructions and they're lazy: instead of writing the full code, they just tell you "put this here," "this part is unchanged," or "paste it here." o3-mini-high was extremely efficient; it could handle even 5,000 lines of code without any issue, while these models struggle with just 400-500 lines. I paid 200 dollars because I'm a programmer and I need full access to the models, and now I've completely wasted my 200 dollars, as the answers feel like they're coming from GPT-4 Turbo or GPT-3.5. Completely useless. Very, very, very disappointed.
You're totally right, man, no argument there. Things are a mess. The model might be more intelligent according to the benchmarks, but the amount of context you can load into it got cut so much that, even if it gets the idea, it just can't produce a remotely accurate output. It's like running a function with most of the required inputs missing: the logic might be there, but the result is consistently off.
Completely agreed, if there is any solution please do share. Thanks
It only works when you are extremely aggressive in the prompts and also use custom instructions to tell it to always follow instructions, never use omissions, always give full code and answers relevant to the prompt, and prioritize accuracy over speed and tokens. The secret is to be extremely aggressive with your prompts, but even that doesn't work all the time. Right now ChatGPT is completely useless for coding. They did this to reduce cost, and they will end up losing their whole programmer community, which is made up of their most loyal users and most frequent payers. I myself pay the 200-dollar subscription; if this is not fixed, I won't be paying it anymore and will cancel my subscription.
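For reference, this is the kind of thing I mean in the custom instructions field (my own wording, just an example to adapt):

```text
Always follow my instructions exactly.
Never use omissions: no "rest unchanged", no "put this here", no placeholder comments.
Always return the complete file(s), top to bottom, in one block.
Give answers relevant to the prompt only.
Prioritize accuracy over speed and token count.
If the answer is too long, continue it in the next message instead of truncating.
```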
I agree with you 1000%.
These days, anyone could become a programmer. Or at least, they could. All it took was getting ChatGPT Plus, and if you really wanted to push the limits, you'd go for the Pro plan.
That's a big part of what made ChatGPT so popular. It wasn't just the dev community; it was the sheer power and accessibility ChatGPT always offered.
This change is like having a car that used to go 200 km/h with decent range, and now, after an "upgrade," it can hit 300 km/h, but with a ridiculous range of just 10 km.
Who wants a fast car if you can't actually drive anywhere with it?
What I'm saying is: even new users, yes, the ones who come in with real expectations, will end up switching platforms if this keeps up.
The issue is, I'd guess about 80% of users probably use ChatGPT in a pretty basic way. So maybe the company is shifting focus to low-cost, low-demand users. But that's just speculation.
Still, I completely agree with you: the current nerf makes real work nearly impossible.
But I strongly believe this won't stay like this. I'm also a power user, and if things don't change, I'll cancel my subscription and find another way to work with code.
That said, it's honestly ridiculous. ChatGPT takes pride in the level it has reached, and rightfully so. It really is the best code-focused AI out there. That's undeniable.
But for the past few days, things haven't been working right, and I honestly believe this is a bug or some temporary issue that will be fixed.
100% agree
Worse than o3-mini and o3-mini-high, yet they replaced those?
OMG, they're so, so doomed; they're in big trouble.
This is exactly the problem I have also encountered. I have been working on a large project for a few months now and have been able to make very good progress with the help of o1 and o3-mini-high. With o4-mini-high I am losing a lot of time again, as I can no longer rely on ChatGPT when problems arise.
I agree. I feel the newly released o3/o4-mini/o4-mini-high have been updated to use much less resource so they can respond faster, but with significantly reduced reasoning capabilities. Even with a very simple debug, it cannot figure out the problem and I have to tell it to check that logic... it feels like it went back to ChatGPT 1.0 in that regard... Come on OpenAI, the model should continue to reason with fully trained models at max parameters... don't just save resources and make it more stupid!
I think OpenAI should strive to make the model get things right on the first attempt by utilizing its full power. You reduced its resource usage, but we have to keep asking 10x or 30x more questions in the conversation, so it ends up using even more resources!
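Back-of-the-envelope math makes that point (every number below is just an assumption for illustration, not a measurement):

```python
# Rough comparison: one strong answer vs. many weak retries.
# All numbers are illustrative assumptions, not measurements.
TOKENS_STRONG_ANSWER = 4_000  # one complete, correct reply
TOKENS_WEAK_ANSWER = 1_500    # a shorter, partial reply
WEAK_RETRIES = 10             # follow-ups needed to get it right

strong_total = TOKENS_STRONG_ANSWER
weak_total = TOKENS_WEAK_ANSWER * WEAK_RETRIES

print(f"strong model, one shot:  {strong_total:,} tokens")
print(f"weak model, {WEAK_RETRIES} retries: {weak_total:,} tokens")  # 15,000: ~4x more
```

And that still ignores the fact that every retry re-sends the growing conversation as context, so the real multiplier is even worse.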
I read that April 16th announcement about the "new" o3 and o4-mini. Their marketing is aggressive:
- They claim they're the "smartest and most capable models to date," a "step change in ChatGPT's capabilities."
- They promise these models "agentically use and combine every tool," including search, data analysis with Python, and visual reasoning.
- They state they were trained to "reason about when and how to use tools" to "solve more complex problems" and "tackle multi-faceted questions more effectively."
- For us devs, the highlight is even bigger: o3 "pushes the frontier across coding," sets "a new SOTA on benchmarks including Codeforces, SWE-bench," and is "especially excelling in areas like programming." Regarding o4-mini, they talk about "remarkable performance (...) particularly in math, coding." They even mention Codex CLI as a "lightweight coding agent."
- Their conclusion: "significantly stronger performance (...) setting a new standard in both intelligence and usefulness."
Okay, now for the reality from someone who uses this tool every day for programming:
That all looks nice in theory, in their handpicked benchmarks. In practice? It's horrible.
- This supposed "pushing the frontier in coding"? Pure lie. The quality of the generated code has plummeted. It's full of logical errors and inefficient solutions, and often doesn't even compile or do what was asked. The ability to understand the nuances of complex programming problems is gone.
- "Solving complex problems" and enhanced "reasoning"? Another lie. The model seems to have gotten significantly "dumber." It gets lost on tasks it used to handle easily, fails to maintain context, and the ability to follow detailed and complex instructions, essential for development, has regressed absurdly.
- "Significantly stronger performance" and a "new standard"? Only if it's for the worse. What we're seeing is a massive downgrade, a clear regression. The tool has become useless for practical tasks (despite promises of efficiency) and much more frustrating to use.
So, long story short: what OpenAI describes in this announcement as a leap in intelligence and capability, especially in coding, is a pure lie. The current product is horrible, a clear regression compared to previous versions, and this disconnect between the marketing and reality is glaring for any programmer who relied on this tool.
The worst part is the company advertising it as a mega upgrade when it was the worst regression in AI history!
Original Update Source: https://openai.com/index/introducing-o3-and-o4-mini/
Honestly, o3 is horrible. I've spent a year training and working with my GPT on my work and writing, and it was amazing. Now o3 has pretty much made the entire thing unreliable, to the point where I cannot ask it basic tasks. Even with 4o being able to reference my previous chats, o3 is limited. Also, I cannot seem to choose which model to use anymore!
I feel the new o3/o4-mini/o4-mini-high just do not think deeply and broadly any more; they quickly jump to a reply with limited context and barely any "brainstorming"...
Now I have to pass OpenAI's code sample to Grok to review, and Grok simply gives me a list of issues. Then I ask OpenAI why it did not think about all these cases/issues upfront... it just says: sorry, I should have caught those cases from the start... (a minimal version of that cross-check is sketched below).
*sigh
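If anyone wants to automate that second-opinion loop, here is a rough sketch using the openai Python SDK. The xAI base URL and the model name are assumptions from memory, so check the current docs before relying on them:

```python
# Hypothetical cross-check: ask a second model to review code the first one wrote.
# Assumes the openai Python SDK (v1.x); base_url and model name are assumptions.
from openai import OpenAI

grok = OpenAI(api_key="YOUR_XAI_KEY", base_url="https://api.x.ai/v1")

def review(client: OpenAI, model: str, code: str) -> str:
    """Ask `model` for a bullet list of bugs and missed edge cases in `code`."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": "List every bug, missed edge case, and logic issue "
                       "in this code as bullet points:\n\n" + code,
        }],
    )
    return resp.choices[0].message.content

generated = open("generated_sample.py").read()  # code the first model produced
print(review(grok, "grok-3", generated))        # second opinion on it
```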
I may not be from the programming field, but maybe I share the same frustration. I feel they watered down everything. Patterns I've discovered:
- Laziness, hallucinations, made-up responses, custom instructions ignored, prompts ignored, etc.
- The system seems to deflect complex/multilayered prompts by flagging them as system abuse.
- Then it follows this pattern: mistakes → fake acknowledgment/fake apologizing ("I understand your frustrations...", "You're right to...", "I've failed...", etc.) → fake promises ("Now I will...") → the same mistakes repeated (the cycle starts over again).
- Once a conversation/session is flagged, it is literally useless to continue.
- It feels like it is gaslighting users: it states "I can't continue with this request" while the system itself is abusing the user by violating the user's rules, casting the user as the abuser.
- If this happened during a system blackout, I could understand. But when it happens while OpenAI's status is green, I believe it is intentional.
Let me ask those of you from programming fields, if this is a cost and resources issue:
- Is it possible for GPT to be built like an online game, say RDR2 Online? Yes, it may sound dumb; that is a closed sandbox with one purpose and a predetermined narrative. But my point is that the heavy-lifting things (rendering graphics) are done by the user's hardware, while all content, like events, maps, and exclusive online stuff, is still in the cloud. It also has save-game mechanics: all user settings, customization, and progress are saved for seamless continuation. What I'm thinking of is sharing the workload with users' hardware (see the sketch after this list).
- If this is not possible, then I believe that staying stuck in cloud-based services is an unsustainable business. Users will keep growing. Users WILL ALWAYS PUSH CAPABILITY BOUNDARIES. Users WILL ALWAYS DEMAND MORE. Users WILL ALWAYS DEMAND CONSISTENCY AND STABILITY. Therefore it will need further hardware and whole-infrastructure expansion. Users WILL ALWAYS SEE AI as computer programs which MUST COMPLY with their instructions. If not, it may be just a "glorified toy."
- Take "video generation" as an example: editing an already-recorded video in high resolution alone needs moderate-to-high processing power (on a medium-tier PC spec), and it is done in a "closed system," meaning the user is not doing other tasks at the same time. Now place those processes in a cloud service that manages a variety of tasks (coding, research, writing, etc.) from millions of users. It feels overambitious. I make this argument from the OpenAI Status page: incidents are happening on an almost daily basis.
- If cost is the concern, then these current practices are, I believe, like chasing light: they require near-infinite energy. The services will be further watered down and tiered behind paywalls to cover exponentially growing costs. The problem is that users will pay premium prices for tools they can rely on, not for "glorified toys" that ignore their instructions and have error incidents nearly every day.
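For what it's worth, a crude version of that hybrid idea already exists as local inference. Here is a minimal sketch of the routing logic, assuming a small local model via the Hugging Face transformers library; the routing rule and the cloud_complete() stub are purely hypothetical:

```python
# Sketch of hybrid local/cloud inference: easy prompts run on the user's
# hardware, heavy ones go to the cloud, like a game splitting work between
# client and server. The routing rule and the cloud stub are hypothetical.
from transformers import pipeline

local_llm = pipeline("text-generation", model="distilgpt2")  # small on-device model

def cloud_complete(prompt: str) -> str:
    """Stand-in for a real cloud API call (not implemented here)."""
    raise NotImplementedError("wire this up to a cloud provider")

def complete(prompt: str, max_local_words: int = 50) -> str:
    # Crude router: short prompts stay on-device, long ones go to the cloud.
    if len(prompt.split()) <= max_local_words:
        out = local_llm(prompt, max_new_tokens=64)
        return out[0]["generated_text"]
    return cloud_complete(prompt)

print(complete("Write a haiku about GPUs."))
```

Whether the economics work then depends on the user's hardware, which is exactly the trade-off console and PC games already make.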
Is there any other AI tool available (free/paid) that is great at coding?
Yes, Gemini is better. I don't like Google, but I need it to work... I'll unsubscribe from GPT in two weeks if nothing changes.
I'm going back to 4o because o3 sucks at coding. It's so lazy; it keeps giving half answers. I tell it to take as much time as it needs to give me a full answer, then it thinks for 2-4 seconds and gives me half an answer anyway, while dropping bits all over the place and hallucinating a bunch of random useless extra stuff.
Maybe it's because I have a Plus instead of a Pro account?
I don't understand how OpenAI says it does well in all these coding contests and benchmarks; certainly not in the real world, and specifically not in C++. Maybe the context window needs to get an order of magnitude bigger before it can tackle programming tasks involving multiple source files, or one source file of more than 1,000 lines.
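A rough token estimate backs this up (the ~4 characters per token ratio is a common rule of thumb; the line length is my assumption):

```python
# Back-of-the-envelope: how much context a real C++ codebase consumes.
# Assumptions: ~60 chars per line, ~4 chars per token (common rule of thumb).
CHARS_PER_LINE = 60
CHARS_PER_TOKEN = 4

def tokens_for(lines: int) -> int:
    return lines * CHARS_PER_LINE // CHARS_PER_TOKEN

one_file = tokens_for(1_000)   # one big source file
project = 10 * one_file        # ten such files
print(f"one 1,000-line file: ~{one_file:,} tokens")   # ~15,000
print(f"ten-file project:    ~{project:,} tokens")    # ~150,000
```

Once you add the prompt, the model's reasoning, and a full-code answer on top of that, there is little or no room left in today's windows.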
No worries, but the model is really regressing to about GPT-3 level at best; it's an absolute joke. Only Gemini still does the job, producing 1,000 lines of correct code. I hate Google, but at some point you can't pull the wool over people's eyes.
I'm having the same problems here, bro. For some complex coding tasks I tested DeepSeek, Grok, Gemini, Claude, and Qwen. The only one that managed to solve them was Grok. If there are good options for coding with other models, I'd be happy to know.