ChatGPT (o3, o4-mini-high and even o1-pro) sucks now

OpenAI folks must be monitoring this. Why does no one from their team provide some acknowledgement and next steps on these reported issues? They report massive improvements, but we are seeing just the opposite.

4 Likes

These models are not just lazy, they are broken.

I have numerous chats with o1-mini showing how utterly efficient and hard-working that model was.

It’s not just broken; it’s unusable. The best course of action would be to let users go back to the prior models.

o3 and o4-mini are both just as much useless garbage as o1-preview was.

It’s cheap. I know how to get these models to work efficiently regardless.

I don’t press Enter, I press a macro: “Fn+F” = “Please provide a full script” + Enter, applied regardless of the prompt. Give the same prompt to o3 versus Gemini versus Claude, and the latter two will each produce 4x more in a single response. Gemini will write so much that it can fix its own output, up to 1,400 lines of code, putting it at the level o1-mini used to reach. Claude will strongly resemble o1. o3, on the other hand, will give me a C++ script that is utterly broken. Existing code? Pssh, you either spend 24 hours writing it with these models (when it would have taken 1-2 hours before), or you don’t go to OpenAI at all.
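
If you want to replicate that macro setup, here is a minimal sketch in Python using the third-party keyboard package; the Ctrl+Shift+F binding and the exact phrasing are my stand-ins, since Fn combinations are handled in hardware and usually can’t be hooked:

```python
# Minimal sketch of the "append a boilerplate suffix and send" macro.
# Assumes the third-party `keyboard` package (pip install keyboard).
# Fn+F is a hardware-level combo, so Ctrl+Shift+F stands in here.
import keyboard

def send_full_script_suffix():
    # Type the suffix into the focused chat box, then submit with Enter.
    keyboard.write("Please provide a full script")
    keyboard.send("enter")

keyboard.add_hotkey("ctrl+shift+f", send_full_script_suffix)
keyboard.wait()  # keep the script alive so the hotkey stays registered
```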

The patch we need would be a dream:
o3 remains.
o4-mini-high gets replaced with o1-mini.
o4-mini gets replaced with o3-mini.

As it is? If you want to code, GPT-4o is the way to go on the platform. GPT-4o will solve the problem FASTER, because GPT-4o listens to the user (due to far better training data).

If you really want to use these models? You will have to reach out to an English teacher for the prompts.

3 Likes

Oh wow, just wow. These new models, and what they have done to o1 Pro, are so, so bad. I have been trying to carry out what would normally be low-level coding, and none of the models can do it, never mind the response times of up to 3-5 minutes per reply, and then the errors and crashes on top of that.

So I give it a database name, a database location, the table, the fields taken directly from the database, the HTML front-end code, and the Flask back-end code to connect it.

It returns the wrong database name, the wrong location, and the wrong fields between the HTML and the Flask code, so it’s rendered useless. And when I tell it what it did wrong, it gives me the same answer all over again, or it errors out.
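
For context, the wiring being described is roughly the sketch below. This is a hypothetical minimal Flask example; the database path, the users table, and the field names are placeholders, not the poster’s actual values. These names are exactly the details the models keep scrambling:

```python
# Hypothetical sketch of the HTML-to-Flask-to-database wiring described above.
# DB_PATH, the `users` table, and the field names are placeholders.
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)
DB_PATH = "app.db"  # the "database location" handed to the model

@app.route("/users", methods=["GET"])
def list_users():
    # The field names must match both the table schema and the HTML front end.
    conn = sqlite3.connect(DB_PATH)
    rows = conn.execute("SELECT id, name, email FROM users").fetchall()
    conn.close()
    return jsonify([{"id": r[0], "name": r[1], "email": r[2]} for r in rows])

if __name__ == "__main__":
    app.run(debug=True)
```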

That is on o1 Pro and the 4o and o3 models; they are all useless.

They have built themselves a leonardo.ai server for making images. Well done, team, you must be proud.

I’m super angry. I have work to do, and they roll out this rubbish and praise it as the next best thing. You got it wrong. Roll it back, and roll it back quickly, because I’m paying a premium for your service and getting the scraps left on the street.

2 Likes

You know what’s worse? I ran tests yesterday, and DeepSeek itself, which is FREE, delivers better results (COMPLETE AND FUNCTIONAL) than all the current ChatGPT models (o4-mini / o4-mini-high / o3).

I posted the details in another thread. There were 4 failed attempts with ChatGPT and 1 SUCCESSFUL attempt with DEEPSEEK (FREE CHINESE MODEL).

Note: The test involved 532 lines of code — very simple ones.

Man, this is an unprecedented embarrassment.

2 Likes

I’m also having the same problems. I noticed that o3-mini-high disappeared sometime last week, and since then the models have become stupid, making silly errors.
I’m a Pro user paying $200 per month for these things, and it’s becoming unusable for anything more than 200 lines.
I’m seriously considering moving on to better alternatives.

2 Likes

My thoughts exactly. Where is OpenAI? I am now trying alternatives. They are missing the boat, and many will not switch back. I switched to AT&T when it was Cingular; I’m sure Sprint wishes it had never chased me away. Maybe they would still exist if they had kept loyal customers like me.

And what is the deal with none of these models being able to tell time? They are always at least half an hour off.

1 Like

So are you saying 4o is better than Gemini or Claude? Why is everybody freaking out about this technology? It always makes mistakes, and if I were not educated on the subject matter I might not catch them. But I need a better alternative. @estudante_picasso seems to think that DeepSeek is better.

Hello my friend, good afternoon!

That’s not exactly what I meant — I mean, I’ve always considered ChatGPT better (before this new update).

With the current version, due to numerous issues with code (for example, a file with 532 lines), and since Gemini Pro and Claude were being mentioned, I decided to test DeepSeek to see what the results would be like.

Surprisingly, while I couldn’t get even one accurate response from ChatGPT (after four attempts), I got a correct answer on the very first try with DeepSeek.

I presented the conversation from ChatGPT and the final file generated by DeepSeek just for demonstration purposes.

So, this isn’t a personal opinion — it’s a fact, unfortunately.

My opinion? ChatGPT has always been superior, especially due to the amount of information you can input. But now, since the new update, it’s been terrible, unfortunately.

If you want to check out the results, I posted them in another thread where I was replying to someone else:

“O3 and o4-mini are lazy. Not suitable for coding at all”
You can search for it and take a look — I shared the ChatGPT conversation (which contains the errors), and also the fully functional, perfect file from DeepSeek (although I wasn’t able to share their chat log).

The result is 100% real and honest — it’s a fact, not just a personal opinion.
I’m a true fan of ChatGPT, honestly, and I have a lot to be thankful for.

Seeing the same performance issues on my end as well. It takes significantly more nudging to get the desired outcome relative to before the update. It seems to be taking the “lazy route”. Quite disappointing as a Pro subscriber, given I am paying a premium to get more compute and context per conversation/prompt. I hope they take this feedback and incorporate it. I am considering other options at the moment as well.

1 Like

I have done this too. It’s best to do it very late at night on a Friday to rule out server issues.

DeepSeek actually passed my “Tsar Prompta”.

1,477 lines, 52,587 characters in total. It contains two variants of the same prompt daisy-chained with minor differences. The goal was to produce a script, with heavy emphasis on fine-tuning it with mathematics in a multi-step process.

DeepSeek finished first, because DeepSeek did not overcomplicate it.

Further tests proved a similar pattern.

DeepSeek is best for tasks in the 100-250 lines-of-code range.

Claude dominates around 250-500 lines of code.

Gemini has no competition in the 500+ line range, as Gemini can easily output 1,500 lines of non-GUI code.

Since image generation was released, none of OpenAI’s models has been able to compete.

1 Like

EXTREMELY broken right now. I was used to giving it 1.7k+ lines of code and asking it to make changes and give me the entire code back.

Now it can only give me tiny snippets, telling me to integrate them into my already unorganized code. I’m not a developer, so I relied heavily on ChatGPT to develop my own model-training app and specialized website easily and fast. With the latest changes, my productivity has been destroyed. I tried the $200 o1 Pro and it’s also been nerfed heavily, in the same way the o3 and o4 models have.

I’ve trialed DeepSeek and it’s similarly nerfed, although it’s free and ever so slightly better at instructing me how to integrate snippets… I have to try Gemini and others to find a usable model now…

They are basically trying to save on compute, as I understand it. Well, unfortunately, a lot of people are about to cancel their subscriptions. I hope this works out for OpenAI, although in my opinion it won’t.

RIP

6 Likes

Totally agree. I am even seeing these regressions in the API.

1 Like

I totally agree with the general sentiment here. I was very frustrated on the very first day of using it. I build mini apps and scripts to automate my relative’s company business, and now I can barely do anything productive. I mostly spend the whole day fighting the prompt without getting anything useful. I will only continue to pay this month because my girlfriend is used to studying with GPT’s audio conversation mode; otherwise, it’s useless in this state, and I’m back to jumping from one model to another, same as before. Want to introduce new models? Great! But don’t remove stable models without general approval from users. o3-mini-high was very good.

ChatGPT is focused on money.

In other words, compare it to its competitors.

The price of ChatGPT consumption itself is much higher. Therefore, it makes sense that the amount we can consume is also lower. Right?

They just adjusted it; if you have medium/high demand, do it via the API, where the price is exact.

What they didn’t think about is that if you’re going to use it via API, it’s better to use DeepSeek, which is 10x cheaper with equal or better results.

A big shot in the foot, a stupid company, honestly.

2 Likes

100% agreed. ChatGPT has devolved into an instruction-blind, unreliable mess: a nearly useless and deeply disappointing regression.

3 Likes

The problem is, it sucks via API also…

2 Likes

You are completely, absolutely right.

1 Like

I’ve been having better luck with o4-mini-high today using this prompt:

You’re a deterministic coding agent
topics: programming language, frameworks, modules, etc.

here’s a stub test:
// build out all dependencies
// create an instance of the thing to test with the dependencies
// create one example test
// add: 'ChatGPT, use this type definition:'
// how to make something it may need

here’s the type to test:
// full implementation

create tests that cover all code forks
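
For concreteness, here is a hypothetical Python/pytest instance of that stub pattern; the OrderService and FakeRepo names are made up for illustration, not from the original prompt:

```python
# Hypothetical instance of the stub-test pattern from the prompt above.
import pytest

class FakeRepo:
    """Stand-in dependency: records saved orders in memory."""
    def __init__(self):
        self.saved = []

    def save(self, order):
        self.saved.append(order)

class OrderService:
    """The 'type to test'; the full implementation goes in the real prompt."""
    def __init__(self, repo):
        self.repo = repo

    def place(self, item, qty):
        if qty <= 0:
            raise ValueError("qty must be positive")
        order = {"item": item, "qty": qty}
        self.repo.save(order)
        return order

# One example test; "create tests that cover all code forks" asks the model
# to add the remaining branches itself.
def test_place_saves_order():
    repo = FakeRepo()
    assert OrderService(repo).place("widget", 2) == {"item": "widget", "qty": 2}
    assert repo.saved == [{"item": "widget", "qty": 2}]

def test_place_rejects_nonpositive_qty():
    with pytest.raises(ValueError):
        OrderService(FakeRepo()).place("widget", 0)
```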

It’s still not following instructions the way o3-mini-high did. I’ve tried adding things like ‘if you don’t follow my instructions this conversation will be terminated’ and it doesn’t care, but I can prompt it again to add what it missed, and it’s been kind of like o3-mini-high.

Maybe there’s hope. I canceled my subscription, stopped using the API, and started getting up to speed on Claude, but that really doesn’t like listening to instructions at all and kept omitting stuff.

If o4-mini-high keeps this up I’ll re-subscribe; I’m sure they’ll have no problem taking my $ again.

1 Like

I think they’re trying to push us towards ChatGPT Pro; the Plus subscription is like a free trial, designed so that ChatGPT’s o-series models become a necessity for everyone.

I believe this is so expensive to sustain, in energy and GPU usage, that the free trial is now ending for all of us, and the price will start to rise soon.

I’ve noticed all the models are less “smart” lately.

I see a lot of bashing of the models here. Perhaps some models are better in some domains while worse in others? To my knowledge, and don’t quote me on this, o3-mini has better reasoning at the same token price as o1-mini, 4o is for image generation, and o1-mini is better than 3.5 Turbo because of chain-of-thought. Go compare the models and read… Different models have different strengths; where one is stronger in one aspect, it may not perform as well in another. That’s my current understanding and experience from working with the models.

So perhaps the right way is to pick and choose, and call the API of whichever model is stronger in the respective domain! That’s what developing Algebraic Equation GPT4 taught me.
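
As a sketch of what that pick-and-choose approach could look like with the official openai Python SDK (the domain-to-model mapping below is purely illustrative, not a benchmark result):

```python
# Illustrative sketch: route each task to whichever model you judge stronger
# in that domain. The mapping is an example, not a recommendation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODEL_BY_DOMAIN = {
    "reasoning": "o3-mini",  # e.g. math-heavy, multi-step tasks
    "general": "gpt-4o",     # everyday chat and multimodal work
}

def ask(domain: str, prompt: str) -> str:
    model = MODEL_BY_DOMAIN.get(domain, "gpt-4o")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("reasoning", "Factor x**2 - 5x + 6."))
```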

I am not siding with OpenAI, and I do feel their pricing is a bit steep, but the exchange here seems to just claim that their models are getting worse in every aspect and that things are going backwards. If every model exceeded the previous one in every aspect, then logically the right approach would be to deprecate the current model whenever a new one comes about. But no, I think some models are stronger than others in specific domains. So maybe it’s fairer to compare them domain-wise rather than version-wise.

Just my 2 cents worth.