o3 is a nightmare, it’s horrifying. After a long time being a happy customer with o1-pro and o1, I find myself swearing at the so-called “smartest model”.
Even with 1000-token responses it leaves things out, forgets, keeps changing variable names, keeps breaking the code, and HALLUCINATES LIKE CRAZY.
GPT-4 was much better than this in 2023, about two years ago.
I have been using structured output for a complex classification task, and the system is completely broken now. Lots of validation errors, missing fields, and sometimes partial JSON. Such a stressful experience.
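For context, here is roughly the kind of call I mean, as a minimal sketch; the schema, prompts, and model name are placeholders, not my real setup. The point is that the reply is supposed to validate against the schema, and lately it often doesn’t:

```python
# Minimal sketch of a structured-output classification call.
# Placeholders: the Classification schema, the prompts, and the model name.
from openai import OpenAI
from pydantic import BaseModel, ValidationError

class Classification(BaseModel):
    category: str
    confidence: float
    rationale: str

client = OpenAI()

resp = client.chat.completions.create(
    model="o3",
    messages=[
        {"role": "system", "content": "Classify the user's text. Reply with JSON only."},
        {"role": "user", "content": "Example input to classify."},
    ],
    response_format={"type": "json_object"},
)

raw = resp.choices[0].message.content or ""

# This is where the failures show up: partial/truncated JSON and missing
# fields both surface here as a ValidationError instead of a usable object.
try:
    result = Classification.model_validate_json(raw)
    print(result)
except ValidationError as e:
    print("Bad structured output:", e)
```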
There were times when ChatGPT 3.5 mini fixed my 3,500 lines of code, rewriting the whole thing from start to end with no issues. Now it’s so lazy that it can only find your error and write up to 250 lines, and you have to update the code manually. I was mostly using GPT to save time while working on other projects at the same time, but I’m considering canceling my subscription.
I don’t know what’s going on. o3 became useless: it tried to fill the .env file four times with wrong logins and passwords. The prompts were clear; it just kept losing context over and over again.
I’m encountering a serious issue with the o3 and o4-mini models: whenever I ask for complete code, they repeatedly omit large sections and key functionality, making them effectively unusable. This has disrupted my workflow, and if it can’t be resolved, I’ll unfortunately have to cancel my Pro subscription.
Off-topic a bit, but what is everyone’s experience with Claude? Worse, better? I’ve been using it for writing all my marketing and sales messaging. Far better, imo. (For context: I do sales in tech, zero code.) I was actually having luck with o3 for admin tasks (like a mini AI agent almost) and was getting what I needed done. o4-mini and o4-mini-high have been painfully dreadful; they barely think. 4.5 has been incredible for writing, but the limits are insanely small (same with o3), so I can barely get anything done with it. If it had the same limits as 4o, that would be incredible.
I like Claude for technical stuff, but it isn’t great at playing out a persona or character, and it has trouble staying in character after long conversations.
Yeah, I miss o1 so much. o3 sucks, I swear. o1 sometimes gave me long responses of 8,000–10,000 words, and o3 only gives me a short amount of text.
I will be writing to request a refund for my ChatGPT Plus subscription, because the o3 and o4 models are completely useless for coding. The o1 model, which I used to write my program in November, December, and January, worked perfectly, but the current OpenAI models are terrible. o3 and o4 are not even capable of rewriting, let alone creating, 488 lines of code, which was effortless for o1.
Additionally, the models do not follow instructions and are simply disastrously weak compared to o1. This is a massive downgrade, a true devolution of the GPT models. Where can I request a refund? These o3 and o4 models are worthless: after 3 hours of back and forth, they can’t even consistently repeat code. It’s a joke or a bad prank.
So it’s almost a month since o3 came out. The Pro plan is completely useless now. The model is so weak that it is unable even to give simple instructions from manuals. I guess it’s time to say goodbye.
And btw, it’s not only about coding. I gave it a simple text of around 500 characters, and it took 4–5 iterations to get a proper translation because it kept distorting the meaning every time. Why should I pay $200 and stress myself?
I also canceled my paid plan after these changes.
It seems that they only care about scoring well on benchmarks (whatever that means :S), and in practice they deliver garbage to users.
I’m struggling with Gemini for complex tasks that o1 handled much better.
The only way we can put pressure on OpenAI is to cancel our subscriptions, unfortunately.
I’m normally not one to chime in, but yes, the newest model lineup from OpenAI is just bad. It’s like: “sparse attention, flash attention? No, let’s try ‘guess’ attention.” Simply no brains to talk to over there.
Example interaction received from the “best reasoning model”:
Q: what is wrong with this API call, where API_KEY is just a placeholder for the actual hard-coded API key? Compare to the other working example.
A: The problem is you are not using a valid API key. Here’s the example repeated back at you.
BTW, not this topic’s models, but related: when I used ChatGPT to simply reformat someone’s message here (a proposal for how to use the Responses API’s server-side stateful methods to retrieve prior input), gpt-4o butts in with disobedience and offers “help” completely beyond its faculties.
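(For anyone wondering what that proposal referred to, a minimal sketch of the server-side statefulness in question, assuming the standard previous_response_id chaining; the prompts and model name are placeholders.)

```python
# Minimal sketch of server-side stateful usage of the Responses API:
# the second call references the stored first response instead of
# resending the whole conversation. Prompts and model are placeholders.
from openai import OpenAI

client = OpenAI()

# First turn: the response is stored server-side by default.
first = client.responses.create(
    model="gpt-4o",
    input="Remember this order number: 12345.",
)

# Later turn: chain to the stored response to retrieve prior context.
second = client.responses.create(
    model="gpt-4o",
    input="What order number did I give you?",
    previous_response_id=first.id,
)

print(second.output_text)
```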