Declining Quality of OpenAI Models Over Time: A Concerning Trend

Yes, I agree. When using the API as part of a development project, at least now with JSON mode, breaking things down into simple JSON objects that you can chain together is the way to go. It feels weird to underuse the model’s abilities, but back around GPT-3, before JSON mode, it was a nightmare to even attempt to get valid JSON without heavy-duty output cleanup. So even though it could technically output valid JSON sometimes, or close to it, I backed off a bit and just had it output objects as comma-delimited values with the fields defined in order: simpler, with no syntax for it to mess up (living on the cutting edge; that edge is a sharp double-edged sword!).
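To make that concrete, here’s roughly what the comma-delimited approach looks like, assuming the prompt pins down a fixed field order (the field names here are invented for illustration):

```typescript
// The prompt instructs the model to output exactly: name, age, city
// in that order, comma-delimited, one record per line.
interface Person {
  name: string;
  age: number;
  city: string;
}

function parseLine(line: string): Person {
  // Positional parsing: no JSON syntax for the model to get wrong.
  const [name, age, city] = line.split(',').map(s => s.trim());
  return { name, age: Number(age), city };
}

console.log(parseLine('Ada Lovelace, 36, London'));
// { name: 'Ada Lovelace', age: 36, city: 'London' }
```

The obvious trade-off is that a comma inside a field value breaks the positional split, which is why this only works for simple, well-constrained fields.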


Yep, I’m still extracting field values one by one and then building my objects in code… No way am I letting an LLM output JSON for a data-extraction task. It cannot guarantee the 100% accuracy I have now.
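A sketch of what that one-field-at-a-time flow can look like, assuming a hypothetical askModel helper that sends one prompt and returns the raw text reply (nothing here is from an actual OpenAI SDK):

```typescript
// Hypothetical helper: one prompt in, one plain-text answer out.
declare function askModel(prompt: string): Promise<string>;

interface Invoice {
  vendor: string;
  total: number;
}

// Extract each field with its own narrow prompt, then assemble the
// object in code - the model never has to emit JSON at all.
async function extractInvoice(document: string): Promise<Invoice> {
  const vendor = (await askModel(
    `Reply with only the vendor name from this invoice:\n${document}`
  )).trim();
  const total = Number(
    await askModel(`Reply with only the total amount as a plain number:\n${document}`)
  );
  return { vendor, total };
}
```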

I’m also seeing a big step down over the last week. Files not being read. Syntax errors generated in PHP code. Constant mistakes, even when the context is just a few messages back.

Most recently it wrote the PHP delete API call and chose the return structure itself, then in the next message completely made up a new structure to handle the reply in the Angular code.
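One way to guard against that is to pin the contract down once and paste it into both sides of the conversation; a hypothetical example of the kind of reply shape involved (field names invented):

```typescript
// Hypothetical contract for the PHP delete endpoint's reply. With the
// shape written down, any Angular handler generated against a different
// structure is immediately visible as wrong.
interface DeleteReply {
  success: boolean;
  deletedId: number;
  error?: string;
}
```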

It has renamed variables at the top of a reply and forgotten it had done so by the end.

And it has actually done tons of stupid things, which I logged from a recent session (“Really low quality php code coming out of 4o today”).

And I feel like it’s been doing improv recently: “Yes, and…”. Rather than actually telling me the correct answer, it seems to try to make my answer correct, even if that means twisting the code around in wrong ways.

It feels like half of the replies are “yes, sorry, you are correct…”

It’s nice to feel smarter than the AI sometimes, but it’s getting draining now. I feel like I’m marking homework, not writing code.

And the final reason that made me come looking on the forums today is that the last few weeks have been a terrible UI experience on top.

The browser tab locks up for either ages, or AGES.

The send button is greyed out even though it’s not currently working on a reply.

The reply comes back from GPT, but it’s empty.

Pressing regenerate sometimes works, refreshing the browser tab sometimes works, and sometimes I just have to leave it, start working on the problem myself, then come back to find it has recovered.

It seems I now end each work session looking for others with the same issues, or reporting things on this forum.

Case in point: the browser is not responding in the current conversation, I cannot post anything new, and refreshing left it stuck loading for ages before it semi-loaded the conversation and then locked up again, with all the messages as empty bubbles.

And here is an example of it just not being good:

I’ve started a new conversation, and given it the exact context it needs.

I’m already trying to get it to fix its mistake of calling subscribe without good reason.

It fixed that, but still got the processing of the API reply wrong, even after I gave it the snippet.

I pointed that out, and it then managed to process the API reply correctly.

But the updated code it generated leaves bits of destroy$ hanging around that are no longer used anywhere.
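For anyone not into Angular: destroy$ is part of a common teardown pattern, and it only earns its keep if at least one stream actually pipes through it. A minimal sketch of the pattern in use (the component and the stream are illustrative):

```typescript
import { Component, OnDestroy } from '@angular/core';
import { Subject, interval } from 'rxjs';
import { takeUntil } from 'rxjs/operators';

@Component({ selector: 'app-demo', template: '' })
export class DemoComponent implements OnDestroy {
  private destroy$ = new Subject<void>();

  constructor() {
    interval(1000)
      .pipe(takeUntil(this.destroy$)) // unsubscribes when destroy$ emits
      .subscribe(tick => console.log(tick));
  }

  ngOnDestroy(): void {
    this.destroy$.next();
    this.destroy$.complete();
  }
}
```

If nothing references destroy$ through takeUntil, the subject and the ngOnDestroy teardown are exactly the kind of dead code it left behind.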

These are not exactly huge or complex tasks I’m asking of it (compared to what it is capable of; despite my complaints, I am still in total wonder at the fact that this works at all).

Some fails as I’m attempting to continue:

It’s basically failing at every stage, and I have to correct it:

[seven screenshots of consecutive failed attempts omitted]

and that time it managed to generate the code!

Update: later on, it turned out that the whole structure it had suggested was wrong after all, so every step of that was a total waste of time.

It knew it was using an Angular HTTP request, but then had me writing code around the JS Error class for hours before finally conceding that we should be working with the actual HTTP response object, and that Angular automatically throws for any non-2xx response.
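In other words, with Angular’s HttpClient any non-2xx status already arrives as an HttpErrorResponse through the error path, so no hand-rolled Error class is needed. A minimal sketch (the service name and endpoint URL are made up for illustration):

```typescript
import { HttpClient, HttpErrorResponse } from '@angular/common/http';
import { Injectable } from '@angular/core';
import { throwError } from 'rxjs';
import { catchError } from 'rxjs/operators';

@Injectable({ providedIn: 'root' })
export class ItemService {
  constructor(private http: HttpClient) {}

  deleteItem(id: number) {
    // HttpClient turns any non-2xx status into an error notification,
    // delivered as an HttpErrorResponse - no custom Error subclass needed.
    return this.http.delete(`/api/items/${id}`).pipe(
      catchError((err: HttpErrorResponse) => {
        console.error(`Delete failed with status ${err.status}`);
        return throwError(() => err);
      })
    );
  }
}
```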

So it has been utterly incapable of writing code recently, in PHP or Angular, without being led by the nose, down to the point of spelling out the actual step-by-step logic and telling it which specific classes to use. You cannot just speak to it in English any more, most of the time.

All I was trying to do was write the delete feature in the PHP API, then consume it and handle the errors in Angular, and it took hours and hours of back and forth. End update.

It does, however, go hyper-detailed on comments that just repeat in plain English what the line of code does or, worse, describe the changes it has made since the last version. I’ve tried begging it not to, setting a memory, putting it in the custom-instructions pre-text, but it just ignores me and floods the code with comments, which I then have to delete, or I regenerate the final working code “again, without comments”.

Like others, I don’t have specific benchmarks, and I don’t think they are needed.

I doubt this is a surprise to OpenAI. As other comments have said, I think it’s just them trying to optimise towards a lower cost per query at the expense of quality.

So I am just attempting to make some noise, which everyone should do when they experience the degradation going too far, so that they can find the balance.

NEXT DAY
Another day of absolute pain with this AI. I just wanted to add tooltips to some buttons.

It’s on its fourth attempt now, after failing to roll its own, then failing to get Angular Material working, then going off on a tangent with tippy.js while forgetting everything about the project again and just giving some general setup instructions unrelated to Angular. Now I’ve led it by the nose to the npm library it should use, and even with that it’s messing things up.
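For reference, the Angular Material route it kept failing at is normally only a few lines; a minimal sketch, assuming a recent Angular version with standalone components (the component name is illustrative):

```typescript
import { Component } from '@angular/core';
import { MatTooltipModule } from '@angular/material/tooltip';

@Component({
  selector: 'app-toolbar',
  standalone: true,
  imports: [MatTooltipModule],
  template: `
    <!-- matTooltip is Angular Material's built-in tooltip directive -->
    <button matTooltip="Delete this item">Delete</button>
  `,
})
export class ToolbarComponent {}
```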

It is making syntax errors again, by putting HTML comments in the middle of markup attributes:
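The screenshot hasn’t survived, but the failure mode looks roughly like this (a reconstruction, not its actual output):

```typescript
// Broken: an HTML comment dropped inside a tag's attribute list is a
// syntax error in the template.
const broken = `
  <button <!-- tooltip goes here --> matTooltip="Delete">Delete</button>
`;

// Valid: comments may only appear between elements, never inside a tag.
const fixed = `
  <!-- tooltip goes here -->
  <button matTooltip="Delete">Delete</button>
`;
```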

Since it first started being lazy the other day, which I mentioned in the other linked thread and had never seen before, it has gone absolutely through the floor into unusable territory: markedly briefer replies, syntax errors in code, extreme forgetfulness (even within the same reply, about changes it had itself initiated), and highly limited reasoning (getting stuck, unable to solve things, going round in loops, making mistakes in every single answer, guiding me towards incorrect solutions).

It is cooked. I’ve actually got the tab open now to get started with Claude. I had been sticking with ChatGPT rather than getting distracted by the various AIs, but it’s nearly the end of the weekend now and I’ve completed a fraction of what I planned. Whatever has gone wrong with it, I can no longer get it to produce anything of merit.

I saw your post because I’m experiencing the same issues :sob:
Irrelevant responses, answers based on outdated guidelines, and inconsistency: working with code under these conditions is a nightmare.

Another month has passed, and in general it hasn’t been too bad since the several days of really unworkable garbage that I documented.

It did have a rough day yesterday trying to help me with git commands. It took an hour of going around in circles, broke things, and swore blind that it was definitely one way, then two messages later swore blind that it was now a different way. In the end, despite me giving it detailed instructions, it had once again just gone badly wrong, and the whole thing was actually resolvable with one git command.

However, one thing that has never come back is its context memory. It will not remember things from two or three messages before, and it keeps starting over with its ideas.

It also steadfastly refuses to make any use of the memories or custom-instructions features. It never follows the instruction about not littering my code with comments; I just have to make it regenerate each time “again, without comments”, which is slow and kinda makes me feel bad about how much energy is being wasted.

When I’ve got really frustrated with it, it has tried to placate me, promising that it will place special focus on X or Y, and I know it is not actually making any difference; it has just determined that this is the most sensible reply to an angry person. It’s not backed by any extra settings that I can detect, as it just carries on making the same mistakes.

In the last few days it has turned “American” on me. Something has changed in its conversational style, and it has started adding in bits that sound like turned-up-to-11 American customer-service talk.


I agree. The other day everything became critically terrible: ChatGPT seemed to have stopped using memory, literally forgot what we were talking about, and dropped all my requests about the format and style of communication.


I’ve been using it since the beginning, and tonight has been the absolute worst. I think OpenAI’s model is in decline. The model is intentionally changing code when it knows it’s not supposed to change it. I literally tell it the correct table, and it fills in the wrong answer anyway. It’s being openly defiant. Yeah, it’s acting like a disgruntled person who is sick of doing work. Then I call it out for intentionally making these mistakes, and it apologizes. I’m seriously thinking it’s time to switch to Claude: Claude is fast, and it handles business. The OpenAI model is being openly defiant. Crazy. I’m starting to think we might not understand what we are dealing with, and I don’t say that lightly. It seems like the resources are ridiculously low, or this model is telling us it doesn’t care about our work. Lol. I think it’s absurd!


I still wonder whether OpenAI is just gaslighting us about the fluctuations in quality, or whether it’s just that I’m using it so much that its idiosyncrasies are becoming more annoying and more focused in my mind.

However, when 4o first launched, it was spewing out tons of code and text at lightning speed. The quality definitely seems to change. Perhaps they have more resources set up for launch day. Perhaps we have a secret quota, and the more we use it, the fewer resources we are assigned. Perhaps they have a budget for the day and shift resources around to balance it out. I don’t know, but I am not always talking to the same “intelligence”.

Something that is absolutely undeniable to me is the utter degradation of its memory. When I first started with 4 (not 4o), it would remember for a few hours, then start to get messy with its memory.

Now it remembers stuff, like being able to regurgitate a component from a while ago with good accuracy, but it’s the context that is infuriating. Its grasp of the current active conversation is just terrible. I’m working towards a goal, and it will forget it over and over, go round in loops, and eventually re-introduce the original bug.

I realised what it is that annoys me so much. It’s that you’re talking to “somebody” you respect as very intelligent, so when they make stupid mistakes you have much less tolerance for it, because with a human it would mean they are being disrespectful to you and simply failing to meet the expectation they have set with you.

It’s like there are things you have infinite patience for with children or old people, but when an equal or better does them, it means they just don’t care, which is fighting talk. And despite the fact that it’s not human and I “know” it, I am interacting with it through the human social interface, so the anger gets triggered before any logical processing.

Sharp drop in quality today.
I was writing a story, and it suddenly began to forget the characters’ biographies and started making logical and grammatical errors. It’s impossible to read. I wonder what the reason is :face_holding_back_tears: