Is OpenAI actively lobotomizing GPT4?

There are already topics about GPT4 getting worse, but the massive decrease in reply quality over the last year has led me to think that there might be more to it than some “laziness”. By now I get the impression that GPT4 is actively being lobotomized by more and more restrictions and caps, which OpenAI seems to want to hide under the guise of “laziness”. And there are good reasons for that: server load, AND maybe GPT4 was just TOO good at release for proper monetization. My guess is we will soon see a GPT5 or GPT 4.5 announcement that involves higher prices, and this version will magically gain back all the stuff GPT4 could do perfectly fine last year but can’t do now. Don’t get me wrong, I don’t want to spread conspiracies here. I am just absolutely confused by the massive decrease in GPT4’s quality since last autumn, and OpenAI has not yet explained what is happening there, apart from their comment on “laziness”. I am, or was, an OpenAI evangelist, always telling everyone how awesome GPT is and how to use it. But by now I don’t recommend it anymore because it has become really hard to use efficiently. What is happening there, and why?

3 Likes

Hi!

It’s a challenge in and of itself to separate one’s subjective expectations over time from the objective quality of the results. In blind tests the new models perform just as well as ever, even though the model’s output has changed over time.

While there is apparently no gold standard for assessing the capabilities of an LLM, and OpenAI has not published new benchmarks since GPT-4 was initially released, thousands of blind model comparisons, where graders do not know which model they are grading, tell a compelling story.

I absolutely understand your point, and coming from a marketing background I also know that expectation management is very important for understanding what is happening here. What I am talking about, though, is that tasks GPT4 solved perfectly for me last year (e.g. refactoring a very long piece of code) are no longer possible today. So in these cases it is not about my expectations at all, but purely observation. Last year I pasted 500 lines of code and got a perfect refactor without extensive prompting. Today I paste 500 lines of equally complex code and get cut-off answers, tons of omissions, complete misunderstandings, or GPT4 telling me over and over again to hire a developer. This is clearly not influenced by my expectations at all.

1 Like

Yes, there have been quite a few of these reports here in the forum for over a year now. The pattern is that every report starts with ‘since recently I experience a drop in output quality’, and it doesn’t matter whether the report came in June ’23 or today.

Make of this what you will, but it’s also possible that you have changed in tandem with the model. I actually arrived here with the same sentiment, but ultimately decided that it’s not just the model but also the amount of work I need to put in to get good answers. At some point I had decided that I had it all figured out, and then things started to fall apart.

I get what you are saying, but I want to emphasize that I am giving GPT4 almost exactly the same tasks as a year ago. I would even argue that the tasks I gave it this year are easier to solve, since I was working on a complex project last year. Meaning neither my input nor the tasks / code itself has changed in any way, but the results have. A lot. And I’d go as far as saying that it is no surprise you have been getting the same ‘since recently I experience a drop in output quality’ since last summer, because the quality of GPT 4 has indeed become worse and worse with each passing month. The comments by the users were true last summer, and they are true today. Worse answers to the same questions asked before. Which is why by now I get the feeling this is something done on purpose.

Oh and: The chatbot arena is really nice, thanks for the recommendation. But this is comparing GPT to other models, and this is actually where GPT still wins. It IS better than the other bots. What I am talking about is comparing ChatGPT4 one year ago to ChatGPT4 today.

1 Like

Personally I ended up taking a two-month break from working with GPT 4, and today it works just as well as my memories tell me it did back then.

I guess it’s up to you. Why not try some other model in the meantime? As of today there is enough competition at comparable levels. I am on the Gemini Advanced 1.5 waitlist and will start working with Claude 3, too. Because, why not?

Just sharing my experience with GPT 4.
Good luck going forward!

I’m sorry and I appreciate your answers, but this is not a viable option imho. The other models have much worse results and I can’t even remotely get them to do what I want. I can’t access Claude from Germany and Gemini and Bard fail at even the most basic tasks, so those are not usable at all.

Also I am paying for ChatGPT Plus and the API, and I think as a paying customer I can ask the team I am buying a product from why this product has become worse over the last months while I am still paying the same price, right?

1 Like

Yes, it is quite apparent that the AI has been made incredibly stupid. Context has become useless, whether it holds instructions or data. It is like all the connections inside that allow it to think and process have been cut. As if they are on the cutting edge of 1-bit models with sparsity and masking to make token production have almost no meaning, unless it can latch onto some pretrained sequence.

If still allowed (they cut off new accounts), you can go back to GPT-4-0314, and see ‘holy crap, this thing can still operate correctly’ (until OpenAI discovers that older models also need to be punished more).

It would be insulting to assume the minds of OpenAI don’t know exactly what they’ve done.

It is insulting to users to think they don’t perceive exactly what’s been done.

1 Like

@_j please put ‘incredibly stupid’ into terms that allow for an actual conversation about the topic.

I could show some examples where the language processing capabilities have been improved since the release of the first model.

Also, you can take a look at the blind test. Your exaggeration is nothing but that.

Ok. This is still mostly a self-help forum.
You can, for example, share a successful prompt from last year and an unsuccessful prompt from recently and find out whether this community of developers, who have all had similar experiences, can work it out or not.
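
To make such a comparison less dependent on memory, the two saved replies could be scored with a small script. This is only a rough sketch: the phrase list, `count_omissions` and `compare_replies` are made up for illustration and are not an established metric, so treat the numbers as a conversation starter rather than proof.

```python
import re

# Hypothetical phrases that tend to mark an elided or placeholder answer.
# This list is an assumption for illustration; extend it for your own prompts.
OMISSION_PATTERNS = [
    re.compile(r"rest of (the|your) code", re.IGNORECASE),
    re.compile(r"remains? (the same|unchanged)", re.IGNORECASE),
    re.compile(r"(implementation|logic) (goes )?here", re.IGNORECASE),
    re.compile(r"\.\.\.\s*$"),
]


def count_omissions(reply: str) -> int:
    """Count the lines of a reply that match at least one omission pattern."""
    return sum(
        1
        for line in reply.splitlines()
        if any(pattern.search(line) for pattern in OMISSION_PATTERNS)
    )


def compare_replies(old_reply: str, new_reply: str) -> dict:
    """Summarize line counts and omission counts for two saved replies."""
    return {
        "old": {"lines": len(old_reply.splitlines()), "omissions": count_omissions(old_reply)},
        "new": {"lines": len(new_reply.splitlines()), "omissions": count_omissions(new_reply)},
    }
```

Posting the prompt together with both replies and a summary like this gives others something concrete to reproduce.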

1 Like

First chat input with some rudimentary code interpreter tasks. Not testing the AI, just expecting it to work…

Long replays getting into the nuts and bolts of code require understanding of the task within, along with the expectations that someone would have in working with AI previously. Then the inevitable giving up and turning to the API models. Such examples aren’t as easy to parse as “give me egg-less recipes” – all with eggs…

1 Like

Great example!
I already feel like I am on the way to being convinced now.

If you just want to let off some steam, please go ahead.

I am Canadian and this is the most horrible thing, as all other AI models are not available to Canadians (along with Russians, Venezuelans and a few others), so I cannot go to another platform… I hope that the competition will change the mind of OAI (I say OAI because it would sound too impolite to say OpaqueAI, so yeah).

I genuinely hope that ChatGPT and the other APIs, GPT-4 and 4-turbo, will improve somehow… I personally feel limited by the fact that it is prohibited to use OpenAI’s ChatGPT to work alongside us on understanding GitHub code… This is obviously not a lack of brain power from the once beloved AI Agent but some sort of corporate decision…

I feel a bit depressed anyway due to the claims of Jen-Hsun Huang who believes that the focus should shift away from computer programming which is the one thing I like to learn the most…

So yeah, I do not feel like learning farming, chemistry or biology. And I do feel ChatGPT lagging behind and its inability to be useful in many respects.

On the positive side, the image ingest is the most amazing feature so far; I love that feature of ChatGPT more than all the other recent additions. But I do strongly believe that the AI Chat Agent is dumber and more complicated to deal with than before. I am waiting for the next big update of the cGPT-4 or such to hopefully see it being enabled to do more than before…

1 Like

When I was trying out 4-turbo in the API, half of my prompts completely stopped working and I got “I’m unable to fulfill this request.” as the only answer. Asking why leads to the same answer. It seems like some prompts include stuff that gets interpreted as offensive or something - and my prompts are only for work and only for frontend development. So the restrictions seem to be so harsh that even the slightest hint of anything that could possibly be offensive gets hard-blocked. Which would be fine if it wasn’t triggered by stuff so minuscule it’s almost impossible to find out what was “wrong” in a long prompt.

Oh, and it gets worse: All 4 models, no matter which, still massively suffer from a) the laziness problem introduced last autumn, but also b) the apparently “new” approach by OpenAI to limit server load, resulting in placeholders, omissions, and straight dementia when after only 2 messages clear and simple orders are forgotten and ignored, AND also c) the absolutely contradictory behaviour where 4 goes on explaining every minuscule detail about how it is going to approach the task while not actually DOING the task. Which is then followed by b) if you ask it to do what it just unnecessarily explained to you.

One could even get the impression OpenAI is trying to get people to use as many tokens as possible while also reducing the usefulness of the elongated answers, so you have to ask over and over again to get your result - using up even more tokens of course. Oh how I wonder what the rea$oning could be behind using such methods …

OAI is doing something similar to what they do with ChatGPT-4 users, where they don’t benefit from us using more tokens. I think the issue isn’t about wanting us to pay more. I usually avoid jumping to “conspiracy theory” as my first explanation. So, if they’re doing something wrong, it’s likely because they believe it’s the right thing or due to unforeseen consequences. Therefore, deliberately complicating things to make people pay for more tokens seems unlikely since compute resources are limited; they would probably just raise the price per token instead, I guess.

I am not as impatient as I was at the beginning of February, when I was losing my mind because it was making me go over my 40 messages per ⅛ of a day. So do not think I cannot relate to you, because it is pretty similar in fact… I do remember I was pretty upset one day:

The Most Likely Hypothesis

I believe they’ve implemented measures to save money, like quantization or other tricks that are supposed to improve performance and reduce costs. My assumption is, they also want to ensure safety, so probably their tests looked good from their perspective. My guess is this focus on efficiency, which I’ll call quantization as someone who doesn’t fully grasp the concept (meaning trying to increase efficiency), combined with their priority on safety, might have led to what I’m calling Dissonant Synergy. I’m not completely sure how this played out, so these are just guesses, but I’m pretty convinced it’s something like that.

The Less Likely :smirk:

It might sound funny (even though I don’t think it logically makes sense), but others have mentioned before that this could be seasonal behavior :salt: (take this with a grain of salt). It’s interesting that this is a theory mentioned all over the internet.

I do think the competition from the other two models (one of which seems to have quite a few problems, to put it mildly) will make OAI think twice. I’d love to have access to a tool that’s less conservative about things that aren’t harmful.

Many issues I’ve complained about in the past have been resolved, so I guess I wasn’t the only one noticing problems. We need to be patient, but since OpenAI is somewhat opaque, they don’t seem to have a clear Iteration Plan.

I’ll say it again, they can keep as many things private as they wish (I’m not Elon Musk), but I always think there are many other areas where they could be more open, and maybe they’re just not great at communicating. I’m not sure.

I hope your situation improves soon. I’m not certain if our problems are exactly the same, but I believe they might have similar causes.

1 Like

Me: I need a language translation.
AI: You need to run some python code?

The image shows a screenshot of a conversation where a user asks an AI (presumably ChatGPT) to translate two English sentences into Hebrew, and the AI responds with a Python dictionary containing the translations. (Captioned by AI)

Just bottom level dumb.

1 Like

While I mostly agree with your reply and theory, there is one big flaw in it: the absolutely useless wall of text GPT4 now gives in answer to every simple question, wasting many many many tokens. GPT4 now spends at least half, sometimes most!, of its tokens on unnecessarily 1. repeating everything I said, 2. telling me that it is now trying to think about a solution for my problem, 3. then telling me how it will approach finding a solution for my problem. And only THEN does it MAYBE go to 4. and start solving my problem. Most of the time it just stops after 3., having wasted a whole lot of tokens. That does NOT sound like optimization or “quantization” at all.
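
To put a rough number on that, one could measure how much of a saved reply is actual code versus surrounding talk. The sketch below is an assumption-laden approximation: `code_share` is a hypothetical helper, it counts lines rather than tokens, and it only recognizes code inside triple-backtick fences.

```python
# The fence marker is built at runtime so it never appears literally in this snippet.
FENCE = "`" * 3


def code_share(reply: str) -> float:
    """Return the fraction of non-empty lines that sit inside fenced code blocks."""
    inside_fence = False
    code_lines = 0
    total_lines = 0
    for line in reply.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            # A fence line toggles the state and is not counted itself.
            inside_fence = not inside_fence
            continue
        if not stripped:
            continue
        total_lines += 1
        if inside_fence:
            code_lines += 1
    return code_lines / total_lines if total_lines else 0.0
```

A reply that restates the question, announces a plan and then stops would score close to 0.0, while a straight refactor would score close to 1.0.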

For anyone interested in the topic - look at this thread, you are not alone:

1 Like

This is plain false. I have been using it for the same tasks over and over, and it is now blatantly refusing to perform the exact same tasks as before. It takes over 10 turns to agree to do simple stuff that it can and should do. When I switch to GPT 3.5 it magically performs them, and it even follows instructions better. I noticed WAY shorter responses over the past month as well.

1 Like

Hi!

This is a forum to help users solve problems. And we have a dedicated category for problems with prompts.
You are more than welcome to post your prompt and the issue there. Then you will see that we have many community members who enjoy the challenge of ‘making things work’.

It’s also a more constructive approach than calling random strangers on the internet a ‘liar’. Thanks in advance for keeping the discussion civil.

1 Like