GPT has been severely downgraded

Many of us using the GPT Playground for everything from low-level coding to more general tasks have noticed a significant drop in GPT-4's performance. While OpenAI hasn't officially acknowledged any change, it's abundantly clear to those of us who use it daily that the current version of GPT is suffering some form of severe throttling or brutal lobotomy, whether intentional or inadvertent. If you or anyone you know has experienced the same issues, please post them here and help spread the word as far as we can.


Maybe it would be helpful to post the prompts so that they can be examined. In my experience, that can help resolve some issues. I have also learned that there is no guarantee that, for example, GPT-4 will always give the same answers. But I believe that is not a new insight.


Yeah, I can concur; this has been extremely visible for several weeks now, and it's getting worse and worse. Also, this constant asking for the prompts and answers every time someone writes is missing the point. We're not talking about a fun two-sentence logic puzzle. I'm feeding it long stretches of code and asking it to refactor them. It's not practical to post that here. You just have to trust the feeling of the users: it has really gotten incredibly stupid. It forgets basic things in the next answer, and this definitely didn't happen before. It seems that before, the tokens were sufficient to cover multiple questions like that, and now they cover at most one.

Example: I paste a long piece of code and ask it to refactor a particular short function from it, using a helper function also included in the code. Works fine. In the next question I ask it to refactor another function from the same pasted code, using the same helper function. It hallucinates a helper function with the same name but a different signature and behaviour and uses that for the refactoring. It doesn't even print the imagined function. The resulting code is therefore unusable.

And actually, when I think about it, I never before asked it to refactor like that, using a specific function. I used to ask it to refactor so that the code is idiomatic and so on, and it did exactly what I wanted all by itself. Now the only way to get anything out of it is spoonfeeding.


As has been mentioned, if you have a few prompts with outputs from "before" and "after" demonstrating a clear degradation in service, then please post them. I know the dev team at OpenAI is actively looking for examples so they can look into it. Unfortunately, very few have so far stepped forward with examples, and when they have, the cause was an error in prompting or a misunderstanding of the model's capabilities.

Something along the lines of (but this is just an example) "Please show me how to build a linked list array in python", with a corresponding previous version that was correct, and then a new version that is clearly incorrect, not in Python, or otherwise clearly wrong.
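For reference, an answer to that example prompt has a well-known correct shape, so it would be easy to spot a clearly broken response. A minimal, correct singly linked list in Python would look roughly like this (class and method names here are illustrative, not a specific model output):

```python
class Node:
    """A single element of a singly linked list."""
    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedList:
    """Minimal singly linked list supporting append and traversal."""
    def __init__(self):
        self.head = None

    def append(self, value):
        """Walk to the tail and attach a new node there."""
        node = Node(value)
        if self.head is None:
            self.head = node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = node

    def to_list(self):
        """Collect node values into a plain Python list, in order."""
        values, current = [], self.head
        while current is not None:
            values.append(current.value)
            current = current.next
        return values


ll = LinkedList()
for v in (1, 2, 3):
    ll.append(v)
print(ll.to_list())  # → [1, 2, 3]
```

A "before" answer along these lines versus an "after" answer that, say, hallucinates methods or switches language would be exactly the kind of concrete evidence being asked for.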

It is very hard to quantify the model's ability on subjective interpretations of complex issues, so those would not really be suitable.


I've reported many of these cases to OpenAI over the last week. I would rather not post them here and clog up the thread. Ideally people will share their experiences and we can create some noise that OpenAI won't be able to ignore. There is zero doubt that the recent "updates" have lobotomised the model to the point where I would say it's 70%+ less effective. The exact same tasks with the same system prompts are producing wildly lower-quality results than just a few weeks ago. There is zero doubt that the model's memory has been dramatically reduced (it used to remember entire conversations and would even continue doing so if you deleted the top message and carried on). Now it cannot remember the last instruction you gave it… The logic and reasoning have become 3.5-like, missing key points and nuance in a way that did not happen before. I would be more open to being wrong if the results weren't so obviously worse!


No clogging at all; there are OpenAI staff in the Discord and on here asking for examples of prompts where there is a clear negative difference in performance over time. Hopefully there are some users with histories intact who can post the before and after and help solve the case.


In almost every way, GPT-4 has been degraded over the last ~7 weeks… Countless threads in this forum and on Reddit testing, measuring, and describing the issue. Zero response from devs. A big shift from quality to quantity by OpenAI.

Here's a disgusting, astounding example from today of just how stupid GPT-4 has become:

OK, let’s revise your last writing output to be more formal. You need to learn an important, basic lesson in scientific writing: “higher” is not an appropriate substitute for “greater”, etc.


I appreciate your feedback, and you’re correct. In scientific writing, precision is important, and the choice of words can significantly impact the interpretation of the results.

“Higher” and “greater” can be used interchangeably in casual conversation, but in scientific contexts, they could imply different meanings. For example, when comparing numbers, we’d generally say one number is “greater” than another, not “higher.” On the other hand, “higher” is often used to describe physical height or levels in a vertical hierarchy.

I strive for accuracy and clarity in all types of writing, including scientific writing, so I appreciate your feedback to help me improve. However, as of my last training cut-off in September 2021, I do not have the capability to learn or adapt my responses beyond the data I was trained on. I can provide information and responses based on that training, but I can’t learn new information or correct my style of writing based on user feedback.

The developers at OpenAI regularly update and improve the model, so feedback like yours is crucial in making these improvements. Thank you for your input!

Two months ago, this prompt would have elicited a very different, superior response from GPT-4… Anyone who wasn't using GPT-4 prior to the April/May updates won't understand how true this statement is, sadly. Really makes me wonder W. T. F. am I even paying for…!?


GPT-3.5 result:


Well… my team and I (e.g. the person in the tester role) haven't noticed any big downgrade or changes in the GPT-4 API yet. I will update this post if we notice anything. (We test roughly 100+ varied prompts every day.)


It's getting to the point that nothing I try is even close to the results I was getting a couple of weeks ago. They need to revert the model to its prior state. The decrease is substantial.


You guys know that you can lock in the model version, right? gpt-4-0314 should be the snapshot locked in March, IIRC, and it'll still be available until September.
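For anyone who wants to try this, here is a minimal sketch of pinning that snapshot via the API. It assumes the `openai` Python package in its 2023-era (pre-1.0) style, and the message contents are just placeholders:

```python
# Sketch: pinning a model snapshot instead of the moving "gpt-4" alias.
# Requires the `openai` package and a valid API key to actually send.

request = {
    "model": "gpt-4-0314",  # pinned March snapshot, not the latest "gpt-4"
    "messages": [
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Refactor this function: ..."},  # placeholder
    ],
}

# With OPENAI_API_KEY set, you would send it like so (pre-1.0 client style):
# import openai
# response = openai.ChatCompletion.create(**request)
# print(response["choices"][0]["message"]["content"])

print(request["model"])
```

The point is simply that requests naming `gpt-4-0314` keep hitting the frozen snapshot for as long as OpenAI serves it, so "before" behaviour can be compared against the current default model.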


3.5 isn't really famous for playing well with the system prompt; it's probably best to go with 4 (unless you're already using the 06 models; I haven't tested those yet).

Your last screenshot looks like you might have cranked the temperature up too much.
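For context on why a high temperature reads as incoherence: sampling temperature rescales token logits before the softmax, so a higher value flattens the distribution and makes unlikely tokens much more probable. A minimal numeric illustration:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax: lower temperature sharpens the
    distribution, higher temperature flattens it (more random sampling)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # toy scores for three candidate tokens
print(softmax_with_temperature(logits, 0.2))  # sharply peaked on the first token
print(softmax_with_temperature(logits, 2.0))  # much flatter distribution
```

With temperature 0.2 the first token gets over 99% of the probability mass; at 2.0 it drops to roughly half, which is why cranked-up temperature often looks like the model "got dumber" rather than a model change.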


So true.
I am a ChatGPT Plus user, but it has become so dumb that I have decided not to continue my subscription.


Hi, thanks for your notice.
The temperature does not matter too much in my scenario.

Welcome to the community! I prefer the API because it does not steal your chat data.

Okay… we have tested some more, and I have the opposite result. I now believe GPT-4-0613 is a much stronger model than the previous version of GPT-4. It understands what I say better than before. This is truly amazing! Thanks to the OpenAI team for their efforts and hard work.


Alright. I'm unsure what you're using GPT for, but it's clear you're not using it for scripting, given its previous ability to interpret user intent from minimal input and craft beautiful working code in one shot! I would even argue that this recently [downgraded] 4-0314 version outperforms the latest 4-0613 version in coding ability, comprehension, and logical reasoning. The latter seems to have a shorter recall capacity, having produced only one working piece of code during two days of assessment and comparison; interestingly, the same tasks would have been completed in a single attempt with the previous model. That said, from my perspective there has been a noticeable decline in GPT's performance in recent weeks. How else could one interpret the marked decrease in output quality from identical system setups and prompts, even after multiple variations were attempted? The only plausible explanation is that the newer model isn't performing as well.


It's good to hear that OpenAI is looking for examples. Can you provide a link to a request from their team? I will likely contact them directly.


Staff members have been asking for examples in the OpenAI Discord and on here; everything is welcome.


I am also seeing problems with coding. Previously I could paste 50-100 lines of code, or even a few different versions (say 3 x 50-100 lines), have a decent conversation, and get problems resolved. For example, I have a Flask service in Python and a JavaScript/HTML front end. It very quickly forgets the pasted JavaScript and makes stuff up. I have noticed this since yesterday. It's hard to paste specific examples, but its recall is degraded and its coding suggestions are lacking; I have to keep steering it. It is giving generalized solutions. I checked whether it still has recall of the earlier code, and it does, but it seems to ignore it and go for a more generic solution, ignoring the specifics. I hope that helps.