GPT-4 has been severely downgraded (topic curation)

Honestly, ChatGPT-4 is borderline unusable at this point… it is so unbelievably dumb… what a shame! What happened? It literally doesn’t understand a single word 99% of the time now. I’ve prompted and reprompted, and it’s just not getting it like it once did…

3 Likes

I think it is being throttled, and sometimes you get enough GPU to match what it could do months ago. It is odd how bad it has been in the last few days, though: stuck in logic loops, without enough context to see the code I upload or paste for more than one prompt afterward. It keeps giving a solution it already proved wrong, or suddenly forgets what we were working on. Lots of “need to upload the file again” responses in the very prompt I uploaded the file with :/. This is terrible.

Yet in the last few days it did, at times, suddenly output an entire correct code refactor with fixes that were excellent, as before. It feels as if it still can do it, but it takes a lot of patience and focus. If I keep hammering at a problem, there is finally a window of time where the jackpot comes out: the code is good and boom, problem solved like before. Those moments are getting fewer and fewer lately. Keeping the context up to that point is difficult and a waste of time. I am sure this wastes more GPU for OpenAI overall, since making us push it burns a lot of extra cycles instead of it just doing the job and being done.

I am not sure what it is; I’m just trying to figure it out. What would reduce the context and allotted “power” of ChatGPT?

It’s literally that the context gets smaller and the amount of “cycles” it will give me per question rises and falls depending on the time of day and other unknown factors. I see this behavior; I am a heavy user, so I am “feeling” it do this. I am not imagining it: somehow its “power” is being reduced at times.

1 Like

I’ve experienced exactly what you’re talking about. It’s so odd… there are moments where the true AI genius that was GPT-4 from a few months ago shines through… and other times it’s a joke… and I’ve seen it apologising A LOT, A LOT!!!

2 Likes

It could be that they are using more aggressive token-reduction techniques (see: context loss) and also using lesser models to answer certain questions. It could also be that they are dynamically controlling these factors based on the current usage.

It’s impossible to say for certain as they don’t tell us anything.
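
Purely as a thought experiment (nothing here is confirmed by OpenAI), dynamic control like that could be as simple as a load-based router that picks a cheaper model tier at peak times. A rough sketch, with made-up model names and thresholds:

```python
# Hypothetical sketch only: nothing here reflects OpenAI's actual backend.
# It just illustrates how requests *could* be routed to a lesser model
# when load is high, which would match the behavior people describe.

def pick_model(current_load: float) -> str:
    """Choose a model tier based on a load factor in [0, 1] (made-up thresholds)."""
    if current_load > 0.9:
        return "distilled-fallback-model"   # cheapest tier at peak load
    if current_load > 0.6:
        return "reduced-capacity-model"     # middle tier
    return "full-capacity-model"            # what you get off-peak

print(pick_model(0.95))  # -> "distilled-fallback-model"
```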

I have given up trying to have long conversations with ChatGPT. It’s completely lost. Most of my custom instructions (avoid generic solutions, respect nuances, avoid omissions, etc.) should never have been necessary; they’re a band-aid solution in my opinion.

But it does have greater reasoning capabilities, so I don’t really mind. Plus, it seems like we will soon be able to run our own models and use GPT-4 to create training data, so this heightened reasoning capability is great in my book.

Now I just have quick exchanges, usually a single question-and-answer pair. I’d rather curate and carry the context myself into a new conversation than rely on OpenAI’s black-boxed solution.
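
For anyone who wants to try that workflow against the API, here is a minimal sketch using the openai Python SDK; the summary text, model name, and question below are just placeholders for your own curated context:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hand-curated context carried over from the previous conversation;
# the wording here is only a placeholder for your own summary.
curated_context = (
    "We are refactoring an audio player class. Constraints so far: "
    "keep the public API stable, target Python 3.11, no new dependencies."
)

response = client.chat.completions.create(
    model="gpt-4",  # or whichever model you have access to
    messages=[
        {"role": "system", "content": curated_context},
        {"role": "user", "content": "Given that context, fix the playback-rate bug below: ..."},
    ],
)
print(response.choices[0].message.content)
```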

1 Like

@Foxalabs, are there any mods who can remove this comment from the thread? It’s unhelpful and snarky.

1 Like

This whole topic is generally unhelpful, especially since mods consolidated everything into a thread that started even before GPT-4 existed and closed others that had more articulate evidence…

A post can be flagged by a user and then removed by a mod if they agree, which counts as a strike against the poster. You can also just ask nicely whether someone would edit their abrasive post.

The simple truth that ChatGPT isn’t going to get better by 1000 more comments here is not a policy violation.

Mods have been reined in to not respond to every flag or post that simply doesn’t have the professional candor of an (unpaid) OpenAI employee.


PS: Please note whether the experience you share is with the API or within ChatGPT, especially for forgetfulness. ChatGPT has the additional complication of a conversation history management engine that aggressively minimizes the memory of past chat.
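
To be clear, how that engine works isn’t documented anywhere; conceptually, though, it behaves like a sliding window over the chat, roughly like this sketch:

```python
# Conceptual sketch only; OpenAI has not published how ChatGPT actually
# manages conversation history. This shows the general "sliding window"
# idea that makes early turns disappear from the model's view.

def window_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages that fit within a rough token budget."""
    kept: list[dict] = []
    used = 0
    for msg in reversed(messages):           # walk from newest to oldest
        cost = len(msg["content"]) // 4      # crude chars-to-tokens estimate
        if used + cost > budget:
            break                            # everything older is simply dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))              # restore chronological order
```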

So true. There are now so many different versions of GPT-4 (API, cGPT, mobile cGPT, plugins, CI) that this “curation” has just muddied it all together.

The only benefit to this thread now seems to be purely therapeutic.

1 Like

Pretty sure this tweet by a VP of Product reflects OpenAI’s internal stance on the superiority of their product. Gaslighting their entire user base may have inflamed the 650,000 people who read it on Twitter and are now on this thread venting.

2 Likes

They will go to their grave with their measuring stick in hand (not a threat). I’m sure there’s some merit in his words, but the fact that they release no information about their newer models and ChatGPT functionality makes the comment completely out of touch and tone-deaf.

Was it ever even confirmed that GPT-4 is actually using a Mixture of Experts? There was a really good article exploring this but I can’t find it anymore.

Here it is:

Makes one wonder why there is still no logprobs :thinking:

EDIT: oooh there was another article released that I didn’t read

1 Like



Here is an example of it suddenly forgetting what it was doing, and admitting it.

It was working okay until just now; getting toward 3 pm West Coast time, boom, it’s no longer performing with the memory it had been.

Then just now it seems to have completely forgotten the task it was working on and “made up” a new one, completely off topic: changing the rate of audio playback, which had nothing to do with anything I asked. I had asked it to find the issue in the code it was making changes to overall.



This error looks familiar; it happens during these amnesia moments.

So it has these “breaks of continuity and context” over and over again during some periods of the day, and less so at other times, usually following peak times on the US West Coast.

I have found the file upload feature is becoming useless, like a tiny hole of vision over the file compared to pasting it into the prompt. It works better in Code Interpreter mode than in plugin mode, where things seem really wonky at times now, since there I can’t paste in a file to help it. Both pasting and uploading together are sometimes better, but other times that makes it completely break too. Simply keeping context would save so many resources, since I just keep hammering it until it finally gives me the money :P.

1 Like

It would be more helpful to everyone if you could just share a link to the chat.

One thing I’ve noticed with Code Interpreter is that occasionally the environment can be reset mid-chat—it’s rare but I can confirm I have observed it.

Often when that happens the model seems to go off the rails a bit, because it knows it defined an object but that object no longer exists. Suddenly it has an internal representation of the environment that is completely detached from reality. It’ll try to run some code it thinks should work, get an error about an undefined object, define that object, then get another error about a different undefined object, rinse and repeat.

This can chew up a lot of context if the error messages are quite long, which can lead to instructions and the model’s internal representation getting pushed out of context.
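
Here’s a minimal plain-Python illustration of that failure mode (obviously not OpenAI’s actual sandbox, just the shape of it): the first cell after a reset references a name that no longer exists, and the resulting error text is what gets fed back into the chat.

```python
# Plain-Python sketch of the failure mode described above, not OpenAI's sandbox.
# After an environment reset, a name the model "remembers" defining is gone.

def first_cell_after_reset() -> str:
    """What the model's next code cell effectively runs into after a reset."""
    try:
        return str(df.describe())        # 'df' existed before the reset, not anymore
    except NameError as err:
        # In the real chat, error text like this is appended to the conversation,
        # and long tracebacks help push earlier instructions out of the window.
        return f"NameError: {err}"

print(first_cell_after_reset())          # -> NameError: name 'df' is not defined
```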

From there things can rapidly cascade catastrophically out of control.

But, again, this is (to the best of my knowledge) very rare.

1 Like

Ah, I see this frequently in bursts, where it happens almost every other prompt. It comes and goes, yet anytime I go for a long time it starts to happen. I suspect that changing to a new context/session may clear it up for a while. It also may be triggered when I really jam it full of data.

1 Like

If you could isolate any commonalities in when it happens, you could submit it as a bug and maybe get a bounty.

It would also be a big help to everyone if OpenAI were able to ameliorate the problem.

2 Likes

I am sharing two chats (in Dutch) from a plugin I am developing. If you look only at the chat from August (today), the responses might seem OK. But on close inspection of the GPT response where my answer to an exam question is judged, July gave a brilliant and useful explanation of what I did right, the mistake I made, what was going on, and how to correct it. August also provided words, but they are mostly repetitions of what had already been said, and when it judges my answer, it makes errors about what I did right and doesn’t explain why the final answer didn’t match.

Note: the prompts the plugin uses have changed slightly in the meantime. I do believe the perceived differences described below are due more to differences in the underlying gpt-4 model than to changes in the prompts; results vary from time to time and seem unrelated to the prompt changes.

July (the first one, from somewhere around July 11):
https://chat.openai.com/share/98888035-982c-4476-8d27-eda85d9e0a67

I saved this one with !! in the name because I was so impressed with the results; I couldn’t think of any way to improve it. It was just perfect, and I developed warm fuzzy feelings for this GPT :slight_smile:

August (the second one, today):
https://chat.openai.com/share/f85ab6c1-e582-42ee-b051-5f6b8fa9313b

The most notable difference in quality of the GPT response is the response to my answer/calculation “I think it should go like this:
C5 = 500000 * (1 - (1+0.07)^-5)/0.07
= 2,050,098 euro
Then the initial investment still has to be subtracted: - 1,700,000 = 350,098 EUR
And then the residual value added: + 300,000
Final answer NPV = 650,098 EUR”

My answer/calculation contains a single ‘bug’: I haven’t discounted the future value of 300,000 EUR back to its value today (divide by 1.07^5).
Both versions fetch the ‘correctievoorschrift’ (gold answer and points) correctly.
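
For reference, here is the arithmetic worked out; the only assumption is the standard NPV rule that the residual value must also be discounted back five years:

```python
# Worked check of the numbers quoted above. The only assumption is the
# standard NPV rule that the 300,000 residual value must also be discounted.

rate, years = 0.07, 5
annuity = 500_000 * (1 - (1 + rate) ** -years) / rate     # ~2,050,098.72
residual_pv = 300_000 / (1 + rate) ** years               # ~213,895.85

pv_total = annuity + residual_pv        # ~2,263,994.57, the correctievoorschrift value
npv_correct = pv_total - 1_700_000      # ~563,994.57, the corrected version of my calculation
npv_student = annuity - 1_700_000 + 300_000   # ~650,098.72, the undiscounted (wrong) answer

print(round(pv_total, 2), round(npv_correct, 2), round(npv_student, 2))
```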

July has an amazing response:

  • It correctly shows how the value of €2,263,994.57 is calculated (this exact calculation appears in the correctievoorschrift), and especially the last term on the left-hand side is useful to show to the student, because it is different due to the 300,000 added.
  • It very precisely tells me what I did right, but also what I did wrong and why, and then proceeds to explain thoroughly what should have been done instead.

Aug response:

  • It reiterates my answer, but with no additional info or insights, so no perceived added value.
  • It also has a tendency to create subsection headers, which I am personally not fond of.
  • It then continues with the gold answer, but with the end result only; the way the answer is calculated is NOT shown to the user.
  • It then compares only the end results and states that my answer does not correspond to the gold answer.
  • It then says that I added the 300,000 correctly (but I didn’t, because I forgot to divide this value by 1.07^5) and then continues that my final answer is wrong because it doesn’t match the final answer from the ‘correctievoorschrift’.

In the remainder, another thing July did better:
My question: “If the cash flows are received evenly spread over time, can’t I just divide the total investment by the annual cash flow? I then get the number of years with some decimals, which I’d then have to convert into months or something.”
July: provided a correct answer to this question.
Aug: misinterpreted my question as an answer to question 17, proceeded to fetch the gold answer, and gave that answer in the response. This way, the student would have been prevented from coming up with the answer themselves.

The July version :rocket: and the Aug version :cry:

2 Likes

The post was mainly intended to respond in detail to the call for info in this tweet https://twitter.com/officiallogank/status/1695248712669483034?s=46

I can retry the chat a number of times (maybe also set the prompt back to the original one); how many times would be sufficient?

1 Like

I don’t know if you have experience developing a plugin, but what I would be interested in is a way to test a plugin chat 30 times without violating the ChatGPT policies (as far as I know, using Selenium or another browser-based ChatGPT wrapper is not allowed and could lead to a ban). Also, there is no API access to a plugin-enabled GPT-4 (and the API GPT-4s could be different).
So, practically, testing 30 chats systematically is impossible.

But also, I do not think 10-30 chats are required for the specific call for info from OpenAI (the tweet I replied to with this post). As the plugin/prompt developer, I spend a lot of time repeating the same messages, and you get a feel for what instructions the model can follow and where its limits are. I don’t think the specific examples I shared are unrepresentative of the general feel of the responses at the time. The audience for this post is people at OpenAI; maybe it’s useful, maybe not, but only they know what was changed and when.

2 Likes

Do we have any updates on this?

It feels that v4 has 0 memory now.

What is going on?

2 Likes

Could this have anything to do with the suspected downgrade and lack of support for the regular folks?

If it does, it’s a bit disappointing. But on the other hand, if it keeps OpenAI out of bankruptcy and still the darling of M$, then it is actually a good thing for us all.

1 Like

I use a brief prompt at the beginning of conversations that I expect to be on the longer side.

“Please make your responses brief and refrain from describing your limitations.”

It’s been very good at trimming non-essential, “conversational” fluff and doesn’t waste context on the paragraph you gave above.
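
If you’re working through the API instead of ChatGPT, the equivalent is pinning that instruction as the system message. A quick sketch with the openai Python SDK; the model name and user message are just placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Same idea as the ChatGPT prompt above, pinned as a system message so it
# applies to every turn of a long conversation.
messages = [
    {"role": "system",
     "content": "Please make your responses brief and refrain from describing your limitations."},
    {"role": "user", "content": "Summarize the bug we found in the audio player."},
]

reply = client.chat.completions.create(model="gpt-4", messages=messages)
print(reply.choices[0].message.content)
```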

1 Like