GPT-4 has severely degraded in its understanding of tasks; its memory for variables and its attention span have both deteriorated. It completely ignores prohibitions, for example, "do not use square brackets". I now have to rewrite 90% of its output because of hallucinations, whereas before that was rare. Even the open-source Orca LLM gives better answers.
The issue is no longer just about whether OpenAI has indeed lowered the quality of their models. Regardless, causing us, as consumers, to worry is an infringement on social fairness. OpenAI should take responsibility for this. It’s essential that we make more people aware of what OpenAI has done to us during this period. Together, let’s stand firm and resolutely defend our rights as users and consumers.
I’m totally prepared to invest more for the sake of accessing advanced models. However, this doesn’t imply that I’m comfortable with paying the same for models whose quality is experiencing a gradual decline. At a minimum, I deserve to be aware of the actual situation to make better decisions. Decreasing the quality of service without any clear announcement indicates a lack of integrity. Regardless of whether this degradation has actually taken place, the very act of sowing seeds of suspicion is seriously disgraceful. OpenAI’s current conduct merits intense condemnation for the sake of social justice.
OpenAI should promptly respond constructively and initiate immediate changes.
There is now scientific research reinforcing the reported degradation: "How Is ChatGPT's Behavior Changing over Time?", DOI: 10.48550/arXiv.2307.09009
Well said. I really would like to see more community engagement.
I canceled my subscription due to this. I get the feeling that they're going to go the route of cable companies and specialize LLMs: if you want coding, pay for this; if you want general chat AI, pay for that; if you want maths, pay for this… it feels like the freaking '80s and '90s all over again, rinse and repeat. Perhaps I am missing something…
There are many fundamental concerns with this paper.
Here is one demonstrating that GPT-4 actually improved from March to June.
I've already written my concerns about the methodology for their test of mathematics ability (evaluating whether a number is prime or not), which I will attempt to summarize here:
- GPT-3.5 and GPT-4 are large language models, and while math is a language and they have shown emergent capabilities in the field of mathematics, evaluating whether a number is prime is not a good test of mathematics ability; I would be hesitant to say it is a test of mathematical reasoning at all.
- They tested using only prime numbers. The problem with that is we cannot discern if the models have lost (or gained) reasoning ability or if the models have any bias for answering “yes” or “no” to the question “Is [number] a prime number?” If they had included composite numbers in their tests, we would have a much clearer picture of what is happening here because we could compare the proportion of composite numbers the models identify as prime with the number of prime numbers the models identify as prime.
- They used a temperature of 0.1. There is nothing inherently wrong with choosing this temperature, but it:
a. Does not represent the expected behaviour of ChatGPT, where the temperature is 1.0.
b. Suggests they should have done more than one replication for each number to account for the variance of the model. Then they could have set a threshold, say 75%, at which the model would be considered to have correctly answered the question. E.g. run each number 20 times. If the model gets the correct answer 15 times or more, it gets credit for being correct on that question.
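To make the two objections above concrete, here is a minimal sketch of the evaluation design being proposed: test on primes *and* composites, replicate each question, and only credit an answer above a threshold. `ask_model` is a hypothetical stand-in for whatever API call the paper would make; everything else is plain Python.

```python
def is_prime(n):
    """Ground truth for the primality question."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def scored(ask_model, numbers, replications=20, threshold=0.75):
    """Score a model on 'Is [number] prime?' questions.

    ask_model(n) -> bool is a hypothetical model wrapper. A number is
    credited only if the model answers correctly in at least
    `threshold` of `replications` runs, which accounts for sampling
    variance at nonzero temperature.
    """
    correct = 0
    for n in numbers:
        hits = sum(ask_model(n) == is_prime(n) for _ in range(replications))
        if hits / replications >= threshold:
            correct += 1
    return correct / len(numbers)
```

Including composites immediately exposes a yes-bias: a degenerate model that always answers "yes" scores 100% on an all-prime test set, but only ~50% on a balanced one, e.g. `scored(lambda n: True, [2, 3, 4, 6])` gives `0.5` while `scored(lambda n: True, [2, 3, 5, 7])` gives `1.0`.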
Now, I haven’t yet had time to dig through the rest of the paper, but if these issues are immediately apparent in a 2-minute read-through of the paper, I suspect there are other issues as well.
It should also be noted that this is a draft paper; it has not been peer-reviewed. I do not imagine this paper would be published in any worthwhile journal in its current state, and I am doubtful the issues could be corrected—especially since the March model is no longer publicly available.
I canceled my account. I’ll gladly return once there is some transparency.
To anyone from OpenAI looking for examples of degradation: just give us access to a version of GPT-4 from earlier than May and one from May onwards.
Personally, I deleted all my chats in the naive assumption that this might help restore the earlier (superior) functionality of the AI.
So I don’t have my original conversations but I can recreate them quite easily, as I still have copies of the unedited subject matter. It would be exceptionally easy to demonstrate the degradation. It would be a couple of days work due to the rate limits, but I believe I could create a substantial amount of evidence.
Also, I too would be willing to pay more for access to the pre-May version. It was vastly superior (this is a fact, not an opinion) for coding.
Just cancelled my subscription as well. This downgrade is not even subtle. The difference in capability of the GPT-4 model from just a few weeks ago is grotesque, huge, unquestionable, obvious, simply impossible to deny for anyone who uses it as a coding helper/time-saver. I'm wasting more time checking and fixing its mistakes than saving time at this point. And just to point out: this was EXACTLY the same downgrade I experienced with the 3.5 model, right before they launched the GPT-4 version. That makes a pattern clear here, in my opinion. Just shameful.
See my post over here. It was announced today that the 0301 and 0314 "smart" models will be extended, and that the new models will be made "smart" before the original ones are deprecated. This should be good news for you!
Yes, that's good news on the API side of things. Sorry I didn't make myself clear, but I was talking about the web interface, which is the one I use manually as a "coding accelerator", so to speak. That's the one I pay the PLUS subscription for. I don't really use the GPT-4 model in the API due to its cost right now. I use the 3.5-turbo model for my applications. But that's good news nonetheless, thank you for pointing that out.
Understood. You can use the API version in the Playground, which is a web interface (no coding required), as I mentioned in the linked post above.
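For anyone who prefers scripting it rather than the Playground, a minimal sketch of the equivalent API request, using the 2023-era `openai` Python package's `ChatCompletion` interface. The `gpt-4-0314` model name is the pinned March snapshot mentioned above; the prompt text here is a made-up placeholder.

```python
# Pinning the March snapshot via the API — the same thing the
# Playground does behind the scenes when you pick gpt-4-0314.
payload = {
    "model": "gpt-4-0314",  # pinned March model, not the rolling "gpt-4" alias
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain this function to me."},
    ],
    "temperature": 1.0,
}
# With the 2023-era openai package installed and an API key configured:
# import openai
# response = openai.ChatCompletion.create(**payload)
# print(response["choices"][0]["message"]["content"])
```

Note that, unlike the flat ChatGPT Plus fee, this is billed per token at GPT-4 rates.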
But what this means, I think, is that the next model has a good shot at being “smart” for ChatGPT. So wait and see, I guess, if you only want to run ChatGPT …
But Playground/API has them now with no wait.
Understanding, of course, the different pricing models between API and ChatGPT.
Link here for clarification on the other post I am referring to:
You know, I hadn’t realized before that I could use the older version (from March) in the playground. I’ll give it a try. It’s unfortunate that it incurs costs at the GPT-4 model level with every request I make there. I’m on a really tight budget, but at the very least, I have the $20 I’m saving from the subscription I cancelled. I believe that will allow me to make quite a few requests in the playground without exceeding my budget. It might even be more requests than I usually make in a month, I don’t know, I’ll have to check. Anyway, thank you so much for the heads up. It’s helped me significantly. Cheers!
The original multiple topics received a lot of attention and a lot of responses.
Since you aggregated them, I think you should post summaries, response counts, etc. on the previous topic.
(Isn’t that the role of a user with moderator privileges?)
Also, please add the tags that were given to the original topic.
I look forward to your meticulous work.
*I made a few mistakes because I'm not used to posting. Sorry.
A few days ago, Logan had to change the users who were not OpenAI employees from full moderators to just category moderators, because as full moderators we had access that needed to be restricted for future OpenAI plans. As such, some of what you seek I can no longer do.
They can still receive the attention (being viewed) and this topic allows for responses.
The larger ones had summaries posted in them a few days before they were closed.
Added to first post as images.
I don't read minds; I have no idea what you seek.
Done (double entendre)
In May, I was using GPT-4 to write a novel. I had a very long chat, and GPT-4 remembered every single detail from beginning to end. At some point, it started responding randomly and out of context. I thought it was a temporary issue or that I had broken my chat, but as I continued, I realized there had been a downgrade. Now I have finally found other people who noticed the same. I believed in the project and have been a Plus user since day 0, but now I will cancel my subscription until there is an official response.
GPT-4 in its current incarnation is an 8k-context model. 8k tokens is approximately 6,000 English words, fewer for code or symbolic languages. If your current conversation contains references to the prior content within the last 6,000 words, then the model will be able to infer context. As soon as all reference is lost to topics more than 6,000 words back, the model can hallucinate facts if it is required to comment on them.
This limitation has been present from the initial release and has not changed.
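A rough back-of-the-envelope check of whether a conversation still fits that window can be sketched like this. The ~0.75 words-per-token ratio for English is a common rule of thumb, not an exact tokenizer, and the reserved-reply size is an arbitrary assumption:

```python
def approx_tokens(text, words_per_token=0.75):
    """Rough token estimate from word count. ~0.75 English words per
    token is a rule of thumb; code and symbolic text use more tokens."""
    return int(len(text.split()) / words_per_token)

def fits_context(messages, context_window=8192, reserved_for_reply=1024):
    """Estimate whether a list of message strings likely fits an 8k
    window, leaving headroom (an assumed 1024 tokens) for the reply."""
    used = sum(approx_tokens(m) for m in messages)
    return used <= context_window - reserved_for_reply
```

By this estimate, once the running total of a chat passes roughly 6,000 words, the earliest messages have already fallen out of what the model can see.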
This is not the case. I started a new chat for new chapters. Now it's really difficult for me to work as I did before. Even after a few messages, it forgets the context and starts to invent things.
It looks dumber than before. It’s a real shame that it’s going this way.
Since yesterday’s update and setting the custom instructions, I’m experiencing better responses again (ChatGPT web). I have not experienced any issues today or had to re-explain things over and over. If it wasn’t for someone here in the forum pointing out the new custom instructions settings under the beta features tab, I wouldn’t have even known this was added. Would have been nice if this had popped up somewhere after it was added but maybe I just missed it somehow.
Either way, in my opinion it’s looking better than the last couple weeks, so let’s hope it stays that way.