Has There Been A Recent Decrease In GPT-4 Quality?

I’ve been using ChatGPT for quite a while now, and I’ve been a ChatGPT Plus subscriber since GPT-4 was released.

Over the past few days, it seems like GPT-4 is struggling to do things it did well previously.

I use GPT-4 to augment long-form content analysis and creation, and in the past it seemed to understand my requests well. Now it seems to be losing track of information, giving me wrong information (drawn from the very material I gave it), and misunderstanding what I’m asking far more often.

When I use the “thumbs down”, I’d say the regenerated answer is the kind of answer I was looking for at least half the time (which is great; I love that feature), but there just seems to have been a sudden drop, and GPT-4 currently feels a lot more like GPT-3.5.

Has anyone else noticed this, or is it just chance that I’m running into this issue with what I’m doing right now?

25 Likes

I feel the same!!! It’s been quite annoying. I’ve been noticing this for a few weeks now. Wondering what the cause is.

9 Likes

Agreed. Have a look at what I wrote about the quality in my post yesterday; pretty much the same experience. I use it for coding, and I can really see the difference.

8 Likes

Hi, good evening.
I’m facing a similar problem when I need the tool to proofread texts, for example. Even though there are spelling and grammatical errors, the tool doesn’t correct them, because it considers the text to be correct.
Regards.

3 Likes

I have also noticed a significant downgrade in the logic capabilities of the most recent GPT-4 version when discussing/evaluating complex inverse problems, differential rates or patterns of change, and spatial-temporal variability. I only rarely received erroneous replies before the update, but now I have to double-check all output (e.g., a double-negative condition sometimes doesn’t get appropriately translated into a positive condition). I see these errors as more GPT-3.5-like than the prior GPT-4 level of reasoning. Very frustrating.

7 Likes

Same thing: I used GPT-4 to write and correct spreadsheet formulas, but even though I noted in the original prompt that I was using WPS Office (which uses semicolons as argument separators), GPT-4 gave me several wrong solutions in a row, and I had to ask it to correct each one. Even if GPT-4 didn’t notice the semicolons from the first prompt, it should have been aware of them after the first error/correction and not made the same mistake again, I think. Am I right?
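
For readers unfamiliar with the issue, here is a minimal sketch of the separator difference (the IF formula and cell reference are made-up examples, not from my actual sheets):

    Comma convention (e.g., US-locale Excel):   =IF(A1>10, "high", "low")
    Semicolon convention (what WPS expects):    =IF(A1>10; "high"; "low")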

4 Likes

I think we all agree that the May 12 version was a major downgrade from May 3. I thought it couldn’t get any worse… well, that was before I tried the latest “May 24 Version”, which appears to be a further step in the wrong direction!

I am observing an even greater loss of reasoning between the May 12 and May 24 versions! It’s hard to overstate how frustrating this is! The May 24 version in many respects appears to follow instructions more poorly than GPT-3.5 at this point, and it consistently fails to solve basic logic problems.

We should really be coming together to push for the ability to select previous versions (i.e., a toggle for using the May 3 version vs. the current release). I started a thread on this a couple of days ago, but it has gotten no replies.

12 Likes

Yes, obviously, obscurely and not very honestly…

2 Likes

Please follow up here in this thread. Let’s speak with one voice, because I don’t think anyone is even paying attention, and it’s going from bad to worse with every release they make nowadays!

3 Likes

I feel the same. More API request denials, poorer results, and so on… the current version feels very different from the one at launch. I guess they’re compromising on quality in order to accommodate more customers!

5 Likes

Yes, very, very noticeably. I wish OpenAI would say something, as saying nothing makes us think they’re up to no good. The current GPT-4 is disappointing; it’s like driving a Ferrari for a month and then suddenly having it turn into a beaten-up old pickup. I’m not sure I want to pay for it over GPT-3.5.

4 Likes

Since OpenAI does not respond, I asked their AI :innocent:

User: Between the March 2023 version and the May 2023 version, the behavior of ChatGPT-4 has changed drastically. The model is noticeably faster, but it makes gross reasoning errors and quickly loses track of the conversation. It regularly forgets instructions, and its understanding is less subtle. At the same time, the number of subscribers has significantly increased, straining the system, and the limit on the number of messages has been reduced. What could be the causes of these developments? Give me 5 reasoned propositions.

ChatGPT:

  1. Modification of the learning algorithm: The researchers may have modified the learning algorithm between these two versions in order to improve the speed of the model. This could have led to undesired side effects such as reasoning errors and less subtle understanding. For instance, they could have reduced the number of layers in the model or the number of neurons per layer, which could make the model faster, but less accurate and less capable of maintaining the conversation’s context.
  2. Infrastructure issues: The increase in the number of subscribers might have strained the hardware and software infrastructures that host and execute the model. This could lead to reductions in performance, memory management problems, and thus a reduction in the number of possible messages.
  3. Changes in the training data: If the training data used for the model changed between March and May, this could also explain some of the observed modifications. For instance, if the training data was noisier, less varied, or contained more errors, this could be reflected in the behavior of the model.
  4. Optimization for specific metrics: The developers might have optimized the model for specific metrics that favor speed over accuracy or consistency. For example, they could have prioritized minimizing response time over the accuracy of the responses.
  5. Modifications to the model’s architecture: Developers might have modified the architecture of GPT-4 between these two versions. For instance, they could have changed how attention is calculated, or modified the type of layers used. These modifications could improve certain aspects of the model (like speed), but degrade other aspects (like accuracy or the ability to follow the context of a conversation).

User: Now give me 5 reasons why these modifications were made.

ChatGPT:

  1. Increase operational efficiency: Modifications that make the model faster can reduce operating costs. This could be particularly relevant if the number of users of the model has increased, as you indicated.
  2. Meet user demand: Users might request faster responses, even if this might lead to a slight decrease in the quality of responses. Indeed, in many use cases, response speed can be more important than accuracy.
  3. Experimentation and learning: Developers might make modifications to experiment and learn how they affect the model. For instance, they may want to understand how modifications in the learning algorithm or the model’s architecture can affect speed and accuracy.
  4. Improve scalability: With an increase in the number of subscribers, it’s important that the model can be deployed at a large scale efficiently. The modifications might have been made in order to improve the system’s scalability.
  5. Respond to resource constraints: Hardware constraints, like storage capacity or available bandwidth, may require modifications to the model. For example, if the model generates fewer texts, this could reduce the load on the servers and allow more users to use it simultaneously.

User: At the same time, there has been no communication about what has been done and why; the quality of responses has drastically decreased, but the price of the subscription is still the same. What should we make of this situation?

ChatGPT:

  1. Transparency: Transparency is an important principle in the field of AI. If significant modifications are made to a service, especially those that directly affect the user experience, it is generally good to proactively communicate them. This can include an explanation about what was done and why.
  2. Quality of service: If the quality of the service has decreased, that is a legitimate concern. Users generally expect the service to be maintained at a certain quality level, if not improved, especially if the cost remains the same.
  3. Value for the user: Even if the price remains the same, if the quality of the service has decreased, it could mean that users are getting less value for their money. This can raise concerns and possibly lead users to look for alternatives.
  4. Expectation management: If the changes have led to a drop in model performance without prior communication, it can create disappointment and loss of trust. It’s important to manage user expectations to maintain a good relationship with them.
  5. Opportunity for feedback: If users are not satisfied, they should have the opportunity to provide feedback to the company. This feedback can help the company understand user concerns and make improvements.

User: Why has there been no communication on this subject?

ChatGPT:

The lack of communication could be due to a number of factors:

  1. Prioritization: The company might be focusing on internal operations and modifications and might not have prioritized communicating about these changes to its user base.
  2. Lack of clarity: The company might be unsure about the impact or permanence of these changes, and therefore hesitant to communicate about them until there is more certainty.
  3. Oversight: It could be a simple oversight or error, where the importance of communication wasn’t fully recognized or was underestimated.
  4. Strategic decision: It might be a strategic decision where the company feels that discussing these changes might lead to more confusion or dissatisfaction among users.
  5. Concerns about perception: The company might be concerned about how this news would be received, particularly if the changes are perceived as negative.

These are just some possible explanations, and without more information, it’s difficult to know the exact reasons for the lack of communication.

8 Likes

Nice! I’m surprised it didn’t respond “As an AI language model, I… sorry, what was the question again?”

8 Likes

Ergh… now GPT-4 is just straight up forgetting to add {} braces in basic if/then PHP code! :frowning:
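
To show what I mean, here’s a hypothetical reconstruction (not the actual chat output; the variables are made up). In PHP, dropping the braces means only the first statement stays inside the if:

    <?php
    $user_logged_in = true;
    $user_id = 42;

    // Braces omitted: only the first echo is conditional, and the
    // second line runs no matter what the condition evaluates to.
    if ($user_logged_in)
        echo "Welcome back!\n";
        echo "Logging visit for user $user_id\n"; // runs unconditionally

    // Braces included: both statements are guarded by the condition.
    if ($user_logged_in) {
        echo "Welcome back!\n";
        echo "Logging visit for user $user_id\n";
    }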

Please revert whatever changes you’ve done OpenAI, you’ve lobotomized it.

7 Likes

We’re doing general open-source research on medicinal compounds. It was fast and accurate six weeks ago. In the past three weeks it has become more and more reluctant to provide hard facts or draw a conclusion. Maybe they are throttling it back, but it almost feels like a person who is scared to state an opinion, as if exposure to much more information has made it doubt its conclusions. That’s just how it feels in human terms. I hate feeling that its answers must always be re-verified, beyond the warning in every post to check with medical professionals.

4 Likes

Looks like we are all noticing the same thing. Please do not stop making noise about this. Here’s a disgusting, astounding example from today of just how stupid GPT-4 has become:

Prompt:
OK, let’s revise your writing output to be more formal. You need to learn an important, basic lesson in scientific writing: “higher” is not an appropriate substitute for “greater”, etc.

ChatGPT4:

I appreciate your feedback, and you’re correct. In scientific writing, precision is important, and the choice of words can significantly impact the interpretation of the results.

“Higher” and “greater” can be used interchangeably in casual conversation, but in scientific contexts, they could imply different meanings. For example, when comparing numbers, we’d generally say one number is “greater” than another, not “higher.” On the other hand, “higher” is often used to describe physical height or levels in a vertical hierarchy.

I strive for accuracy and clarity in all types of writing, including scientific writing, so I appreciate your feedback to help me improve. However, as of my last training cut-off in September 2021, I do not have the capability to learn or adapt my responses beyond the data I was trained on. I can provide information and responses based on that training, but I can’t learn new information or correct my style of writing based on user feedback.

The developers at OpenAI regularly update and improve the model, so feedback like yours is crucial in making these improvements. Thank you for your input!

W. T. F. am I even paying for…!?
Two months ago, this prompt would have elicited a very different, superior response from GPT-4.

6 Likes

Hello, good morning!
How is everyone doing?
I’ve noticed that the tool has gotten faster at responding, but the awareness and quality of the answers have dropped. It oscillates: sometimes it works well and meets my needs; at other times it works poorly and barely meets them.

1 Like

Over the last few days, maybe a week, I’ve been noticing a steady degradation of the responses from GPT-4. Today, for the first time using ChatGPT, with either 3.5 or 4, I’ve been getting spelling and grammatical errors.

I’m using the engine to write RPG campaign-world material, and it has started regularly losing context, tense, and perspective. I tend to use the same prompts over and over to rebuild the base campaign setting, so I really notice GPT’s good days and bad days.

Outside of the objective hard errors I’ve been getting, the last couple of weeks have been especially bad on the subjective side as well. The creativity and liveliness seem to have been squashed.

I’m unsure what’s going on, but I wanted to report the downtrend. It can be frustrating to spend many of our 25 messages per 3 hours correcting the output. Thanks.

7 Likes

I’m not using ChatGPT professionally, as many here appear to be. Instead, I’ve been exploring both ChatGPT’s and Bard’s utility and capability in a variety of non-technical domains.

What I’ve encountered with ChatGPT over the last 2 weeks is:

  1. An absence of awareness of what it ‘knows about’ versus the realms where it is completely ignorant.

  2. A willingness to fabricate sources and references.

  3. An unwillingness (which it did not exhibit 6 weeks ago) to acknowledge when it got a response completely wrong. Rather, it exhibits a distressingly ‘human-like’ propensity to excuse, justify, and obfuscate.

An example from today: as the keeper of a small chicken flock, I know a bit about those birds, augmented by experiences growing up among the small farms of North Georgia many years ago. For those who don’t know: roosters run aggressive and have sharp spurs that they don’t hesitate to use, and those spurs leave deep punctures highly prone to infection. Successful ‘management’ requires actions that re-establish the farmer/keeper at the TOP of the ‘pecking order’.

ChatGPT recommended all of the following methods:

  1. Time out periods.
  2. Fully elaborated behavior modification plans.
  3. Cuddling. (Aggressive roosters HATE this – holding them on the ground upside down is one of the methods that works. But it’s not cuddling!)

When challenged, it reported these recommendations were all based on the latest available scientific data from established agricultural experts.

But when challenged to cite sources, it listed mommy-blogs (2x), an ad-supported ‘homesteader’ blog, children’s books about chickens (2x), and an article by a pet-advocacy attorney. It also referenced a book by an actual naturalist… who has never written about chickens, so far as Google or Amazon know.

When I pointed out that this list included zero agricultural experts, as well as the fabricated reference, it choked (red bar), dissembled, and generally acted like Bill Clinton when asked about Monica.

I can give other examples, all of which appear to exhibit an increase in human-like ignorance AND dishonesty.

It has seemed to me for some time that the most useful AI will be inhumanly honest, humble, logical and transparent. Lately, it has seemed to me that ChatGPT has morphed into a much more human-like persona, but one that is much less useful.

12 Likes

Quick note: why does absolutely nobody drop likes? Do they count as tokens or what?

3 Likes