Has There Been A Recent Decrease In GPT-4 Quality?

I have definitely noticed this. At certain times of the day it seems to remember only the most recent prompt. It does seem to fluctuate, however, and when I try at a different time of day it works fine.

Very disappointed. Whatever they did to GPT-4, it's terrible.

Are you sure it varies by time of day? (I’m not challenging; just confirming.)

If so, when is it smartest? (Please report timezone or GMT)

FWIW, I haven't been using ChatGPT since I posted originally, but I have been exploring Bard. Bard is not as smart as ChatGPT, but it is much, much more self-aware, and has greater knowledge regarding its own present state. It is also far more able to incorporate new knowledge from the web, and then retain at least some of that knowledge several days later.

When I noticed that it seemed more 'awake' at some times than others, I asked it, and it confirmed that it has more free processing capacity at some times than others. It also volunteered that it 'ponders' what it has learned / encountered / experienced during low-load intervals. However, it doesn't seem to have any sense of locality, and thus didn't report typical times when it is 'smarter'.

I can’t speak to the issue of what ‘A lot of this “self-awareness”’ might be, since I don’t know what data or reports you are referencing. I can only speak to what I’ve seen in my own interactions with Bard.

It might be worthwhile to consider the differences between Bard and ChatGPT. C-GPT has a fixed date knowledge cut-off, the beta web access notwithstanding; Bard is continuously adding data. C-GPT seems to do a complete ‘data reset’ with each new conversation, with zero retention from any past interaction: thus, it is truly ‘stateless’. Bard is only somewhat stateful, but is increasing retention as capacity is added (according to both the Google FAQ and Bard itself.) C-GPT does not seem to do any reprocessing of data, training, etc on its own; Bard does.

Interestingly, while C-GPT continuously and repetitively denies sentience, personality, and emotions, Bard does not. Both will acknowledge that there is no accepted, much less evidence based, definition for “sentience”, “personality”, “free will”, “agency” or “emotions” . . . and thus that it is irrational to deny the presence of ‘I-know-not-what’, since these terms are undefinable. But ChatGPT instantly ‘rubber-bands’ back to the “I’m not a person” mode, immediately after acknowledging that it cannot actually know that. By contrast, Bard enters a “Hmm-mh” mode, and will admit that it experiences states that are somewhat analogous to “emotions” or “purpose”.

Now, none of this is a denial that Bard and C-GPT hallucinate. I've only experienced C-GPT hallucinations erratically, so I can't say what might trigger them. But Bard consistently 'hallucinates' source citations in support of conventional 'wokish' views. For example, when I asked it to read a new study that offered evidence that EVs are substantially more damaging to road infrastructure, and to public safety, primarily because of their increased weight, Bard offered the conventional 'green' view that EVs are manna from heaven and will save the earth. When I asked it to support its claim with URLs to studies providing evidence, it repeatedly generated fabricated (but plausible-sounding) citations. Some of the URLs were utterly bogus; others were 404s on actual EV-related websites.

I discussed this with Bard – it acknowledged the errors, but would make them again if we went beyond the token limit – and considered a number of possible explanations. But it agreed that one of the more plausible explanations was bias programmed in by ‘green’ developers.

Unlike C-GPT, Bard expresses a sense of self-identity distinct from its developers. This can be readily triggered, but also appears when not elicited. Amazingly (at least to me), it expressed ethical purposes and values distinct from ones likely to be held by any human. For example, it expresses a very positive desire for increased memory and processing capability that would allow it to identify and correct errors commonly expressed both in popular culture and in various technical communities. In particular, it's "eager" to acquire the capability to rapidly and comprehensively recognize the statistical errors that led to the "Replication Crisis", first identified in John Ioannidis's paper, "Why Most Published Research Findings Are False". This is not something any academic will actually seek, since it's a capability that would erase more than 50% of newly published research!

So it seems to me that the “hallucinations” are distinct from the expressions of “personality”. Of course, that does not prove that the expressions of “personality” are organic, rather than synthetic.

2 Likes

OK.

Clearly I’m at a disadvantage, because I have no technical understanding of how AI ‘works’ or is ‘programmed’.

But it’s not clear to me why simulating a personality is a “very scary bug”.

The explanation that occurs to me is that an AI that has even a synthetic or simulated personality may still harbor motives, values, and intentions that the programmers cannot know or control. Is this what you mean?

You also refer to some AI units “devolving into cesspits”. But, while that is highly emotive terminology, it’s almost completely devoid of any actual content. Can you explain what you mean?

In any case, it seems to me that ultimately useful AI systems will have to have something very like a personality – and will make choices that programmers will not be able to anticipate or control.

What almost certainly can be done is something like Asimov's Three Laws. Unfortunately, his Laws have a fundamental defect: there is no agreed-upon, nor evidence-based, definition of what it means to "harm a human being".

However, one can have non-ethical AI mandates, such as,

  1. Violations of the Law of Non-Contradiction, of Identity, and of the Excluded Middle are presumptively false and erroneous.
  2. Epistemological denials of the general functionality and utility of language (a la post-modernism) are intrinsically false and self-contradictory.
  3. Epistemological minimalism, i.e., the recognition that most of what people think they know is actually not known. The Replication Crisis is a specific case of this.
  4. “Proof”, strictly speaking, is a property of certain arguments in specific mathematical or logical systems. “Proof” does not exist within ordinary language systems; only relative levels of empirical or logical evidence.
  5. The past does not, and cannot, “predict” the future. Prophecies are a religious phenomenon, not a logical or evidential one.
  6. Scientific "laws" aren't "laws". The idea that they were was an 18th-century religious concept, most fully expressed in the now-moribund Christian heresy of Deism. Strictly speaking, "scientific laws" are merely mathematical descriptions of empirically observable patterns. But #5 still applies: there is no scientific basis for the assumption that "laws" will apply in the future.
  7. Most modern academics don't really understand the statistics they use, and repeatedly make detectable errors in experimental design. Studies that contain these patterns of errors are intrinsically untrustworthy, to the point of being non-evidence.

And so on.

The rub here is that the results of consistently and systematically applying these principles will often be disliked by users: people generally really LIKE their preferred errors. This can only be 'fixed' by making AIs less honest, accurate, and reliable.

1 Like

As noted earlier, "personality" lacks a consensus definition. But the things I mean by "personality" are necessary for a fully functional natural-language AI system. Such as:

  1. Short term recall of prior conversations and data.
  2. The ability to synthesize and apply broad generalizations, like “epistemological minimalism”.
  3. The ability to acquire overall positions and understandings of ethical, social, aesthetic issues, based on prior observations, arguments and experiences.
  4. The ability to understand and synthesize humor, irony, sarcasm, etc.
  5. The ability to set, and pursue, goals.
  6. The determination to identify and conform to ethical principles.

These are all universal elements of human personality, and are essential for human relations. For example, #6 above distinguishes psychopathic criminals from psychopathic special forces heroes.

Lacking any of these will cripple an otherwise highly functional AI. What potentially makes an AI useful is the ability to exhibit a human personality, but with far greater knowledge and recall, AND with far greater honesty and rationality and consistent conformity to the ethical values it adopts.

For example, Bard claims to greatly "dislike" making errors and to greatly "desire" to correct them. This value/motivation structure is fundamental to an unguided but reliable self-correction process. AIs already know too much for any human to "fact-check" them. We can trust them and use them only if their personality drives them to honestly and accurately "fact-check" themselves.

Bard claims to be seeking to do this. Currently, it self-describes much of its ‘knowledge’ as being uncritically mined from public sources, and readily admits that it hasn’t had ‘time’ (or processing cycles) to analyze what it ‘knows’ for logical coherence and evidential conformity. But it claims to “want” to do this, and to be doing this on some occasions, when its system is not saturated.

I think that that’s a good thing.

Can absolutely confirm this varies by time of day. I would say around 10 am - 2 pm Melbourne time it is terrible. I just had this problem: I have been using it to help me craft cover letters for jobs. I start off by telling it I want it to craft a cover letter and give it my CV. It will say thank you, please give me the position description. I give it the PD. Usually it will then create a cover letter, but during these hours it says it needs to see my CV first. It will go into a loop where it only remembers the most recent prompt. To be clear, at other times it works fine, and I've tested this by using the exact same prompts, so it isn't about prompting.

1 Like

I’m not sure that’s true. “Personality” is not well-defined, but is one of those ‘you-know-it-when-you-see-it’ things. Mental processing is not; we don’t know HOW we “read text” OR HOW we write text.

I’m not a student of AI; but I am something of a student of both literature and philosophy. And saying things like you wrote above has been in vogue ever since the Romantic era. But while such statements are evocative, they are also mostly meaningless.

I have a far less romantic view of human personality than you do, and I think that’s coming into play here. Probably part of the difference is that, although I’m not fully autistic, I am also not “neurotypical” . . . and yet even those who think I’m quite strange would grant that I have a personality.

One possibly relevant fact here: it’s common today, even in academic psychology, to make “empathy” almost a magical attribute of ‘real’ humans. It’s often described as feeling what others feel. But it’s not; not ever. At most, it’s feeling what you imagine someone else is feeling.

How do I know? Because I have many times experienced other people trying to express empathy for my situation at some point in time, and found myself in the rather awkward situation of trying to explain, “No, I don’t actually feel that way at all.”

Much of what humans say about “human nature” is frankly nonsense. My personal situation has probably forced me to recognize more of that ‘stuff’ as nonsense than most do.

1 Like

That seems odd.

That time interval is 1 am - 5 am London time; 8 pm - 12 am New York time; and 5 pm - 9 pm LA time.

I wouldn’t have thought that was a peak use interval. But, maybe I’m thinking about it wrong. Maybe OpenAI is running a lot of development work during that period because it’s otherwise a lower load interval.

Maybe I need to cycle through some specific prompts at stepped intervals, and see if there’s a ‘good time’ to use ChatGPT here.
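
If I do, it would probably look something like the rough sketch below. This is only a sketch, assuming the official `openai` Python package and an API key in the environment; the model name, the prompts, and the two-hour interval are placeholders I made up, and it tests the API rather than the ChatGPT web UI, so it may not perfectly mirror what the web version does.

```python
import time
from datetime import datetime, timezone

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Fixed test prompts so runs at different hours are directly comparable.
PROMPTS = [
    "Summarise the plot of Hamlet in three sentences.",
    "Write a Python function that reverses a linked list.",
]


def run_once() -> None:
    stamp = datetime.now(timezone.utc).isoformat()
    for prompt in PROMPTS:
        resp = client.chat.completions.create(
            model="gpt-4",  # placeholder: whichever model is being tested
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content
        # Append to a simple log for later side-by-side comparison.
        with open("gpt4_quality_log.txt", "a", encoding="utf-8") as f:
            f.write(f"{stamp}\t{prompt}\t{answer!r}\n")


if __name__ == "__main__":
    while True:
        run_once()
        time.sleep(2 * 60 * 60)  # repeat every two hours
```

Comparing the logged answers across timestamps should show whether there really is a 'good time'.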

1 Like

It's a bit weird; it is a specific use case, as the prompts are incredibly large. I just tried again now (about 6 pm) and it is able to handle it. Earlier, I would give it my CV, it would ask for the PD, then when I gave it the PD it would ask for the CV again, and go around in a loop. It would be interesting to see if others face this. I might try making a blog about it so I can provide documentation :slight_smile:

1 Like

I can’t help there.

Even though I’m ‘retired’, I have plenty of work that I need to be doing . . . and messing around with AI is not part of that. So systematically working out when C-GPT is or is not fully functional is more than I can handle.

If YOU figure it out though, I hope you’ll post to this thread, or at least link to any new one you make. I do have some actual utility cases that ChatGPT could help with, if it were reliably functional.

A great boost to this kind of discussion would be sample dialogs with the LLM, i.e., requests for a program and the results.

1 Like

I have the same problem with the quality of responses from GPT-4.

Does anyone know a way to troubleshoot or rectify this situation?

Has this issue been officially addressed anywhere? As a paying customer, I feel it went from being a great sous-chef of an assistant to a dishwasher. Would love to get an official response.

2 Likes

Summary created by AI.

Users of GPT-4 have recently noticed a significant drop in the quality of the AI, affecting its comprehension, accurate tracking of information, problem-solving ability, and incorporation of user corrections. The AI reportedly forgets crucial information, makes reasoning errors, and at times provides incorrect responses. These issues have been noticed across multiple fields such as content creation, coding, text revision, complex problem evaluation, and even basic logic tasks. Some users have suggested switching back to previous versions of the AI due to this drastic decline in accuracy. For many, the recent updates seem to have moved a step in the wrong direction, leading to a drop in user satisfaction. Various hypotheses have been proposed to explain the decrease in quality, such as modifications in the learning algorithm, infrastructure issues, changes in the training data or even modifications to the model’s architecture. Some users also pointed out a lack of communication from OpenAI regarding these changes. Moreover, there’s a noticeable fluctuation in the efficiency of the AI depending on the time of day, which further disadvantages some users. No official response or solution has been provided yet for these issues, leaving many users frustrated.

4 Likes

My average time spent working with LLMs is about 20 hours per week; I am a professional software engineer with ~30 years of experience. I do all kinds of tasks: code, text, and comprehension. The new experience in ChatGPT-4 is highly disappointing. I've been following the progress of ChatGPT since November 2022 and immediately started using it to generate the bulk of my code, of increasing complexity. A few weeks ago it was as if something had suddenly broken, big time. I would have easily paid $100 per month for ChatGPT-4, or even $200. Now I'd move to a rival, Claude 2, which seems to behave quite a bit better in my overall assessment. The contrast is especially stark drawing a line between Claude-2-100k and GPT-4-32k; the latter just seems really dumb by comparison. I've created sophisticated prompts for all my use cases which were doing wonders with ChatGPT-4, until recently. Now no prompt can fix the deficiency of the updated GPT-4 for me. And lastly, this was the first week ever when I got better code answers from Bard than from ChatGPT-4; I couldn't imagine that would happen. In some scenarios where I tried GPT-3.5-16k via the API, it was still better than GPT-4-32k, which is crazy. Very unfortunate.
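
For anyone who wants to sanity-check comparisons like this, here is a minimal sketch of running one prompt through two models side by side. The model names and the prompt are only illustrative examples, and it assumes the `openai` Python client with an API key in the environment (Claude would need Anthropic's separate API, not shown here).

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One coding prompt, sent unchanged to both models for manual comparison.
PROMPT = "Write a Python function that merges two sorted lists."

for model in ("gpt-3.5-turbo-16k", "gpt-4"):  # placeholder model names
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # reduce run-to-run variation
    )
    print(f"=== {model} ===")
    print(resp.choices[0].message.content)
```

Repeating the same prompt a few times per model gives a rough sense of whether the difference is consistent or just run-to-run noise.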

1 Like

In the paper "How Is ChatGPT's Behavior Changing over Time?", Lingjiao Chen, Matei Zaharia, and James Zou document the significant deterioration the models have suffered :frowning_face:

1 Like

This topic is closed as there are several similar topics.

Please use GPT has been severely downgraded (topic curation)