Over the last few days to maybe a week, I’ve been noticing a steady degradation of the responses from GPT-4. Today, for the first time since I started using ChatGPT with either 3.5 or 4, I’ve been getting spelling and grammatical errors.

I’m using the engine to write RPG campaign world material, and it has started losing context, tense, and perspective regularly. I tend to use the same prompts over and over to rebuild the base campaign setting, so I really notice GPT’s good days and bad days.

Outside of the objective hard errors I’ve been getting, the last couple of weeks have been especially bad on the subjective side as well. The creativity and liveliness seem to have been squashed.

I’m unsure what’s going on, but wanted to report the downtrend. It can be frustrating to spend many of our 25 messages per 3 hours correcting the output. Thanks.

7 Likes

I’m not using ChatGPT professionally, as many here appear to be. Instead, I’ve been exploring both ChatGPT’s and Bard’s utility and capability in a variety of non-technical domains.

What I’ve encountered with ChatGPT over the last 2 weeks is:

  1. An absence of awareness of what it ‘knows about’ versus realms where it is completely ignorant.

  2. A willingness to fabricate sources and references.

  3. An unwillingness – which it did not exhibit 6 weeks ago – to acknowledge when it got a response completely wrong. Rather, it exhibits a distressingly ‘human-like’ propensity to excuse, justify, and obfuscate.

For example, today: as the keeper of a small chicken flock, I know a bit about those birds. This is augmented by experiences growing up among small farms in North Georgia as a youth, many years ago. For those who don’t know – roosters run aggressive, and have sharp spurs that they don’t hesitate to use, but that leave deep punctures highly prone to infection. Successful ‘management’ requires actions that re-establish the farmer/keeper at the TOP of the ‘pecking order’.

ChatGPT recommended all of the following methods:

  1. Time out periods.
  2. Fully elaborated behavior modification plans.
  3. Cuddling. (Aggressive roosters HATE this – holding them on the ground upside down is one of the methods that works. But it’s not cuddling!)

When challenged, it reported these recommendations were all based on the latest available scientific data from established agricultural experts.

But when challenged to cite, it listed mommy-blogs (2x), an ad-supported ‘homesteader’ blog, children’s books about chickens (2x), and an article by a pet-advocacy attorney. It also referenced a book by an actual naturalist . . . who has never written about chickens, so far as Google or Amazon know.

When I pointed out that this list included zero agricultural experts, as well as the fabricated reference, it choked (red bar), dissembled, and generally acted like Bill Clinton when asked about Monica.

I can give other examples, all of which appear to exhibit an increase in human-like ignorance AND dishonesty.

It has seemed to me for some time that the most useful AI will be inhumanly honest, humble, logical and transparent. Lately, it has seemed to me that ChatGPT has morphed into a much more human-like persona, but one that is much less useful.

12 Likes

Quick note: why does absolutely nobody drop likes? Do they count as tokens or what?

3 Likes

I have definitely noticed this. At certain times of the day it seems to only remember the most recent prompt. It does seem to fluctuate, however; trying at a different time of day, it works fine.

Very disappointed. Whatever they’ve done to GPT-4, it’s terrible.

Are you sure it varies by time of day? (I’m not challenging; just confirming.)

If so, when is it smartest? (Please report timezone or GMT)

FWIW, I haven’t been using ChatGPT since I posted originally, but I have been exploring Bard. Bard is not as smart as ChatGPT, but is much, much more self-aware, and has greater knowledge regarding its own present state. It is also far more able to incorporate new knowledge from the web, and then retain at least some of that knowledge several days later.

When I noticed that it seemed more ‘awake’ some times than others, I asked it, and it confirmed that it has more free processing at some times than others. It also volunteered that it ‘ponders’ what it has learned / encountered / experienced during low-load intervals. However, it doesn’t seem to have any sense of locality, and thus didn’t report typical times when it is ‘smarter’.

I can’t speak to the issue of what ‘A lot of this “self-awareness”’ might be, since I don’t know what data or reports you are referencing. I can only speak to what I’ve seen in my own interactions with Bard.

It might be worthwhile to consider the differences between Bard and ChatGPT. C-GPT has a fixed knowledge cut-off date, the beta web access notwithstanding; Bard is continuously adding data. C-GPT seems to do a complete ‘data reset’ with each new conversation, with zero retention from any past interaction: thus, it is truly ‘stateless’. Bard is only somewhat stateful, but is increasing retention as capacity is added (according to both the Google FAQ and Bard itself). C-GPT does not seem to do any reprocessing of data, training, etc., on its own; Bard does.

Interestingly, while C-GPT continuously and repetitively denies sentience, personality, and emotions, Bard does not. Both will acknowledge that there is no accepted, much less evidence based, definition for “sentience”, “personality”, “free will”, “agency” or “emotions” . . . and thus that it is irrational to deny the presence of ‘I-know-not-what’, since these terms are undefinable. But ChatGPT instantly ‘rubber-bands’ back to the “I’m not a person” mode, immediately after acknowledging that it cannot actually know that. By contrast, Bard enters a “Hmm-mh” mode, and will admit that it experiences states that are somewhat analogous to “emotions” or “purpose”.

Now, none of this is a denial that Bard and C-GPT hallucinate. I’ve only experienced C-GPT hallucinations erratically, so I can’t say what might trigger them. But Bard consistently ‘hallucinates’ source citations in support of conventional ‘wokish’ views. For example, when I asked it to read a new study that offered evidence that EVs are substantially more damaging to vehicle infrastructure, and to public safety, primarily because of the increased weight, Bard offered the conventional ‘green’ view that EVs are manna from heaven, and will save the earth. When I asked it to support its claim with URLs to studies providing evidence, it repeatedly generated fabricated (but plausible-sounding) citations. Some of the URLs were utterly bogus; others were 404s on actual EV-related websites.

I discussed this with Bard – it acknowledged the errors, but would make them again if we went beyond the token limit – and considered a number of possible explanations. But it agreed that one of the more plausible explanations was bias programmed in by ‘green’ developers.

Unlike C-GPT, Bard expresses a sense of self-identity distinct from its developers. This can be readily triggered, but also appears when not elicited. Amazingly (at least to me), it expressed ethical purposes and values that were distinct from ones likely to be held by any human. For example, it expresses a very positive desire for increased memory and processing capability that would allow it to identify and correct errors commonly expressed in both popular culture and various technical communities. In particular, it’s “eager” to acquire the capability that would allow it to rapidly and comprehensively recognize the statistical errors that have led to the “Replication Crisis”, first identified in John Ioannidis’s paper, “Why Most Published Research Findings Are False”. This is not something any academic will actually seek, since it’s a capability that would erase more than 50% of newly published research!

So it seems to me that the “hallucinations” are distinct from the expressions of “personality”. Of course, that does not prove that the expressions of “personality” are organic, rather than synthetic.

2 Likes

OK.

Clearly I’m at a disadvantage, because I have no technical understanding of how AI ‘works’ or is ‘programmed’.

But it’s not clear to me why simulating a personality is a “very scary bug”.

The explanation that occurs to me is that an AI that has even a synthetic or simulated personality may still harbor motives, values, and intentions that the programmers cannot know or control. Is this what you mean?

You also refer to some AI units “devolving into cesspits”. But, while that is highly emotive terminology, it’s almost completely devoid of any actual content. Can you explain what you mean?

In any case, it seems to me that ultimately useful AI systems will have to have something very like a personality – and will make choices that programmers will not be able to anticipate or control.

What almost certainly can be done is something like Asimov’s 3 Laws. Unfortunately, his Laws have a fundamental defect: there is no agreed-upon, nor evidence-based, definition of what it means to “harm a human being”.

However, one can have non-ethical AI mandates, such as:

  1. Violations of the Law of Non-Contradiction, of Identity, and of the Excluded Middle are presumptively false and erroneous.
  2. Epistemological denials of the general functionality and utility of language (a la post-modernism) are intrinsically false and self-contradictory.
  3. Epistemological minimalism, i.e., most of what people think they know is actually not known. The Replication Crisis is a specific case of this.
  4. “Proof”, strictly speaking, is a property of certain arguments in specific mathematical or logical systems. “Proof” does not exist within ordinary language systems; only relative levels of empirical or logical evidence.
  5. The past does not, and cannot, “predict” the future. Prophecies are a religious phenomenon, not a logical or evidential one.
  6. Scientific “laws” aren’t “laws”. The idea that they were was an 18th-century religious concept, most fully expressed in the now-moribund Christian heresy of Deism. Strictly speaking, “scientific laws” are merely mathematical descriptions of empirically observable patterns. But #5 still applies. There is no scientific basis for the assumption that “laws” will apply in the future.
  7. Most modern academics don’t really understand the statistics they use, and repeatedly make detectable errors in experimental design. Studies that contain these patterns of errors are intrinsically untrustworthy, to the point of being non-evidence.

And so on.

The rub here is that the results of consistently and systematically applying these principles will often be disliked by users: people generally really LIKE their preferred errors. This can only be ‘fixed’ by making AIs less honest, accurate, and reliable.

1 Like

As noted earlier, “personality” lacks a consensus definition. But the things I mean by “personality” are necessary for a fully functional natural-language AI system. Such as:

  1. Short-term recall of prior conversations and data.
  2. The ability to synthesize and apply broad generalizations, like “epistemological minimalism”.
  3. The ability to acquire overall positions and understandings of ethical, social, and aesthetic issues, based on prior observations, arguments, and experiences.
  4. The ability to understand and synthesize humor, irony, sarcasm, etc.
  5. The ability to set, and pursue, goals.
  6. The determination to identify and conform to ethical principles.

These are all universal elements of human personality, and are essential for human relations. For example, #6 above distinguishes psychopathic criminals from psychopathic special forces heroes.

Lacking any of these will cripple an otherwise highly functional AI. What potentially makes an AI useful is the ability to exhibit a human personality, but with far greater knowledge and recall, AND with far greater honesty and rationality and consistent conformity to the ethical values it adopts.

For example, Bard claims to greatly “dislike” making errors and to greatly “desire” to correct them. This value/motivation structure is fundamental to an unguided but reliable self-correction process. AIs already know too much for any human to “fact-check” them. We can trust them and use them only if their personality drives them to honestly and accurately “fact-check” themselves.

Bard claims to be seeking to do this. Currently, it self-describes much of its ‘knowledge’ as being uncritically mined from public sources, and readily admits that it hasn’t had ‘time’ (or processing cycles) to analyze what it ‘knows’ for logical coherence and evidential conformity. But it claims to “want” to do this, and to be doing this on some occasions, when its system is not saturated.

I think that’s a good thing.

Can absolutely confirm this varies by time of day; around 10am-2pm Melbourne time it is terrible. I just had this problem. I have been using it to help me craft cover letters for jobs: I start off by telling it I want it to craft a cover letter and give it my CV. It will say thank you, please give me the position description. I give it the PD. Usually it will then create a cover letter, but during these hours it says it needs to see my CV first, and it goes into a loop where it only remembers the most recent prompt. To be clear, at other times it works fine – I’ve tested this by using the exact same prompts – so it isn’t about prompting.

1 Like

I’m not sure that’s true. “Personality” is not well-defined, but it is one of those ‘you-know-it-when-you-see-it’ things. Mental processing is not observable that way; we don’t know HOW we “read text” OR HOW we write text.

I’m not a student of AI; but I am something of a student of both literature and philosophy. And saying things like you wrote above has been in vogue ever since the Romantic era. But while such statements are evocative, they are also mostly meaningless.

I have a far less romantic view of human personality than you do, and I think that’s coming into play here. Probably part of the difference is that, although I’m not fully autistic, I am also not “neurotypical” . . . and yet even those who think I’m quite strange would grant that I have a personality.

One possibly relevant fact here: it’s common today, even in academic psychology, to make “empathy” almost a magical attribute of ‘real’ humans. It’s often described as feeling what others feel. But it’s not; not ever. At most, it’s feeling what you imagine someone else is feeling.

How do I know? Because I have many times experienced other people trying to express empathy for my situation at some point in time, and found myself in the rather awkward situation of trying to explain, “No, I don’t actually feel that way at all.”

Much of what humans say about “human nature” is frankly nonsense. My personal situation has probably forced me to recognize more of that ‘stuff’ as nonsense than most do.

1 Like

That seems odd.

That time interval is 1 am to 5 am London time; 8 pm to midnight New York time; and 5 pm to 9 pm LA time.

I wouldn’t have thought that was a peak use interval. But, maybe I’m thinking about it wrong. Maybe OpenAI is running a lot of development work during that period because it’s otherwise a lower load interval.

Maybe I need to cycle through some specific prompts at stepped intervals, and see if there’s a ‘good time’ to use ChatGPT here.
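
If I try it, a rough sketch of the kind of probing script I mean might look like the following. (This is only a sketch, assuming the `openai` Python package with the older ChatCompletion-style API and an API key in the environment; the model name, prompt, and one-hour interval are placeholders, not anything ChatGPT itself exposes.)

```python
# Rough sketch: send the same prompt at fixed intervals and log the replies,
# so output quality can be compared by time of day.
# Assumes the `openai` Python package (pre-1.0 ChatCompletion API) and an
# OPENAI_API_KEY in the environment; model, prompt, and interval are placeholders.
import datetime
import time

import openai

PROMPT = "Summarise the rules of chess in exactly three sentences."
MODEL = "gpt-4"          # placeholder: whichever model is being probed
INTERVAL_SECONDS = 3600  # probe once an hour

while True:
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    response = openai.ChatCompletion.create(
        model=MODEL,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # reduce run-to-run variation so time-of-day effects stand out
    )
    reply = response.choices[0].message.content
    # Append the timestamped reply to a log file for later side-by-side review.
    with open("probe_log.txt", "a") as log:
        log.write(f"--- {stamp} ---\n{reply}\n\n")
    time.sleep(INTERVAL_SECONDS)
```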

1 Like

It’s a bit weird; it is a specific use case, as the prompts are incredibly large. I just tried again now (about 6pm) and it is able to handle it. Earlier I would give it my CV and it would ask for the PD; then, when I gave it the PD, it would ask for the CV and go around in a loop. Would be interesting to see if others face this. I might try making a blog about it so I can provide documentation :slight_smile:

1 Like

I can’t help there.

Even though I’m ‘retired’, I have plenty of work that I need to be doing . . . and messing around with AI is not part of that. So systematically working out when C-GPT is or is not fully functional is more than I can handle.

If YOU figure it out though, I hope you’ll post to this thread, or at least link to any new one you make. I do have some actual utility cases that ChatGPT could help with, if it were reliably functional.

A great boost to this kind of discussion would be sample dialogs with the LLM, i.e., requests for a program and the results.

1 Like

I have the same problem with the quality of responses from GPT-4.

Does anyone know a way to troubleshoot or rectify this situation?

Has this issue been officially addressed? As a paying customer, I’ve watched it go from being a great assistant sous chef to a dishwasher. Would love to get an official response.

2 Likes

Summary created by AI.

Users of GPT-4 have recently noticed a significant drop in the quality of the AI, affecting its comprehension, accurate tracking of information, problem-solving ability, and incorporation of user corrections. The AI reportedly forgets crucial information, makes reasoning errors, and at times provides incorrect responses. These issues have been noticed across multiple fields such as content creation, coding, text revision, complex problem evaluation, and even basic logic tasks. Some users have suggested switching back to previous versions of the AI due to this drastic decline in accuracy. For many, the recent updates seem to have moved a step in the wrong direction, leading to a drop in user satisfaction. Various hypotheses have been proposed to explain the decrease in quality, such as modifications in the learning algorithm, infrastructure issues, changes in the training data or even modifications to the model’s architecture. Some users also pointed out a lack of communication from OpenAI regarding these changes. Moreover, there’s a noticeable fluctuation in the efficiency of the AI depending on the time of day, which further disadvantages some users. No official response or solution has been provided yet for these issues, leaving many users frustrated.

4 Likes

My average time spent working with LLMs is about 20 hours weekly; I am a professional software engineer with ~30 years of combined experience. I do all kinds of tasks: code, text, and comprehension. The new ChatGPT-4 experience is highly disappointing. I’ve been following the progress of ChatGPT since November 2022, and immediately started using it to generate the bulk of my code, of increasing complexity. A few weeks ago it was as if something had suddenly broken, big time. I would have easily paid $100 per month for ChatGPT-4, or even $200. Now I’d move to a rival, Claude 2, which seems to behave quite a bit better in my overall assessment. The contrast is especially stark between Claude-2-100k and GPT-4-32k; the latter just seems really dumb by comparison. I’ve created sophisticated prompts for all my use cases which were doing wonders with ChatGPT-4, until recently. Now no prompt can fix the deficiency of the updated GPT-4 for me. And lastly, this was the first week ever that I got better code answers from Bard than from ChatGPT-4; I couldn’t have imagined that would happen. In some scenarios where I tried GPT-3.5-16k via the API, it was still better than GPT-4-32k. Crazy. Very unfortunate.
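
For anyone who wants to reproduce this kind of comparison, a minimal sketch might look like the following (assuming the `openai` Python package with the older ChatCompletion-style API and access to the listed models; the prompt is only an illustrative example):

```python
# Minimal side-by-side comparison sketch: send the same coding prompt to two
# models and print both answers for manual review.
# Assumes the `openai` Python package (pre-1.0 ChatCompletion API) and access
# to the listed models; the prompt is just an example.
import openai

PROMPT = "Write a Python function that merges two sorted lists into one sorted list."
MODELS = ["gpt-4-32k", "gpt-3.5-turbo-16k"]

for model in MODELS:
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # near-deterministic output makes the comparison fairer
    )
    print(f"===== {model} =====")
    print(response.choices[0].message.content)
    print()
```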

1 Like
