Why sometimes ChatGPT (and Custom GPT) looks "dumb": missing and broken context

This weekend I have been experiencing some issues with ChatGPT which are likely only temporary. However, a somewhat random observation gave me some insight into why the model sometimes looks particularly broken, dumb, or forgetful.

The main goal of this post is to highlight a specific failure mode / bug that makes ChatGPT’s responses look “dumb”, “forgetful”, and “confused”, but which has a simple, understandable cause that you can go and check for yourself.

tl;dr: If ChatGPT is behaving particularly badly and inconsistently, reload the current conversation and check the transcript. You may find that the transcript is missing some messages or shows them truncated. If so, the conversation is broken; start a new one. This does not necessarily fix the issue (the new conversation might also break), but at least you know what’s going on.

To be clear, this post doesn’t cover all failure modes, just this one in particular.

The problem

The problem pertains to the user interface and the handling of the context window. Specifically, the ChatGPT interface shows what looks like a full answer, but internally ChatGPT only receives/stores truncated answers, or even misses entire chunks of the conversation.

From the user’s perspective, it looks like ChatGPT is behaving stupidly: ignoring the context or the user’s instructions, repeating itself, forgetting previous interactions, and so on. But ChatGPT’s behavior makes a lot of sense given the context it actually receives.

Arguably, this issue has nothing to do with the internals of the LLM being broken or tampered with (e.g., attention, number of experts), nor with OpenAI intentionally nerfing the model to balance load. Instead, it seems to be some “simple” network or database error in saving ChatGPT’s responses; evidently a bug, not something intentional.
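One way to see why dropped or truncated messages are so damaging: the underlying model is stateless, and on every turn it conditions only on the message list it is actually handed. Here is a minimal sketch using the public Chat Completions API (purely for illustration; ChatGPT’s internal plumbing is obviously not exposed like this, and the model name and messages are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# This list *is* the conversation, as far as the model is concerned.
# If the backend stores a truncated assistant turn, or drops one entirely,
# the next request is built from the damaged list, and the model has no way
# of knowing that the user saw a fuller reply on screen.
history = [
    {"role": "user", "content": "List three options."},
    {"role": "assistant", "content": "1) Alpha  2) Be"},    # reply truncated in storage
    {"role": "user", "content": "Let's go with option 2."}, # refers to text the model can no longer see
]

response = client.chat.completions.create(
    model="gpt-4-turbo",  # placeholder model name
    messages=history,
)
print(response.choices[0].message.content)
```

From the model’s point of view, “option 2” points at a half-word it never finished writing, so any continuation has to guess, which is exactly the confused behavior described above.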

How do I know what’s going on?

After some exhausting conversations with a Custom GPT whose behavior I know extremely well (I have playtested it extensively), I logged out, logged back in, and went to check the transcripts.

The transcripts arguably report what ChatGPT actually “saw”, i.e., what it received as context (as opposed to what was printed on my screen). What the transcript showed was a bunch of truncated or missing answers. For example, some of my messages (which at the time were interleaved with ChatGPT’s responses) are clumped together, with entire pieces of the conversation on the GPT’s side missing.

Example:

I paste an example below, from a Custom GPT I am working on (an interactive fiction game).

If you look at this screenshot of the transcript, it looks like I interrupted the GPT while it was listing a bunch of scenarios and picked the “French Resistance” setting, which does not appear in the GPT’s answer.

However, that’s not what happened on my screen at the time of the interaction (unfortunately, I have no screenshot of that). The GPT’s answer was not truncated when it was generated: it listed six scenarios (as it should), one of which was the French Resistance setting, the one I ended up picking.

Also, an entire GPT message is missing from the transcript: the message in which the GPT presented me with multiple characters from the French Resistance setting, to which I answered “Lucien”. Notice how my two separate responses are now clumped into a single message which says:

French Resistance

Lucien

In hindsight, this is what the GPT saw when continuing the conversation, as opposed to the full answers that were shown to me.
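To make the clumping concrete, here is a sketch of the difference between what was displayed on my screen and what the transcript suggests the GPT actually received (the wording is reconstructed from memory, so treat it as illustrative rather than verbatim):

```python
# What was displayed on my screen during the original interaction (reconstructed):
displayed = [
    {"role": "assistant", "content": "Pick a scenario: 1) ... 6) French Resistance"},
    {"role": "user", "content": "French Resistance"},
    {"role": "assistant", "content": "Choose your character: Lucien, ..."},
    {"role": "user", "content": "Lucien"},
]

# What the transcript (and, apparently, the model on later turns) ended up with:
stored = [
    {"role": "assistant", "content": "Pick a scenario: 1) ..."},  # truncated; no French Resistance
    # the character-selection message is missing entirely
    {"role": "user", "content": "French Resistance\n\nLucien"},   # my two replies clumped together
]
```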

In the subsequent conversation the GPT looked dumb, asking me about the setting or about Lucien multiple times, and trying to carry on as best it could with no idea of what I was talking about. The point is, ChatGPT is very good at “winging it”, so it was not immediately obvious that it was missing entire chunks of the conversation; the responses just looked slightly odd and incoherent. In short, we had a confused GPT and a very frustrated user (myself).

With the evidence above, I believe the reason for the GPT’s behavior in this situation is clear. It’s not really a problem of the model being nerfed or whatever, but just some “trivial” database or network issue, and obviously a bug.

PS: You see “3/3” in the screenshot above because I went back and retried multiple times, given the incoherent responses. All of the retried threads show the same issue of truncated/missing messages, which I didn’t know at the time, since on my screen I could see fully-formed (if increasingly confused) answers.

Take-home message

If you feel that ChatGPT is behaving particularly weirdly and inconsistently, reload the current conversation. You may realize that ChatGPT was missing some crucial messages. At that point, your best bet is to start a new conversation (although if the network/database failures are still ongoing, you might experience the same issue again soon).


For the past two days I have found ChatGPT basically unusable. I just wanted some simple changes to an HTML file, with a small JavaScript addition. It forgets my requirements all the time and completely destroys what it was supposed to do each time.
I hope this is very temporary.


Thanks. I edited my post to make it clear that I am giving a concrete check for the underlying reason (if it is the same reason). If you go back and check your transcripts, you should see a bunch of broken conversations, i.e. ChatGPT was misbehaving because it was not actually receiving/storing all the messages.

If that’s the case, I believe it is temporary in the sense that it is clearly undesired behavior, some sort of network or database issue, and it will be fixed in one way or another.

I’m concerned that the issue I’ve encountered might not be merely temporary. It seems that, following the “tips issue” incident, GPT-4’s performance has dipped, particularly in its precision and in its tendency to over-rely on code for solving problems. When tasked with verifying the equality of two mathematical expressions, a foundational requirement, it unnecessarily complicates matters by focusing on variables that don’t need specific values to resolve the equation. Moreover, unless explicitly instructed not to use code, it defaults to computational approaches, which are not always the most effective method for such problems.

I’ve noticed that, in comparison, GPT-3.5 demonstrated a more robust capability for algebraic simplification and for proving equivalences between mathematical formulas. The current iteration of GPT-4 seems to struggle with basic algebra and resorts to shortcuts that undermine the essence of a mathematical proof. I’m hopeful for a prompt resolution to these issues, so that GPT-4 can apply mathematical concepts more adeptly and accurately.
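For what it’s worth, the kind of check described here, verifying that two algebraic expressions are equivalent, can be done symbolically rather than by plugging in numbers. A minimal sympy sketch, with made-up expressions since the actual ones aren’t given in this post:

```python
import sympy as sp

x, y = sp.symbols("x y")

# Hypothetical pair of expressions (not the ones from the original problem):
lhs = (x + y)**2
rhs = x**2 + 2*x*y + y**2

# Symbolic check: simplify the difference instead of assigning values to x and y.
assert sp.simplify(lhs - rhs) == 0
```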