Do you experience worse response quality in long, context-heavy conversations?
Yesterday (December 3rd), ChatGPT experienced a partial outage. Today I’m continuing work on the same JavaScript project, and the responses are unusable.
Specifically, the models are partially ignoring instructions in a way that didn’t happen before. This is affecting both models I use heavily: 4o and o1-preview.
Quick list of issues:
- New hallucinations: The models are typically fairly reliable at maintaining key context (the tech stack, for example). Today, they list the stack and then propose obviously incompatible solutions in the same response. The simplest example is recommending Node’s Buffer for code running on an edge runtime, where it isn’t available (see the sketch after this list).
- Hallucinations overpower instructions: Worse, once corrected on a specific issue like the one above, the model keeps making the same mistake nearly 100% of the time. I’m seeing this level of “reliability” in the latest models for the first time.
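For anyone unfamiliar with why the Buffer suggestion is a problem, here’s a minimal sketch of the incompatibility, assuming an edge runtime in the Vercel Edge / Cloudflare Workers style that exposes Web APIs (TextEncoder, btoa) but not Node’s Buffer:

```javascript
// What the model keeps suggesting: Node-only API, throws
// "Buffer is not defined" on a standard edge runtime.
// const encoded = Buffer.from(payload).toString("base64");

// Web-standard equivalent that works on edge runtimes,
// using TextEncoder + btoa instead of Buffer.
function toBase64(payload) {
  const bytes = new TextEncoder().encode(payload);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

console.log(toBase64("hello edge runtime")); // aGVsbG8gZWRnZSBydW50aW1l
```

The point isn’t this particular workaround; it’s that the stack constraint is stated in the conversation and the model still reaches for the Node-only API.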
Have you experienced the same issue?
Please share any prompting tricks that help mitigate this, if you’ve found some already.