Another huge decline lately in API text completions quality

So, about 5 months ago, I posted a thread wondering what in the world was going on with gpt-4 completions.

After the lion’s share of replies told me I was wrong and needed to improve my prompts, OpenAI announced about a week later that it recognized the “laziness” problem, released the gpt-4-turbo model, and subsequently released gpt-4-0125-preview to address various concerns.

Here I am once again, wondering what in the world has happened over the last week to month, where the quality of the API completions has suddenly tanked yet again. This time it’s clearly the worst it’s EVER been, for coding and for everything else.

For coding specifically, the code I’m getting is essentially scrap. All the gpt-4 models are, in my view, close to outright broken.

It modifies my import statements with no outside knowledge or reference, inexplicably breaking them. When troubleshooting bugs, more bugs WILL be created, invariably, or the original issue goes unaddressed at a much higher rate than in the past. If I say “Don’t do X,” I have about 5% faith that it will actually not do X. The gpt-4-1106-preview model in particular is utterly bad, while the base gpt-4 model is pretty bad and, in my opinion, shows the most notable performance decrease. I can no longer rely on implied information in my prompts; everything has to be spelled out step by step, as if I were explaining it to a hyper-literal logician.

Overall? It just gets things SO wrong, much more frequently than I’ve ever seen. Aside from the pre-March-2023 era, this is the dumbest I’ve ever seen GPT.

Interestingly, completions from the heavyweight gpt-4 model (gpt-4-0613) are now considerably faster than gpt-4-1106-preview, which is extremely odd and did not used to be the case. I don’t know if this is just my experience. I initially suspected a rendering problem, but I’ve determined it definitely isn’t; the models are simply returning chunks at different speeds.
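For anyone who wants to measure this rather than eyeball it, here’s a minimal timing sketch. The `stream_timing` helper, model list, and prompt are my own choices (nothing official), and the commented-out usage assumes the v1 `openai` Python SDK’s streaming interface plus an `OPENAI_API_KEY` in the environment:

```python
import time

def stream_timing(chunks):
    """Time an iterable of streamed response chunks: seconds to the
    first chunk, total seconds, and how many chunks arrived."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            # Latency to the very first streamed chunk.
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return {"first_chunk_s": first, "total_s": total, "chunks": count}

# Hypothetical usage against the real API (v1 openai SDK assumed):
# from openai import OpenAI
# client = OpenAI()
# for model in ("gpt-4-0613", "gpt-4-1106-preview"):
#     stream = client.chat.completions.create(
#         model=model,
#         messages=[{"role": "user", "content": "Write a haiku about APIs."}],
#         stream=True,
#     )
#     print(model, stream_timing(stream))
```

Running it a handful of times per model would at least show whether the speed gap is consistent or just occasional server load.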

I don’t know what to do or think. I’m at a loss. I’m wondering if this is just me “again,” or if anyone else has been experiencing major quality issues over the last one to four weeks.

Is this all perhaps a gear-up for a gpt-5 release? Cutting back internal capacity for completions to make room for a massive model swap?


Seems like the issue here is that you expect consistency from something that is inherently inconsistent. So yea, it is a you problem. Your observations are not necessarily incorrect, though your expectations are.


I think it’s a bit nonsensical to say we can’t discuss the quality of LLMs on the grounds that they are non-deterministic.

What I’m talking about is consistent, significant degradation in quality across all the models (except gpt-3.5-turbo, gpt-4-32k, and the legacy models, which I don’t test), well beyond the normal “oh, this one reply was bad.”

And if that’s what you’re replying to, well, ok sure, but I’m more discussing the objective quality of the model, not the ethics of expecting gpt-4 from summer 2023 to never change.


Of course you would expect consistency and quality. That’s the whole point of improving an AI’s outputs, and it’s what any AI provider strives for. The problem is that ChatGPT and the API have seen massive degradation and lazy outputs, especially on intricate or complex prompts. I’ve been prompting, writing, coding, and designing with many AI providers, including ChatGPT and the API, and OpenAI in particular has seen a huge decline, especially from late 2023 into 2024 and continuing as of now (March 2024).