I have an app that has been using gpt-4-1106-preview as the model for generations for many months now. The prompts have been iterated and refined for better and better results over that time. I tried gpt-4o in this app and had to switch back because of poor results. When I ran the app today with the usual gpt-4-1106-preview model, I got unusually poor results without any changes to the prompts.
The article's subject was "feedback loops", and I gave the model the usual instruction to summarize it into 2 paragraphs. gpt-4-1106-preview instead gave me 3 paragraphs. Annoying, but no big deal. The same model with its usual prompt was then used in the program to generate some summary statements, and bizarrely it came up with the term "feedback loaves" rather than feedback loops. Loaves were not listed or mentioned anywhere in my prompts, the source text, etc.
Text generated:
**The Double-Edged Sword of Feedback Loaves**: Understand that positive feedback loops can amplify behaviors unsustainably, while negative feedback loops maintain equilibrium. Create balance by introducing checks that promote sustainable growth.
If I had not proofread this it could have been quite embarrassing. Just curious whether anything has changed with the GPT-4 models for API use, because having to personally read all my generated content looking for these kinds of embarrassing additions really cuts into the productivity benefits.
I did not. The app uses the openai Python package, and in the chat completion code I don't touch temperature or top_p anywhere. I guess I wonder whether OpenAI would change those defaults without me making any changes?
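If it's the defaults I should be worried about, one way to rule that out is to pin them explicitly in the call. A minimal sketch, assuming the openai>=1.0 Python client (article_text is just a placeholder for my input):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

article_text = "..."  # placeholder for the article being summarized

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
        {"role": "system", "content": "Summarize the article into 2 paragraphs."},
        {"role": "user", "content": article_text},
    ],
    temperature=1.0,  # documented default; pinning it guards against silent default changes
    top_p=1.0,        # documented default
    seed=42,          # best-effort reproducibility on the 1106+ models
)

print(response.choices[0].message.content)
```

Pinning the sampling parameters wouldn't stop a model-side change, but it would at least remove one variable.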
Sometimes OpenAI does indeed ninja-edit the models. But more often than not it's people using a generic alias like gpt-4-turbo, which periodically gets repointed to a whole new snapshot (currently gpt-4-turbo-2024-04-09), which then breaks a bunch of stuff.
All that said, it looks like they're trying to retire 1106, and Azure is planning some switcheroos on the 10th… maybe there's already some unannounced stuff happening at OpenAI, seeing if they can get away with it… who knows.
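One way to check for that is to log what the API says actually served each request; a minimal sketch, assuming the openai>=1.0 Python client:

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4-turbo",  # generic alias that OpenAI repoints to new snapshots over time
    messages=[{"role": "user", "content": "ping"}],
)

# The response reports which snapshot handled the call, so silent repointing shows up in logs.
print(resp.model)               # e.g. "gpt-4-turbo-2024-04-09"
print(resp.system_fingerprint)  # backend configuration fingerprint; may be None, changes when the backend changes
```

If you need stability, pin the dated snapshot name instead of the alias.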
I’ve also noticed bizarrely poor performance with the code output in ChatGPT 4 over the past 2 days. I needed a sanity check from someone else so thanks for making the post.
Here’s an example of the nonsense it’s outputting:
I have some saved results from the older version of GPT-4, so I ran a test comparing them against outputs from 4o and the latest version of 4, and both were so much WORSE than before the 4o release. It's mind-boggling. I've been using the same prompts, but since the release it's like ChatGPT has completely gotten "stupid".
Yeah, it's frustrating. They're trying to optimize and cut costs, no way around it, but their willingness to let the quality of the product degrade is setting them up for a Netflix-style hate cycle.
GPT-4 performance seemed to pick up a little over the weekend (the devs definitely do notice performance issues), but it's still putting out spectacular fails.
An example from today:
6. Report Your Findings
Prepare a report or presentation that communicates your findings and recommendations clearly and effectively to stakeholders. Include:
Data Visualizations: Charts, graphs, and maps that illustrate key trends and comparisons.
Executive Summary: A high-level overview of your findings.
Detailed Analysis: Provide a comprehensive breakdown of your data analysis and contextual research.
I am facing this same issue, exactly as the original post described, with gpt-4-1106-preview.
Anyone know of any resolution or explanation from OpenAI? Any timeline for a fix?
I am getting random hallucinated words in my responses. For example, I define the JSON response with a specific set of keys, and now GPT will randomly substitute an incorrect, nonsensical word for one of those keys.
I’ve been using this prompt for months and literally never had it mishandle the JSON response like this.
I have another example of this happening. My prompt explicitly defines the key names and specifies a JSON response of this shape:
{"key1": "", "key2": "", "key3": ""}
And it just now returned this instead:
{"cripple": "", "key2": "", "key3": ""}
Has anyone had any luck getting a response from OpenAI on this?
OpenAI probably won't be responding (not in their business interest, frankly), and this unreliability with GPT-4 has finally pushed me into trying Claude 3 Opus (paid version) for code completion. So far it's refreshingly willing to give long code output and is reminiscent of (and perhaps better than) GPT-4 from a few months ago.
They seem to have a tools mode as well; if you find it succeeds more often than GPT-4, that would be a welcome data point.
I've been having problems with GPT getting worse and worse over the past 2 weeks, with coding tasks in particular, and came here seeking answers. I can't even use it for bug hunting at this point. At least I'm not the only one having issues, but that's even worse, IMO.
Same issue here. I have been running the same tasks with the same prompts on gpt-4 for months, and over the past 2 weeks (ish) there has been a significant drop in the quality of responses. I'm getting gpt-3.5-quality responses from gpt-4.