I have an app that has been using gpt-4-1106-preview as the model for generations for many months now. The prompts have been iterated and refined for better and better results over that time. I tried gpt-4o in this app and had to switch back because of poor results. When I ran the app today with the usual gpt-4-1106-preview model, I got unusually poor results without any changes to the prompts.
The article's subject was "feedback loops", and I gave the model the usual instruction to summarize it into 2 paragraphs. gpt-4-1106-preview instead gave me 3 paragraphs. Annoying, but no big deal. The same model with its usual prompt was then used in the program to generate some summary statements, and bizarrely it came up with the term "feedback loaves" rather than feedback loops. Loaves were not listed or mentioned anywhere in my prompts, the source text, etc.
Text generated:
**The Double-Edged Sword of Feedback Loaves**: Understand that positive feedback loops can amplify behaviors unsustainably, while negative feedback loops maintain equilibrium. Create balance by introducing checks that promote sustainable growth.
If I had not proofread this it could have been quite embarrassing. Just curious whether anything has changed with the GPT-4 models for API use, because having to personally read all my generated content looking for these kinds of embarrassing additions really cuts into the productivity benefits.
I did not. The app uses the openai Python package, and in the chat completion code I don't touch temperature or top_p anywhere. I guess I wonder whether OpenAI would change those defaults without me making any changes?
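If it's the defaults I should be worried about, one way to rule that out is to pin them explicitly in the call. A minimal sketch, assuming the openai>=1.0 Python client (article_text is just a placeholder for my input):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

article_text = "..."  # placeholder for the article being summarized

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
        {"role": "system", "content": "Summarize the article into 2 paragraphs."},
        {"role": "user", "content": article_text},
    ],
    temperature=1.0,  # documented default; pinning it guards against silent default changes
    top_p=1.0,        # documented default
    seed=42,          # best-effort reproducibility on the 1106+ models
)

print(response.choices[0].message.content)
```

Pinning the sampling parameters wouldn't stop a model-side change, but it would at least remove one variable.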
Sometimes OpenAI does indeed ninja-edit the models. But more often than not it's people using a generic alias like gpt-4-turbo, which periodically gets repointed to a whole new snapshot (currently gpt-4-turbo-2024-04-09), which then breaks a bunch of stuff.
All that said, it looks like they're trying to retire 1106, and Azure is planning some switcheroos on the 10th… maybe there's already some unannounced stuff happening at OpenAI, seeing if they can get away with it… who knows.
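One way to check for that is to log what the API says actually served each request; a minimal sketch, assuming the openai>=1.0 Python client:

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4-turbo",  # generic alias that OpenAI repoints to new snapshots over time
    messages=[{"role": "user", "content": "ping"}],
)

# The response reports which snapshot handled the call, so silent repointing shows up in logs.
print(resp.model)               # e.g. "gpt-4-turbo-2024-04-09"
print(resp.system_fingerprint)  # backend configuration fingerprint; may be None, changes when the backend changes
```

If you need stability, pin the dated snapshot name instead of the alias.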
I’ve also noticed bizarrely poor performance with the code output in ChatGPT 4 over the past 2 days. I needed a sanity check from someone else so thanks for making the post.
Here’s an example of the nonsense it’s outputting:
I have some saved results from the older version of GPT-4, so I ran a test comparing them against outputs from 4o and the latest version of 4, and both were so much WORSE than before the 4o release. It's mind-boggling. I've been using the same prompts, but since the release it's like ChatGPT has completely gotten "stupid".
Yeah, it's frustrating. They're trying to optimize and cut costs, no way around it, but their willingness to let the quality of the product degrade is setting them up for a Netflix-style hate cycle.
GPT-4 performance seemed to pick up a little over the weekend (the devs definitely do notice performance issues), but it's still putting out spectacular fails.
An example from today:
6. Report Your Findings
Prepare a report or presentation that communicates your findings and recommendations clearly and effectively to stakeholders. Include:
Data Visualizations: Charts, graphs, and maps that illustrate key trends and comparisons.
Executive Summary: A high-level overview of your findings.
Detailed Analysis: Provide a comprehensive breakdown of your data analysis and contextual research.
I am facing this same issue, exactly as the original post described, with gpt-4-1106-preview.
Anyone know of any resolution or explanation from OpenAI? Any timeline for a fix?
I am getting random hallucinated words in my responses. For example, I define the JSON response with a specific set of keys, and now GPT will randomly substitute an incorrect, nonsensical word for one of those keys.
I’ve been using this prompt for months and literally never had it mishandle the JSON response like this.
I have another example of this happening. My prompt explicitly defines the key names and specifies a JSON response of this shape:
{"key1": "", "key2": "", "key3": ""}
And it just now returned this instead:
{"cripple": "", "key2": "", "key3": ""}
Has anyone had any luck getting a response from OpenAI on this?
OpenAI probably won't be responding (not in their business interest, frankly), and this unreliability with GPT-4 has finally pushed me into trying Claude 3 Opus (paid version) for code completion. So far it's refreshingly willing to give long code output and is reminiscent of (and perhaps better than) GPT-4 from a few months ago.
They seem to have a tools mode as well; if you find it succeeds more often than GPT-4, that would be a welcome data point.
I've been having problems with GPT getting worse and worse over the past 2 weeks, with coding tasks in particular, and came here seeking answers. I can't even use it for bug hunting at this point. At least I'm not the only one having issues, but that's even worse, IMO.
Same issue here. I have been running the same tasks with the same prompts on gpt-4 for months, and over the past 2 weeks (ish) there has been a significant drop in the quality of responses. I'm getting gpt-3.5-quality responses from gpt-4.