Loss of logic in the ChatGPT May 3 Version

Anyone experiencing a reduction in output quality from the May 3 Version?
I have noticed a reduction in logic and an inability to produce quality code compared to previous versions. What has your experience been?

3 Likes

I’m happy to hear that you haven’t been experiencing it. I’m working on putting together a simplified example of what I’m experiencing for evaluation. In the meantime, I figured I’d reach out and touch base with everyone about the new update.

I asked it to look over my code and find a problem. It replies with something like the following and then starts writing gibberish code:

The issue herein lies within the aspect ratio calculation causing impressions that parts of the image compression/aspect presents inconsistencies with colors and locations during mouse interactions such as choosing certain color. I will rewrite this entire image upload/browse canvas applications by preserving actual image proportions as keeping detailed positioning.

Here’user ">

Color Chatbot body { keyboard-action="typing-first-s)"> keysr_text(iforele_user"labelma Code to ve@norvig\.cole"text(inv)','') get_ifav code_chunks: O[rt"]:rnS]]>sfoe:;"(not.ad!~-~-(.bubscriptions==-)):+cript\setult?[ dt)[.InU(": ifca>

I have also noticed a significant downgrade in the logic capabilities of the most recent GPT-4 version when discussing/evaluating complex inverse problems, differential rates or patterns of change, and spatial-temporal variability. I only rarely received erroneous replies before the update, but now I have to double-check all output (e.g., double-negative conditions sometimes don’t get appropriately translated into a positive condition). I see these errors as more GPT-3.5-like than prior GPT-4 levels of reasoning. Very frustrating.

2 Likes

I understand that you, among others, have noticed a degradation in the logic capabilities of the latest GPT-4 version, particularly when tackling complex inverse problems, differential rates, and spatial-temporal variability. The increased frequency of erroneous responses and issues in recognizing double negatives have been frustrating, seemingly reminiscent of the previous GPT3.5’s performance.

Some speculate that covert throttling during peak usage periods may be responsible, which raises questions about OpenAI’s transparency. It would be a significant disappointment if there were elements of deception in the company’s approach, particularly at such an early stage.

I trust that OpenAI will seriously consider these concerns and address them promptly. Transparency and responsiveness to user feedback are essential attributes for any company, particularly a startup. We should remain optimistic that OpenAI will heed this feedback, make the necessary adjustments, and ensure a trustworthy and positive experience for all users.

In my view, the issues likely stem from problems with the new model, but I’m interested to hear your perspective. What do you think?

1 Like

I am not sure. I think you have articulated a number of causes for concern very well. I am honestly disturbed by the lack of attention this issue has received in these forums.

Furthermore, I am even more dismayed by the latest “May 24 Version” which appears to actually be a further step in the wrong direction.

I am observing an even greater loss of reasoning between the May 12 and May 23 versions! Hard to overstate how frustrating this is! The May 23 version in many respects appears to follow instructions more poorly than GPT-3.5 at this point and consistently fails to solve basic logic problems.

@#$!

2 Likes

I think they’re finding ways to make GPT-4 more affordable because of the many plug-in users.
Everyone is annoyed at the slow speed and the 25-message cap. A possible trade-off is to fall back to a more affordable model when server load peaks.

Here’s yet another disgusting, astounding example from today of just how stupid GPT4 has become:

Prompt:
OK, let’s revise your last writing output to be more formal. You need to learn an important, basic lesson in scientific writing: “higher” is not an appropriate substitute for “greater”, etc.

ChatGPT4:

I appreciate your feedback, and you’re correct. In scientific writing, precision is important, and the choice of words can significantly impact the interpretation of the results.

“Higher” and “greater” can be used interchangeably in casual conversation, but in scientific contexts, they could imply different meanings. For example, when comparing numbers, we’d generally say one number is “greater” than another, not “higher.” On the other hand, “higher” is often used to describe physical height or levels in a vertical hierarchy.

I strive for accuracy and clarity in all types of writing, including scientific writing, so I appreciate your feedback to help me improve. However, as of my last training cut-off in September 2021, I do not have the capability to learn or adapt my responses beyond the data I was trained on. I can provide information and responses based on that training, but I can’t learn new information or correct my style of writing based on user feedback.

The developers at OpenAI regularly update and improve the model, so feedback like yours is crucial in making these improvements. Thank you for your input!

W. T. F. am I even paying for…!?
Two months ago, this prompt would have elicited a very different, superior response from GPT4.

3 Likes

Invariably, in the past, ChatGPT-4 would then revise the previous output & adopt a more formal writing style, with more precise language. In 99% of cases, the GPT would also understand that there was no need for a direct reply to my statement, just a desire for more rigorous writing. I’m honestly surprised that you see nothing wrong with this reply. Everything about it screams “lobotomy” compared to the April/May 3 versions of GPT-4.

1 Like

I just shared my link in this post: Major Issues in new GPT 3.5 : DO NOT DEPRECATE OLD ONE - #20 by SomebodySysop

This isn’t a prompt issue or “hoping for a better response” issue. It’s a model degradation issue.

Now, I’m not complaining too loudly because the GPT-4 codex model, despite a hiccup here and there, is doing an exemplary job as my coding assistant. I have built a whole working ingestion/query system, from scratch, in PHP, with its assistance. Very happy with the results.

But, GPT-3.5 and the GPT-4 Browsing models? I don’t know. Something’s not right.

Here you go: https://chat.openai.com/share/73188b65-b8bb-4b2d-97a2-24400e9dbf30

Start at the first occurrence of “Drupal 9”. It just appears to be throwing out random guesses.

The text in this conversation is over 30k tokens; you are asking the model about details from many tens of thousands of tokens ago, when the model’s context window is only 8k.

This will result in hallucinatory output. As people grow more comfortable with the AI, they tend to have longer conversations, and it is easy to forget that the context limit is finite when you are engrossed in a discussion.
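As a quick sanity check on whether a conversation has outgrown the window, you can count tokens locally before blaming the model. A minimal sketch in Python, assuming the tiktoken library and an 8k limit; the example messages are placeholders:

import tiktoken

# Rough token count for a chat transcript, using the gpt-4 tokenizer.
# The 8192 figure is the context limit discussed above, not something
# reported back by the API.
CONTEXT_LIMIT = 8192

def count_tokens(messages, model="gpt-4"):
    enc = tiktoken.encoding_for_model(model)
    # Counts only the message text; the real limit also includes
    # per-message overhead, so treat this as a lower bound.
    return sum(len(enc.encode(m["content"])) for m in messages)

conversation = [
    {"role": "user", "content": "First question about my module..."},
    {"role": "assistant", "content": "Here is a suggestion..."},
    # ...many more turns...
]

if count_tokens(conversation) > CONTEXT_LIMIT:
    print("Earlier turns have already fallen out of the model's context.")

Anything older than the most recent ~8k tokens simply isn’t visible to the model, which is where the guessing comes from.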

1 Like

The entire conversation may be over 30k tokens, but for the immediate question I supplied the code to process and the resulting error messages. I was NOT referring to the much earlier code, but to the immediate code and errors. This is not a memory or lost-context issue. When I popped back onto GPT-4 codex, it answered the question within a couple of tries, responding to the errors logically rather than illogically.

It is best practice to create a persona for coding and to start a new chat regularly, to ensure that the persona and the context stay up to date. This is especially true if you are coding at a professional level. In the example where you had less-than-ideal output, there was no persona defined and regularly refreshed, and the context was long; when you went into the codex-prompted chat, it had a context built specifically for coding, and you gave it just the code you were interested in.

All of these factors alter the performance of the model.
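For what it’s worth, here is a minimal sketch of what I mean by a coding persona, written as an API-style message list; the persona wording and the placeholder code are my own invention, not anything OpenAI prescribes:

# A hypothetical "coder persona" system message, re-sent at the start of a
# fresh chat (or every few turns) so it stays inside the context window.
persona = {
    "role": "system",
    "content": (
        "You are a senior PHP developer working on a document ingestion and "
        "query system. Answer with working code first, then a short "
        "explanation. Assume PHP 8.1."
    ),
}

# Only the code currently under discussion goes in, not the whole project.
question = {
    "role": "user",
    "content": "This function throws on empty input. Here is the code and "
               "the exact error message:\n\n<code and error here>",
}

messages = [persona, question]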

1 Like

GPT-3.5 gives bad answers out of the box. But, I can’t say your suggestions won’t make a difference until I’ve tried them. I avoid GPT-3.5 Chat like the plague now, but if forced to use it in a pinch, I’ll try it.

If you set up the coder persona in GPT-4 every 3 or 4 messages and then give it the exact code you’re interested in, I’m finding that works really well. So long as I know I’ve roughly got my persona prompt and the code context inside the last 4k tokens… it’s great.

I am fortunate enough to have access to the GPT-4 API which I do use for longer sections of code.

Another method I use is to have GPT-4 define the code outline as contentless functions, classes, and definitions. I then create a consistent coder persona with a brief overview of the code I am working on as a whole, including library requirements and associated helper function definitions. Then I start a new chat for each function or class I am working on, include that persona prompt, work on just that function in that chat, and refresh the persona every few prompts.

It’s extra work but I find the results to be excellent.
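To make that concrete, here is a rough sketch of the workflow; it is shown in Python for brevity (the poster above is working in PHP), and the function names, docstrings, and persona wording are invented for illustration:

# Step 1: ask GPT-4 for an outline of stubs only (signatures and docstrings,
# no bodies). Something like this comes back:

def load_document(path: str) -> str:
    """Read a source file and return its raw text."""
    ...

def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into chunks small enough to embed."""
    ...

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Return one embedding vector per chunk."""
    ...

# Step 2: write a persona prompt summarising the project, the libraries in
# use, and these helper signatures.
PERSONA_PROMPT = (
    "You are a careful Python developer on an ingestion/query project. "
    "Helpers available: load_document, chunk_text, embed_chunks (signatures "
    "above). Implement exactly one function at a time, as specified."
)

# Step 3: open a new chat per function, paste PERSONA_PROMPT plus only the
# stub being worked on, and re-paste the persona every few prompts.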

1 Like

I have never had enough trust in GPT for math… For this reason I wrote a script that uses Wolfram Mathematica or MATLAB as the mathematics engine. (Depending on your use case, “mycroft-core” is a decent framework you can modify to have Wolfram and GPT work together; I looked at how they accomplished this and cut out the assistant part.)
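A minimal sketch of that kind of routing, sending math questions to the Wolfram|Alpha Short Answers API instead of the language model; the app ID, the crude is_math_query heuristic, and ask_language_model are placeholders for illustration, and this is not how mycroft-core actually wires it up:

import re
import requests

WOLFRAM_APP_ID = "YOUR-APP-ID"  # placeholder; requires a Wolfram|Alpha account

def is_math_query(text: str) -> bool:
    # Deliberately crude heuristic: digits, operators, or math keywords.
    return bool(re.search(r"[\d+\-*/^=]|integral|derivative|solve", text.lower()))

def ask_wolfram(query: str) -> str:
    # The Short Answers endpoint returns a plain-text result.
    resp = requests.get(
        "https://api.wolframalpha.com/v1/result",
        params={"appid": WOLFRAM_APP_ID, "i": query},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.text

def ask_language_model(question: str) -> str:
    # Placeholder for whatever GPT call you use for non-math questions.
    raise NotImplementedError

def answer(question: str) -> str:
    if is_math_query(question):
        return ask_wolfram(question)   # math goes to the math engine
    return ask_language_model(question)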

1 Like

Yes, the model was switched from GPT-4 to GPT-3.5. Yikes, what a difference!

But the difference is just as striking in the API. Here is a pretty clear example: same question, same prompts, same context document, same temperature.

OK, so I thought: GPT-4 had the advantage of Chat History, so I decided to give GPT-3.5 the same information:

To its credit, it at least tried to read the document. But, just looking at the titles of the documents, how artificial is your intelligence when you can’t see a relationship between twpunion and twpunion-d9?
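For anyone who wants to reproduce this kind of comparison, here is a rough sketch of running the same prompt, context document, and temperature against both models through the API; it assumes the pre-1.0 openai Python package that was current at the time, and the file name and question are placeholders:

import openai

openai.api_key = "sk-..."  # your API key

# Hypothetical context document and question, held constant for both runs.
context = open("twpunion-d9-notes.txt").read()
question = "Which project in this document is the Drupal 9 port of twpunion?"

messages = [
    {"role": "system", "content": "Answer strictly from the provided document."},
    {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
]

for model in ("gpt-4", "gpt-3.5-turbo"):
    reply = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # same temperature for both models
    )
    print(model, "->", reply["choices"][0]["message"]["content"])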

OK, I’ll try this. I had made a point of sticking with the same conversation for the project (in order to cut down on multiple related chats), but I’ll try it. I have had some success with this approach when GPT-4 starts to get a bit wonky.

Thank you for the suggestions.

1 Like