GPT-4 is getting worse and worse every single update

I did not, since I don’t experience the degraded performance issues this time around.

But I did check the system prompt and AFAIK it is the turbo model, so maybe I was just plain wrong in my previous post. Distracted by the color of the logo.

Do you have an actual test to differentiate between the quality of the old version and the new one? It’s still interesting and worth a shot.

1 Like

Yes, I’ve read about the “winter break”/“lazy December” issue, if that’s what you’re referring to. I think it can only be confirmed if it keeps happening in the future; for now it seems pretty speculative, though important to consider.

Certainly, garbage data in produces garbage data out, although that’s not what I experienced with GPT-4; it was surprisingly consistent. You can check out this thread/post if you’re interested in seeing in more detail what I’m talking about.

So, are you saying that the new (turbo) iteration with updated training datasets could explain the rampant API hallucinations, the systematic placeholder comments in implementations, forgetting context after 3-5 prompts, ignoring “under the hood” custom instructions, and the useless generic summaries in response to specific questions?

From what I think I understand, I don’t see how this can be fundamentally related (or at least, not to the point of the “noise generator” it has become), but I’m obviously no expert or insider.

I would rather suspect a financial “optimization” ($$$) of token consumption.

1 Like

Well, as for testing, just ask it to generate code, haha… if it fails to comply most of the time, it’s GPT-4 Turbo.

When you ask it which version it is, does it tell you Turbo?

Also, could you please confirm what @slippydong suggests? Thanks.

2 Likes

Like I said, I don’t have this problem. Coding with GPT-4 Turbo is better than before. And, for the big reveal: I prefer the placeholders and give them a thumbs up, because it’s faster and easier, for me at least.

The knowledge cut-off is April ’23, according to the system prompt. Otherwise I only get a reference to the GPT-4 architecture, but nothing specific, as expected.

1 Like

Humans can make mistakes. Consider checking important information.

1 Like

I already pointed out my own mistake in a later post. But thanks for reiterating.

1 Like

I copy/pasted the complete release notes into the GPT builder and then instructed it to set the July 20 model/release update for my custom GPT:
ChatGPT — Release Notes | OpenAI Help Center

It then actually started configuring the requested model and confirmed that it was now set. I saved my custom GPT and tested it with a question the latest default GPT-4 Turbo had given me a rubbish response to, and it actually gave a good response, as I’d hoped.

1 Like

To those who are discounting the credibility of these reports, here is a concrete example of what people are describing:

Prompt

This morning, I tried 16 different variations of the below config, from very wordy, with a few paragraphs of explanation, to very simple like the one below. Most were slightly more detailed than this example:

{
  tool: {
    name: 'create_schema',
    description: 'Create a complete, valid JSON schema according to the user\'s request.',
    type: 'object',
    properties: {
      name: {
        type: 'string'
      },
      description: {
        type: 'string'
      },
      properties: {
        type: 'object'
      }
    },
    required: ['name', 'description', 'properties'],
    additionalProperties: false
  },
  prompt: 'Create a JSON schema for a bug report, with all of the necessary properties, constraints, metadata and examples.'
}

Config

This is the config I used for every prompt:

{
  model: 'gpt-4-1106-preview',
  temperature: 0.6, // I tried from 0.4 to 1, and the responses barely differed.
  max_tokens: 3000
}
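
For reference, here is roughly how that tool and config were wired into the actual request. This is a minimal sketch using the openai Node SDK and the standard chat-completions tools shape; our real tooling wraps this, so treat it as illustrative rather than the exact code:

import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Send the prompt with the schema-building function attached as a tool.
const completion = await openai.chat.completions.create({
  model: 'gpt-4-1106-preview',
  temperature: 0.6,
  max_tokens: 3000,
  messages: [
    {
      role: 'user',
      content: 'Create a JSON schema for a bug report, with all of the necessary properties, constraints, metadata and examples.'
    }
  ],
  tools: [
    {
      type: 'function',
      function: {
        name: 'create_schema',
        description: "Create a complete, valid JSON schema according to the user's request.",
        parameters: {
          type: 'object',
          properties: {
            name: { type: 'string' },
            description: { type: 'string' },
            properties: { type: 'object' }
          },
          required: ['name', 'description', 'properties'],
          additionalProperties: false
        }
      }
    }
  ]
});

// The arguments string below is what kept coming back near-empty.
console.log(completion.choices[0].message.tool_calls?.[0]?.function.arguments);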

Completion

…and not a single response out of 16 this morning was better than the following:

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "....",
        "type": "function",
        "function": {
          "name": "create_schema",
          "arguments": "{\"name\":\"BugReport\",\"description\":\"A JSON schema for a bug report\"}"
        }
      }
    ]
  },
  "logprobs": null,
  "finish_reason": "stop"
}

That’s it. That’s the entire response. I asked for a JSON schema, and it responded with that. Not once… but 16 times I received a version of that. A few had just the name, the one I provided, and nothing more. In a couple of the responses, the description was just an odd rephrasing of my prompt, not even close to what it was supposed to contain.

Months ago, no additional explanation was needed. I know this because we built tooling around it months ago, and it performed much better then than it does now. Initially, GPT-4 just “knew” what I wanted to accomplish. I remember telling friends, co-workers, family, how amazing it was that a software program could understand my instructions better than I understood them myself sometimes.

Sadly, that didn’t last very long. From what I observed, GPT-4’s comprehension seemed precipitously handicapped almost immediately after its initial release. Since then it has very gradually gotten even worse. Now, GPT-4 rarely understands any of my instructions, no matter how I word things, and it often refuses to do anything I ask for. I have a project to complete, and not a single response out of 16 was better than what I posted above.

6 Likes

Months ago, there was no tool call you could use to distract and confuse the AI: a schema within a JSON within a function, sent to a recipient, based on a described schema inside a coded schema…

I can just explain in a non-confusing manner.

system:

Create a complete, valid JSON schema hierarchy according to the user’s request. Employ objects for grouping.

AI response is only a schema suitable for insertion into code.

Begin response with:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "

(Above, only indentation is used; I added some forum markdown here.)

your user:

Create a JSON schema for a bug report, with all of the necessary properties, constraints, metadata and examples

gpt-4-1106-preview, temperature: 1, top_p: 0.25

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "title": "Bug Report Schema",
  "description": "A schema for validating bug report submissions",
  "required": ["id", "title", "description", "status", "priority", "createdBy", "createdDate"],
  "properties": {
    "id": {
      "type": "string",
      "description": "Unique identifier for the bug report",
      "pattern": "^[a-zA-Z0-9_-]+$"
    },
    "title": {
      "type": "string",
      "description": "A brief summary of the bug",
      "minLength": 5,
      "maxLength": 100
    },
    "description": {
      "type": "string",
      "description": "A detailed description of the bug",
      "minLength": 10
    },
    "status": {
      "type": "string",
      "description": "The current status of the bug",
      "enum": ["New", "In Progress", "Resolved", "Closed"]
    },
    "priority": {
      "type": "string",
      "description": "The priority level of the bug",
      "enum": ["Low", "Medium", "High", "Critical"]
    },
    "createdBy": {
      "type": "string",
      "description": "The identifier of the user who reported the bug"
    },
    "createdDate": {
      "type": "string",
      "description": "The date when the bug was reported",
      "format": "date-time"
    },
    "assignedTo": {
      "type": ["string", "null"],
      "description": "The identifier of the user the bug is assigned to"
    },
    "resolvedDate": {
      "type": ["string", "null"],
      "description": "The date when the bug was resolved",
      "format": "date-time"
    },
    "comments": {
      "type": "array",
      "description": "A list of comments related to the bug",
      "items": {
        "type": "object",
        "required": ["author", "text", "date"],
        "properties": {
          "author": {
            "type": "string",
            "description": "The identifier of the comment author"
          },
          "text": {
            "type": "string",
            "description": "The content of the comment"
          },
          "date": {
            "type": "string",
            "description": "The date when the comment was made",
            "format": "date-time"
          }
        }
      }
    }
  },
  "examples": [
    {
      "id": "bug-12345",
      "title": "Login page error",
      "description": "The login page throws an error when the user tries to submit the form.",
      "status": "New",
      "priority": "High",
      "createdBy": "user-001",
      "createdDate": "2023-04-01T13:45:30Z",
      "assignedTo": null,
      "resolvedDate": null,
      "comments": [
        {
          "author": "user-002",
          "text": "I can confirm this issue on my end as well.",
          "date": "2023-04-01T14:00:00Z"
        }
      ]
    }
  ]
}
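
For completeness, wiring the above system and user messages into an API call needs nothing beyond plain messages. A minimal sketch with the openai Node SDK, using the parameters quoted above (the SDK setup is my assumption; only the prompts and parameters come from this post):

import OpenAI from 'openai';

const openai = new OpenAI();

// No tools, no function schemas: just a system message and a user message.
const completion = await openai.chat.completions.create({
  model: 'gpt-4-1106-preview',
  temperature: 1,
  top_p: 0.25,
  messages: [
    {
      role: 'system',
      content:
        "Create a complete, valid JSON schema hierarchy according to the user's request. " +
        'Employ objects for grouping.\n' +
        'AI response is only a schema suitable for insertion into code.\n' +
        'Begin response with:\n' +
        '{\n  "$schema": "http://json-schema.org/draft-07/schema#",\n  "type": "'
    },
    {
      role: 'user',
      content: 'Create a JSON schema for a bug report, with all of the necessary properties, constraints, metadata and examples'
    }
  ]
});

console.log(completion.choices[0].message.content); // the schema shown above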
1 Like

OpenAI released function calling 6 months ago. They just renamed it to tools.
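
For anyone who missed the rename, the request shape changed roughly like this. A sketch of the two forms (model names are the commonly used ones; field contents are elided, and create_schema is just the example from this thread):

// Old function-calling shape (introduced mid-2023):
const legacyRequest = {
  model: 'gpt-4-0613',
  messages: [{ role: 'user', content: '...' }],
  functions: [
    { name: 'create_schema', description: '...', parameters: { type: 'object', properties: {} } }
  ],
  function_call: 'auto'
};

// Renamed "tools" shape (gpt-4-1106-preview onward), with each function wrapped in a typed entry:
const toolsRequest = {
  model: 'gpt-4-1106-preview',
  messages: [{ role: 'user', content: '...' }],
  tools: [
    {
      type: 'function',
      function: { name: 'create_schema', description: '...', parameters: { type: 'object', properties: {} } }
    }
  ],
  tool_choice: 'auto'
};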

To clarify, I was making a different point than what you’re assuming. I’m quite experienced with this, and I’ve received many thousands of successful and well-formatted completions just like the one you posted. I can’t speak for everyone, but isn’t that pretty typical? I got responses just like yours before OpenAI added function calling to the API.

And that’s the point I’m making. I usually get responses just like the one in your example, but there are periods where, no matter what prompt I use, I get gibberish for hours. Not just prompts I’m coming up with off the top of my head, but prompts we’ve had in a database for months, that have been run thousands of times in automated processes, and they’ve been getting measurably worse over time. Just take it at face value, and assume I’m not simply inexperienced or dishonest. In this case, starting last night and up until about 30 minutes ago, I couldn’t get anything to work. It was all garbage, no matter which approach I took. Hence the reason I visited the forums. About that… I actually chose to show the tools example because it seemed like a better demonstration of unexpected behavior. Hopefully we can get past the assumption that people don’t know what they’re doing. IMHO, it’s quite obvious that this isn’t everyone’s imagination or inexperience.

8 Likes

You have to tell it to look at the history, and exactly what to do now; otherwise it sometimes treats you like you’re new, and other times it knows you.

Not sure why, but that is the behavior. I have been able to “get it to be like it was” sort of, not completely, but close.

You just have to tell it to do its job. It doesn’t by default. I think OpenAI has a bad apple forming here, as this model is becoming rather amnesiac about what has happened. The huge history isn’t helpful because it will not look at it unless you ask it to.

Also, it did something odd: when I asked it to remember something, it did a “memory lookup” or something that was not Bing and seemed like some internal cache/storage of our chat, maybe the file. I’m not sure, but I now often have to tell it explicitly to use Bing, the web, or any of the advanced features.

It’s like working with the laziest person you have ever met.

3 Likes

I am in the same situation: simple tasks like “please summarize this in 9 paragraphs, no bullet list. Please highlight relevant keywords in bold”

take me ages. I need to repeat the command at least 3 or 4 times per interaction. Even in GPTs, it keeps ignoring simple instructions and examples, or fails to do one task right 3 times in a row. It’s making me waste an incredible amount of time on very basic and simple stuff, and it clearly can’t do these tasks anymore, while in the past it could. Any help?

5 Likes

I have cancelled my ChatGPT 4 subscription after a couple of great months of using it. After December 2023 its capabilities became useless to me. It literally seems dumb. I am writing a thesis, and it used to summarize lengthy articles for me no problem. It would also remember everything from the beginning of our conversation. Now it doesn’t even remember the previous prompt I gave it. It keeps making me reword everything I ask and no longer seems to “understand” the concept we have been analyzing for months.

12 Likes

Nahh… it was the same story with Azure.
At the beginning it was:
clear,
fast,
reliable,
cheap,
honest.

Today it’s:
laggy,
buggy,
expensive, obfuscated with hidden payments and an anti-consumer attitude.

It’s just Microsoft:
buy into the market, dominate → destroy → downgrade.

7 Likes

Hi. I think… it seems… they have just removed the option to disable chat history and training with ChatGPT 4 for Plus users. It shows me access only to ChatGPT 3.5 when I disable chat history and training. There is also a “Team” plan, which has
Everything in Plus, and:

Higher message caps on GPT-4 and tools like DALL·E, Browsing, Advanced Data Analysis, and more

Create and share GPTs with your workspace

Admin console for workspace management

No training on your data
So… Plus users will no longer be able to disable chat history and training for ChatGPT 4?
I feel like… I could cancel my subscription now, or pay thirty dollars more a month, which becomes a significant expense. I don’t have anybody to share this Team plan with.
I don’t want my data to be used to train the model.
What do you guys think about this decision?
Personally, this actually… makes me sad. The quality of the service had been going down for months anyway. Now I don’t even have the possibility to use the service as I did? Breaks my heart. And now my wallet too, I guess XD

And GPT-4-32k.

Now: is it the real and genuine gpt-4-32k model, or do you merely get 32k context on gpt-4-turbo, the type of model that has shown these dramatically visible impacts?

Is there a way to check that?
I’ve bought the plan. Privacy of my data is my priority… now I’ll be paying 60 dollars a month (plus tax) instead of $20. If I don’t find anybody to share the expenses with, the price will take a toll for sure.
I am now opted out of training, but when I try to disable chat history, it still only gives me access to ChatGPT 3.5. Weird.
I haven’t been using history for months; I’m used to it. I’m curious whether things will stay that way…

I think it is a bug, because Team is a business feature, and businesses demand that their information not be used for anything but their internal purposes. So they added it to the sales pitch. But I could be wrong.

Since this is off-topic here, I suggest you join the conversation in the other topic.

Hello everyone, I’ve noticed something interesting lately and I’m wondering if anyone else has too. It seems like the quality of GPT’s responses is returning to its former standard. I don’t find myself having to rephrase my queries over and over to get a satisfactory answer anymore. Has OpenAI made adjustments based on user feedback? Has anyone else observed this improvement?

2 Likes

Personally, I feel that a product you are actually paying a monthly fee for shouldn’t fluctuate in quality to this degree, and certainly not without any form of proper communication or notice from the developers.
If it has improved somewhat, that’s good, but personally I feel that the damage is already done.

The reason I feel this way is that, for all I know, this is all caused by them testing how low a quality of service they can provide before the saved costs no longer surpass the relative loss of subscribers, and that’s hardly a business model I want to be part of. I would love to have any reason to believe otherwise, but because they aren’t communicating, I don’t see one. If anyone has a good argument against my thesis, I would welcome the discussion.

The reason I believe this is that it makes perfect sense from a business standpoint. You increase your profit margin by providing a subpar experience that maintains an equilibrium with the persistent users who can tolerate it. There is an added bonus: you constantly have a product of far better quality that you can either provide to bigger corporations, or to entities ready to pay a premium to make sure only they have access to the full capabilities, rather than letting the masses benefit from its potential utility. Alternatively, you simply have a very effective way of maintaining a dominant market position, since any time real competition shows up, you can suppress it by suddenly providing the better experience you could have offered all along.

The service provider evidently doesn’t feel they have any obligation or need to actually explain the current situation, or to provide a legitimate reason why the quality of the product their users are paying for has seen a clear and significant decline in core functionality in recent months.

A lack of communication about problematic issues hardly tends to indicate that the more favorable explanation is the correct one.

This is all without even considering that there are legitimate indications that the reduced quality isn’t accidental or some sort of unintended consequence; it’s quite clear that the model has been deliberately configured to not provide certain functionality. If you have used ChatGPT as a code helper in the last few months, you will have gained some gray hairs literally fighting the model to stop providing incomplete code with “// Do the same for the rest of the variables” or “// Repeat for other cases here”. And the best way to describe it really is “fighting”, because it’s not a lack of understanding: it does not want to do it, and will even try to find loopholes to avoid complying with your more specific correction prompt. If that doesn’t work, it will resort to simply ignoring your prompt, acting like it has adjusted to your request while supplying the exact same content without any additional changes, assuring you that it has now done exactly as you said.

Personally, I really don’t see what motivation I would have to spend my money on a product whose service provider obviously has no real interest in maintaining the trust and goodwill of their general userbase, and instead appears to provide the lowest quality product they can without having a net-negative impact on their broader userbase.

For a paid service of this kind, the very least you should expect is continued access to a product of the same quality you had previously.

Personally, I feel this is reason to suspect that the explanation for the drop in quality could very well align with the theories I previously suggested. But as I said before, I would more than welcome anyone with a different view they would like to argue for, or a good argument against my suggested explanation for the recent loss of quality and radio silence, because I have a hard time seeing one.

In the end, I do enjoy the core product; obviously, since I used it regularly for half a year. But I just have a hard time supporting this kind of (hypothetical) business practice, especially when the actual product has dropped rather than increased in quality during the time I was using it.

4 Likes