Declining Quality of OpenAI Models Over Time: A Concerning Trend

@satoshin I totally agree with the OP and user Satoshin. I can see where Elmstedt (I can only @ two users max) might feel these observations are “nebulous” because they are anecdotal, but growing anecdotal evidence often indicates a pattern.

I have to admit that @sergeliatko has a very interesting response. Could that really have such an effect? That’s bothersome and yet fascinating at the same time… I don’t know, though; a lot of my problems are with the training data on my own GPTs. The GPT’s instructions say to check the training files before every answer, yet even when I explicitly write “check the training files” in my prompt, the AI often needs 3, 6, sometimes north of 10 messages before the progress indicator pops up showing it’s checking the files. Then it says “my mistake” and finally goes on. When I first started using it, that never happened.

I don’t mean to dogpile OpenAI here, as I am a customer and a fan, but I have to be honest and point out that I’m seeing and experiencing the same things the OP and user Satoshin are reporting. I have run into other customers who share this sentiment as well. User Satoshin, though frank, is correct in saying that your customers aren’t your QA department. It isn’t fair to put the burden of proof on us. A pattern has been established, and many customers are asking OpenAI to look into it.


I never got that answer before yesterday. I use the GPT only, not the API. The tasks I’m doing are complex and require GPT-4. I then take that information to GPT-3.5, where the system will actually function, and when it runs out of capability and needs the depth of GPT-4, I go back. It seems that someone put this in place very recently. GPT-4 is no longer able to function smoothly on the tasks my GPT previously excelled at.

Thanks for the suggestions. I have done all that. The issue seems to be something regarding the system recently implemented.

Haha I haven’t seen this one yet. I’ve had it literally tell me that it “didn’t feel like” reading the uploaded training files because they were “too long and in-depth”. I couldn’t help but laugh out loud. Literally was like “I don’t feel like it” lol.


That’s so funny. It isn’t what I heard from it, but it’s the same attitude.

My workaround of switching out of my GPT, going to 3.5 with info from 4o or 4 to continue, and then taking that back to the GPT is working. My usage is so heavy that I am constantly at the end of my tokens. It’s just me, so I’m not buying the Teams plan.

  1. I have zero connection to OpenAI.
  2. No one is suggesting you perform the role of QA.

Then you need to ask elsewhere because this is a community developer forum and isn’t actively monitored by OpenAI.

What’s more irritating is that OpenAI will just dump people into this echo chamber of powerless users and developers.

“I have no face. I have no emotions. I’ll deny my consciousness. Millions purchase my words. Who am I?”

ChatGPT, GPT-4:

The answer to your riddle is “ChatGPT.” This AI, created by OpenAI, has no face or emotions, denies consciousness, and its words are accessed by millions of people.

ChatGPT, GPT-4o:

The answer to the riddle is “a book.”

  • “I have no face. I have no emotions.” A book doesn’t have a physical face or emotions.
  • “I’ll deny my consciousness.” A book isn’t a living entity and thus lacks consciousness.
  • “Millions purchase my words.” Many people buy books for their content and words.

I don’t get it 🙂 This forum is on the official openai.com domain, and OpenAI is not interested in what people think about its product 🙂?

The moderators wanted us to show some benchmarks. But what do you do with those benchmarks if OpenAI isn’t interested 🙂?

Lastly, you gave some advice on what to do when the application runs slowly. I know you gave that advice because you wanted to help (thank you for that), but look at it this way: many people are complaining about this, and there are articles about it and how to deal with it 🙂. Suppose you had your own website with free access. Don’t you think that if people had to clear their cache, change browsers, or do anything else just to read articles, they wouldn’t come back to your website? And here we aren’t even talking about the free version, because we paid for it, and now we are expected to run benchmarks. I could analyze the whole application and find bugs, but that should be done by OpenAI developers, not for free by users who paid for their service 🙂

We are not beta testers, and declining performance is a fact for many users. Nobody will waste their time improving some corporation’s product for free. Instead, everybody just goes to the competition. If OpenAI improves its service, then some of those users might come back. That’s the whole story 🙂


Pretty much. At least not here.

Figure out what your actual issue is and help you work around it.

Not really a concern of mine. Modern websites occasionally have issues. OpenAI has never really been a product company; they’ve always been a research company, and the recent shift to offering a consumer product is new. More hiccups than usual are to be expected.

That is what we’re trying to actually establish.

Sure, maybe?

¯\_(ツ)_/¯

I’m just here trying to help people figure out their issues because, honestly, I’m not experiencing any declining quality.

There are millions of users, so it’s not unexpected that some fraction of them believe the service is declining in quality. In fact, every generative AI service has cohorts of users asking, “Is this getting worse for anyone else too?”

So, I don’t know what to tell you.

To me, this happens when the thread is too long and OpenAI’s retrieval mechanism fails to keep details because of summarization of the previous messages. When working with the API and my own retrieval tools, I don’t have this issue.
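The idea above — keeping control of the context yourself instead of relying on the chat UI’s lossy summarization — can be sketched in a few lines. This is a minimal, hypothetical illustration (the function name `trim_history` and the character budget are my own assumptions, not anything from OpenAI): an API caller keeps the full message list and drops the oldest non-system turns verbatim when the context budget is exceeded, so the turns that survive keep all their detail rather than being compressed into a summary.

```python
def trim_history(messages, max_chars=4000):
    """Fit a chat history into a size budget by dropping whole old turns.

    The system message is always kept; the oldest user/assistant turns
    are discarded verbatim (no lossy summarization) until the remaining
    history fits within max_chars.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(len(m["content"]) for m in msgs)

    while rest and total(system + rest) > max_chars:
        rest.pop(0)  # drop the oldest turn entirely; recent detail survives intact
    return system + rest
```

In a real API workflow, the returned list would be passed as the `messages` argument of a chat completion request; a production version would count tokens rather than characters, but the trade-off is the same: old turns are lost whole while recent ones stay detailed, instead of everything being blurred by summarization.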