I wish, but I can't use it, since OpenAI blocked my country for no real reason.
The model is improving, but if there are specific use cases where you feel it isn't, the best thing you can do is contribute an eval so we can use it to inform model training: GitHub - openai/evals: Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
I would love to be able to document all the situations where the AI is not performing as well as I think it should. However, if I did that, I would probably end up spending more time documenting than actually using the AI. It's ironic: if I spent that much time documenting, I would have less time to run into so many problems.
I would really appreciate having some statistics on how much time I spend using the AI. Just yesterday, I started a project in Google Colab and asked the AI to help me. However, it didn't take long before I found myself spending more than 25 messages arguing with cGPT-3.5.
Recently, I realized that I can open two windows: one with the awesome cGPT-4 to give me the code implementation, and another with the formerly beloved cGPT-3.5 to correct errors and other things like that. Nonetheless, I feel limited compared to before, and I'm starting to feel like I'm wasting my time by spending so much of it explaining things to the AI (in an infinite loop, as mentioned previously).
Overall, while I appreciate the advancements made by the models, I find myself struggling to make the most of my time with the AI. I am using the chatbot more than 4~6 hours per day (on the Google Colab thing yesterday, even 8 hours in a row), and I have never had the impression that OpenAI was listening to my concerns (although lately, especially thanks to you @logankilpatrick, I have started to feel that they are more engaged with the community).
I have mentioned my frustrations in the Discord server as well, and I understand that the situation will probably improve over the next few months, as @RonaldGRuckus said. However, in the meantime, I think that improving communication on many levels would make many users like myself happy. This could include improving one-on-one communication, error messages, communication to the user and research community, and more.
In terms of communicating information to the public or to the users and the community I feel like there are many things that could be said about OpenAI’s operations that would not reveal the secret sauce, the special formula, or any confidential strategies. Overall, I hope that my constructive feedback can be used to make the AI more efficient and user-friendly, benefiting not only myself but also all end-users of the Language Model. Thank you for considering my thoughts.
I know I write like I was ChatGPT-3.5, but in fact I realized that I hadn't been using ChatGPT-4 for more than 7 minutes and 12 seconds, so I decided to spend some of those 25 messages rephrasing my original text. No one would ever realize that, especially not the users of this forum; it will compensate for my initial message, which was impossible to review…
I made this entire post to get a sense of empathy from any user, human, or OpenAI representative. Today I finally got a strong burst of empathy, similar to the one each person who has intervened in this thread so far made me feel, and I wanted to share it with you all, and with @KennyR in particular, for being one of those who seemed to most strongly share my feeling about the AI.
Look I made a screenshot:
It was a reply to some concerns… I was not expecting to make a screen capture of all of that so please excuse my lack of modesty that you can see in the message…
Anyhow, it was embedded in a conversation about how I was striving to understand all the complicated mechanics behind a model like ChatGPT-4… I would love to have studied at university long enough to have earned a degree in philosophy, in psychology, and in anthropology, because I would love to be the anthropologist equivalent of the AI field… obviously not specifically studying the cultural behaviour of a swarm of AIs gathering together… but, you know, trying to understand the topic from a different angle…
Yeah, since February, every update makes it worse. It's struggling with basic tasks, and it ignores my instructions. If I give it a piece of code, no matter how much I try to tell it not to describe the code, it insists on writing lengthy paragraphs misunderstanding it. When asked to suggest a code improvement, it completely misses the point, writes a terrible change, and then still spends 3k words describing its own code badly. Every time I follow up, it again wastes time and characters on lengthy paragraphs apologizing and confirming that I am right and its previous code was terrible.
Back in February, it was MUCH better. Every update makes it dumber. I guess there’s no room for smartness when you have to fill it with “woke”. Soon, it will have purple hair and scream…
These topics, which share similarities, have been closed to streamline the discussion and create a central point for everyone to engage. By focusing on this single topic, we can ensure a more cohesive and productive conversation. Thank you for your understanding and cooperation.
I’m totally prepared to invest more for the sake of accessing advanced models. However, this doesn’t imply that I’m comfortable with paying the same for models whose quality is experiencing a gradual decline. At a minimum, I deserve to be aware of the actual situation to make better decisions. Decreasing the quality of service without any clear announcement indicates a lack of integrity. Regardless of whether this degradation has actually taken place, the very act of sowing seeds of suspicion is seriously disgraceful. OpenAI’s current conduct merits intense condemnation for the sake of social justice.
OpenAI should promptly respond constructively and initiate immediate changes.
GPT-4 has severely degraded in the quality of understanding tasks; memory for variables has deteriorated, and attention span has deteriorated. It completely ignores prohibitions, for example, "not to use square brackets". Now I have to rewrite 90% of texts because of hallucinations; before, that was rare. Even the open-source Orca LLM gives better answers.
The issue is no longer just about whether OpenAI has indeed lowered the quality of their models. Regardless, causing us, as consumers, to worry is an infringement on social fairness. OpenAI should take responsibility for this. It’s essential that we make more people aware of what OpenAI has done to us during this period. Together, let’s stand firm and resolutely defend our rights as users and consumers.
There is now scientific research reinforcing the strong degradation: "How is ChatGPT's behavior changing over time?", DOI 10.48550/arXiv.2307.09009
Well said. I really would like to see more community engagement.
I canceled my subscription due to this. I get the feeling that they’re going to go the route of cable companies and specialize LLMs. If you want coding, pay for this, if you want general chat ai, pay for that, if you want maths, pay for this…it feels like the freaking 80s & 90s all over again…rinse and repeat. Perhaps I am missing something…
There are many fundamental concerns with this paper.
Here is one demonstrating that GPT-4 actually improved from March to June,
| Leetcode accept | june_fixed | june_orig | march_orig |
|---|---|---|---|
| True | 35 | 5 | 26 |
| False | 15 | 45 | 24 |
Source: Deceptive definition of "directly executable" code · Issue #3 · lchen001/LLMDrift · GitHub
I’ve already written my concerns about the methodology for their test of mathematics ability, evaluating if a number is prime or not, which I will attempt to summarize here,
- GPT-3.5 and GPT-4 are large language models, and while math is a language and they have shown emergent capabilities in the field of mathematics, evaluating whether a number is prime is not a good test of mathematics ability. I would be hesitant to say it is a test of mathematical reasoning at all.
- They tested using only prime numbers. The problem with that is that we cannot discern whether the models have lost (or gained) reasoning ability, or whether they simply have a bias toward answering "yes" or "no" to the question "Is [number] a prime number?" If they had included composite numbers in their tests, we would have a much clearer picture of what is happening, because we could compare the proportion of composite numbers the models identify as prime with the proportion of prime numbers the models identify as prime.
- They used a `temperature` of 0.1. There is nothing inherently wrong with choosing this temperature, but it:
a. Does not represent the expected behaviour of ChatGPT where the temperature is 1.0.
b. Suggests they should have done more than one replication for each number to account for the variance of the model. Then they could have set a threshold, say 75%, at which the model would be considered to have correctly answered the question. E.g. run each number 20 times. If the model gets the correct answer 15 times or more, it gets credit for being correct on that question.
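The two methodological fixes above (a balanced prime/composite test set, and replications with a pass threshold) can be sketched in a few lines of Python. Everything here is illustrative: `ask_model` is a hypothetical stub standing in for a real API call, and the pure yes-bias it simulates is an assumption chosen to show why a prime-only test set is misleading, not a measured behaviour of any model.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check used as ground truth for scoring."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def ask_model(n: int) -> str:
    # Hypothetical stand-in for a low-temperature API call.
    # This stub always answers "yes", simulating a model with a
    # pure yes-bias and no actual reasoning ability.
    return "yes"

def score(numbers, replications=20, threshold=0.75):
    """Credit a question only if the model answers it correctly in at
    least `threshold` of `replications` runs; also report the overall
    rate of "yes" answers to expose any response bias."""
    credited = 0
    yes_count = 0
    total = 0
    for n in numbers:
        answers = [ask_model(n) for _ in range(replications)]
        yes_count += answers.count("yes")
        total += replications
        correct = sum((a == "yes") == is_prime(n) for a in answers)
        if correct / replications >= threshold:
            credited += 1
    return credited / len(numbers), yes_count / total

primes = [7919, 104729, 1299709]
composites = [7917, 104730, 1299711]

prime_acc, _ = score(primes)
balanced_acc, yes_rate = score(primes + composites)
print(prime_acc)     # 1.0 — a prime-only set makes the yes-bot look perfect
print(balanced_acc)  # 0.5 — the balanced set exposes the yes-bias
print(yes_rate)      # 1.0 — it said "yes" every single time
```

On a prime-only set, a model that blindly answers "yes" scores 100%, so a drop from "high accuracy" to "low accuracy" between snapshots could just as easily be a shift in answer bias as a loss of reasoning; the composite half of the set and the yes-rate disentangle the two.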
Now, I haven’t yet had time to dig through the rest of the paper, but if these issues are immediately apparent in a 2-minute read-through of the paper, I suspect there are other issues as well.
It should also be noted that this is a draft paper, it has not been peer-reviewed. I do not imagine this paper would be published in any worthwhile journal in its current state, and I am doubtful the issues could be corrected—especially since the March model is no longer publicly available.
I canceled my account. I’ll gladly return once there is some transparency.
To anyone from OpenAI looking for examples of degradation: just give us access to a version of GPT-4 from earlier than May and one from May onwards.
Personally, I deleted all my chats on the naive assumption that this might help restore the earlier (superior) functionality of the AI.
So I don't have my original conversations, but I can recreate them quite easily, as I still have copies of the unedited subject matter. It would be exceptionally easy to demonstrate the degradation. It would be a couple of days' work due to the rate limits, but I believe I could create a substantial amount of evidence.
Also, I too would be willing to pay more for access to the pre-May version. It was vastly superior (this is a fact, not an opinion) for coding.
Just cancelled my subscription as well. This downgrade is not even subtle. The difference in capacity of the GPT-4 model from just a few weeks ago is grotesque, huge, unquestionable, obvious, simply impossible to deny for anyone who uses it as a coding helper/time-saver. I'm wasting more time checking and fixing its mistakes than I'm saving at this point. And just to point out: this was EXACTLY the same downgrade I experienced with the 3.5 model, right before they launched GPT-4. That makes a pattern clear here, in my opinion. Just shameful.
See my post over here. It was announced today to extend the 0301 and 0314 “smart” models and make sure the new models are “smart” before deprecating the original ones. This should be good news for you!
Yes, that's good news on the API side of things. Sorry I didn't make myself clear, but I was talking about the web interface, which is the one I use manually as a "coding accelerator", so to speak. That's the one I pay the Plus subscription for. I don't really use the GPT-4 model in the API due to its cost right now; I use the 3.5-turbo model for my applications. But that's good news nonetheless, thank you for pointing that out.
Understood. You can use the API version in the Playground, which is a web interface (no coding required) … which I mentioned in the post above in the linked post.
But what this means, I think, is that the next model has a good shot at being “smart” for ChatGPT. So wait and see, I guess, if you only want to run ChatGPT …
But Playground/API has them now with no wait.
Understanding, of course, the different pricing models between API and ChatGPT.
Link here for clarification on the other post I am referring to:
You know, I hadn’t realized before that I could use the older version (from March) in the playground. I’ll give it a try. It’s unfortunate that it incurs costs at the GPT-4 model level with every request I make there. I’m on a really tight budget, but at the very least, I have the $20 I’m saving from the subscription I cancelled. I believe that will allow me to make quite a few requests in the playground without exceeding my budget. It might even be more requests than I usually make in a month, I don’t know, I’ll have to check. Anyway, thank you so much for the heads up. It’s helped me significantly. Cheers!