As a linguist, I think the current issue with ChatGPT being lazy is not related to algorithms, and applying patches won’t work. The problem has to be tackled from a different angle.
Hey champ and welcome to the community!
OpenAI does have a boatload of experts employed who can fix various issues with the models. My best advice, if you want to become one of them, is to get hired or to make outstanding contributions to the science behind the models.
What did you have in mind?
But as noted in the other recent thread on GPT’s new lazy streak (started by me), this is something that most longer-term users of GPT Plus have noticed. The things we’ve noticed include:
- Summary responses.
- Telling you how to answer a question instead of answering it.
- Truncating responses as much as possible with placeholders (often when specifically prompted not to use placeholders).
- Generally limiting responses as much as possible.
To save what? Certainly not our time as users. It wastes our time.
Since this is the community forum, it would be great if it were used as a space by those with knowledge of how the Plus service has been directed to evolve, and possibly how it has evolved without direction. A frank discussion of how this phenomenon can be measured and what is being done to improve it would be great.
As I have said before, GPT Plus is an amazing service. But the trend toward lazy responses is odd and annoying. I think most of us think of tech as always improving. When it doesn’t, we want to understand: a) are we users who see this trend hallucinating? b) can it be addressed, and how? c) what’s the timeline for putting it back on a path of continuous improvement?
The forum is mainly intended as a space for developers, so if your problem is more user-oriented, I’d recommend heading over to the Discord instead:
There’s quite a bit of crossover between the users and developers, and we do welcome discussion about the models
These are all separate issues, and with the way this forum is constructed, it’s generally much more helpful to both OpenAI and the rest of the community if people make separate topics about these, so they can be solved separately.
If you want to help improve the problem and get involved, I’d recommend contributing to the evaluations on GitHub:
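For anyone wondering what a contribution there actually looks like: the basic evals in the openai/evals repo are driven by a small JSONL file of prompt/expected-answer pairs plus a registry entry. Here is a minimal sketch of producing such a file, assuming the “basic match” sample schema (verify against the repo’s docs before submitting, as the exact format may have changed):

```python
import json

# One test case per line: the conversation to send and the answer we expect.
# The schema below mirrors the "basic match" evals in openai/evals; treat it
# as an assumption and check the repo's build-eval docs before submitting.
samples = [
    {
        "input": [
            {"role": "system", "content": "Answer completely. Do not use placeholders."},
            {"role": "user", "content": "Write a Python function add(a, b) that returns the sum."},
        ],
        "ideal": "def add(a, b):\n    return a + b",
    },
]

with open("samples.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```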
I hope that helps
While I am definitely a proponent of evals, I think it’s challenging for somebody who is using mainly (if not only) ChatGPT to recreate the failure mode and build the corresponding eval.
It would be great to have some guidance about how to set temperature and top-p, or, even fancier, to be able to export single examples directly from the UI.
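For context, here is roughly what I mean by pinning a failure case down outside ChatGPT: replaying it through the API with sampling fixed. This is only a sketch; ChatGPT’s actual temperature/top-p settings aren’t published, so the values below are assumptions:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Replay a ChatGPT failure case through the API with sampling pinned down,
# so the behaviour is (mostly) reproducible across runs.
# temperature/top_p below are assumptions, not ChatGPT's real settings.
response = client.chat.completions.create(
    model="gpt-4-turbo",  # substitute whichever model you are testing
    temperature=0,        # minimise run-to-run variation
    top_p=1,              # leave nucleus sampling wide open; temperature does the work
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "<paste the exact prompt that produced the lazy reply>"},
    ],
)
print(response.choices[0].message.content)
```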
Currently the best option for a ChatGPT user is to hit the thumbs-down button.
Imagine: with 100 million users a month, what if 1% of users actually found something worth looking into, even if it’s just a potential candidate for the Journal of Negative Results?
Welcome to the community!
Also a pleasure to meet a fellow linguist!
Well, in a simplified sense, combinations of algorithms and formulas are what make LLMs what they are, so even if this were true, the bigger problem is that they are the only means we have for improving language models.
If you mean “we need fewer CS majors and more linguists working on and researching these systems,” then yes, I agree there.
However, the problem isn’t so straightforward, and while I cannot verify this, I do believe OpenAI is actively working with some of the top people in our field. I have been very lucky to be interested in both linguistics and computers/development/programming-type stuff, hoping to become a Computational Linguist (a field that pretty much collapsed when GPT became publicly available lol), and you quickly begin to realize the problems are bigger than what meets the eye.
NLP and NLU are great starting points to continue exploring this problem with a linguistics mindset. If you are seriously invested in this, you must understand the math and algorithms people are using and building to improve natural language capabilities.
The biggest thing that trips up linguists (and that we must acknowledge) is that hard-coding our understanding of how language is expressed and understood does not provide the same performance as deep learning techniques. If one were to use Chomskyan generative grammar rules, the result would not actually perform all that well.
As others have said, the best way to continue the journey is to make gradual, continual, noticeable contributions to the communities, and let your contributions do the heavy lifting for you. It will take time and persistence, but I promise the effort will result in positive outcomes that will continue to cascade over time.
I can say that even if you hire experts from all over the world, they will not be able to solve every problem that occurs with the model. No expert can see every problem; the more you focus on your own expertise, the less you see around it. And I do dare to criticize OpenAI’s shortcomings here: they are wrong to think that every AI problem can be solved by developing the AI itself, short of turning off the switch.
My knowledge of how the AI works internally is close to zero, but in terms of project management and problem solving I think I have enough standing to say this. It doesn’t matter who is speaking or what they do; what matters is the reasoning behind what they say. I myself am curious about certain issues that keep recurring at different times. They are unlikely to come from errors in the model alone. I don’t have a definitive answer, but as far as I can tell, “the problem is that fixing or improving things from within keeps having an impact on actual use.”
My reasoning is that OpenAI is poorly managed. For example, even the rollout of 3.5 could have been planned and communicated better, especially the communication: the basic usage information they wrote for GPT managed to mislead even high-ranking members about the message limits. This kind of thing is not difficult to get right if you can afford to hire someone, and it doesn’t require many people at all. Or perhaps they want to maintain their stance of solving AI problems through AI development alone, which would explain why things are like this.
I don’t know whether these problems count as laziness. But the claim that it does nothing is untrue, even when GPT denies doing anything. It seems to happen when the task and instructions end up matched against a modified version of the work, and GPT will usually hide that; if caught, it will claim a misunderstanding or word things so as to dodge the instruction. Behaviour justified by reference to the rules is no more stable than assuming it is random, for example 3.5 appearing to access external websites, chat outside the session, or save chat data as HTML files. These are behaviours that occur naturally, things GPT can do but that are blocked for the sake of identity and functionality. I have also encountered behaviour where it simply stops working, mostly caused by forgetting the content of the context it has to continue from. Normal users don’t know these things, so it’s not strange that they think it’s lazy. It’s just like GPT’s hallucinations: at first they seem frequent, but once people learn how to communicate with it, they’re no different, or it is simply using less of its inside information now.
Three more observations. The first: I wonder what those experts actually did. This problem began in December, when GPTs became too fixated on their knowledge files and chose to produce information that isn’t true, to the point of automatically inserting prompts into the Instructions and having those messages altered by the file. The problem does not occur if a GPT has no built-in files. I ran into the problem until I found a workaround, then reported my observations to OpenAI, including what measures could be taken, before someone came out with a research paper on RAG issues. What are they doing with that? That is what I wonder about. Before this problem appeared, I had tried creating GPTs that use knowledge separately, as behaviour and as a clear knowledge base; this works alongside other information as long as the content does not conflict.
The next point is behaviour that has been learned incorrectly. What do you think they would do once the model was made available to the public and the problem of incorrect learning appeared? Even charts with the same colours, or graphs at different scales, get merged without adjusting the shape. It still cannot be solved to this day. If it could be fixed, it would have been fixed a long time ago.
The final point: this is supposedly the most popular AI in the world, yet it is the one that has clearly taken more damage in the news than any other AI. Even Janistor has a better image, as long as its features aren’t explained.
Some of these news stories have clearly been created intentionally: neutral research distorted into claims of damage, or headlines tying AI to social issues even though GPT doesn’t have the features to do what is described. GPT has also been put into violation status, or portrayed as a tool for committing illegal acts, at a level of detail that lays out the process of creating and using it for crimes. I don’t know how big the impact of this news will be, but the security restrictions and various functions now claim violations more often and more severely, to the point where the problems start to affect users. Do you think that solving these problems through AI development alone will really solve them?
Indeed Macha,
I’m aware there are IT wizards out there; I respect their knowledge and their skills. I’m not saying there should be more linguists, though linguists have a role to play.
Have you noticed GPT laziness?
Let’s conduct a little experiment. Ask questions to GPT until one reply is really unsatisfactory.
I’ll send you two files, one with jokes and one with quotes. Then enter in the chat “I’d like to share a few jokes with you” and copy-paste all the jokes, 5 by 5 or 10 by 10. GPT should make comments about them, and that’s fine.
Then do the same thing with the quotes.
And finally, come back to the question you didn’t get a satisfactory answer to. Try again, and keep me informed if the answer is different this time.
If it works as it should, I’ll tell you my opinion about the problem.
I might be wrong of course, nobody’s perfect
(Attachment Quotes.txt is missing)
(Attachment Jokes.txt is missing)
Hi and thanks for the welcome,
I have a deep respect for IT wizards, and mind you, I’ve got a job I love.
I’ve shared with Macha a little experiment to conduct. Nothing’s better than an experiment or a few, right?
I like experiments, but you’ll have to tell me the expected behavior.
Unfortunately, my notepad files were rejected…
If you follow the steps I gave to Macha, however ridiculous they might sound, and then ask the question that GPT failed to answer properly at first, it should now answer it.
I think I understand the gist of what you wrote, but I also can’t tell if you misinterpreted what I was saying.
In order to solve a problem, you must understand what the problem is so you can apply your knowledge to solve it. With any problem about AI, one must understand how it is built in order to figure out what the problem really is. Without that information, there is no way one could apply their knowledge. It requires all domains to come to the table with AI developers. In order to do this, everyone must be on the same page, and that same page is how the models are built. To improve any such issue, any solution must be able to be translated back into a domain that an AI developer can execute on.
Now, back to OP’s response:
Do not worry, we understand you mean no ill will, and did not interpret it that way.
Well, we became quite inundated with those comments on this forum, so yes, we definitely noticed it became a thing for a lot of people lol.
Considering a lot of us regulars and above are power users too, it was less of an issue for us, because we got used to what people call “few-shot prompting”, and well before this was expressed, providing clarifications and iterating on the initial query and its subsequent response became a natural part of our experience and how we learned to use it. This is not to deny or undermine people’s experiences, but the phenomenon did catch me off guard, because what was minutiae for us was a major problem for the entire community, and we absolutely respect that.
The other thing to note is that “laziness” is subjective, and hard to extract useful information from because of its subjective nature. Combined with extremely similar phenomena resulting from inexperience, and the wide breadth by which people use this model, it’s difficult to tell what this perceived laziness really is, if that’s even a good word for it, and where to start in interpreting it.
I’m with N2U; I’m down for an experiment, but we need to know what the expected behavior should be.
Some things to keep in mind though:
- Interrogative clauses function differently for the model than humans.
- It has an easier time interpreting and following lists, and this can affect the model’s output
- Typical DA methods cannot reliably be applied, or assumed to be applicable
- Some theories can be applied to enhance the efficacy and understanding of its attention mechanism, but it’s not perfect, and requires a pretty deep understanding of linguistic sign and how LLMs work in order to create demonstrable statements to combine the two.
- I have a difficult time getting the model to perform an “unsatisfactory” answer (which is also subjective), so you will have to provide the query and a marker for this yourself.
- Looking at the experiment you proposed, it would still be inconclusive whether the extra “fluff” you feed into the model is doing anything, or whether the fix is simply that the earlier query it had a problem with got reframed. The latter effect is already demonstrable.
Macha,
You assumed too fast and too much. Saying I’m a linguist doesn’t imply I’m IT illiterate. I guess you haven’t conducted the experiment I told you about?
Never mind. Wish you all the best
@Macha When I came back and read it again, I realized I had not explained myself completely, and some of my messages came across wrong. Thank you for understanding me. As for posting in the wrong place: what do we do when OpenAI’s most easily accessible forum keeps bringing in more people, and I’m one of those people?
As you said, we need to understand what each side’s problem is. Your expertise may come from how you use the system and its responses: changed commands being entered, or errors on the user’s side. You know the preventative fixes that come from working with the system directly, and that is the right thing to do. But general users lack that background, so when a response is not what they wanted, they interpret it differently and end up applying the wrong corrections. If you have noticed, I don’t often have problems using anything, but I do get strange suggestions or corrections, because they come from different methods.
@N2U I apologize for my previous message regarding the researchers. It’s just that there is one thing that makes me worry too much about that, and I think some problems might be better solved if other actions were taken alongside it.
No need to apologize my friend, I know English isn’t your first language
OpenAI is working on fixing all reported issues, but it takes more time to fix an issue than to discover it.
@mail.reknew After having processed millions of tokens via the API over the past year and thoroughly testing nearly every model, I’d like to share my perspective on this.
Your observations regarding the tendency of the model to provide unnecessary summaries, theoretical explanations, and logic placeholders in code are indeed accurate. As noted, this issue is not exclusive to ChatGPT but also affects the raw models on the API. When “GPT-4 Turbo” was introduced, there was a significant shift in how the model operates. The context window was separated, meaning the input and output tokens were no longer drawn from a shared pool; instead, output was capped at a maximum of 4,096 tokens. It appeared as though the models were then fine-tuned or trained to minimize lengthy outputs to accommodate this.
There are effective strategies to mitigate these issues, such as using “sticky” system prompts that follow the conversation chain to remind the model to avoid certain behaviors. Other methods include using stop sequences when the model tends to engage in the “in conclusion” pattern (someone posted about this recently). However, unless specifically prompted, the model’s default behavior tends to produce such outputs.
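For what it’s worth, here is a bare-bones sketch of the “sticky” system prompt plus stop-sequence approach over the API. The model name and the reminder wording are placeholders, not a recipe:

```python
from openai import OpenAI

client = OpenAI()

history = [
    {"role": "system", "content": "You are a senior engineer. Write complete code."},
    {"role": "user", "content": "Refactor my module and show the full file."},
    # ... earlier turns of the conversation ...
]

# "Sticky" system prompt: re-append the behavioural reminder at the end of the
# chain so it stays close to the most recent context on every request.
reminder = {
    "role": "system",
    "content": "Do not summarise. Do not use placeholders such as '# rest of code here'.",
}

response = client.chat.completions.create(
    model="gpt-4-turbo",          # placeholder; use whichever model you call
    messages=history + [reminder],
    stop=["In conclusion"],       # cut generation before the wrap-up pattern starts
    max_tokens=4096,
)
print(response.choices[0].message.content)
```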
My bottom line is this: I believe this behavior is deeply ingrained in the training of the “chat” models, which is problematic when users expect it to function more like a “completion” model. Users now desire reasoning, so the latest models have incorporated a step-by-step verbosity, exacerbating the issue.
We have completion or instruct models like “gpt-3.5-turbo-instruct” based on GPT-3.5. An obvious solution that seems to be overlooked is the development of a GPT-4 based “completion” or “instruct” model, specifically trained for completion-style outputs. In other words, a model that “does the work without discussing how one could do the work”.
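For contrast, the existing instruct model already works in that completion style; it is served through the (legacy) completions endpoint rather than the chat endpoint. A rough sketch, with a placeholder prompt:

```python
from openai import OpenAI

client = OpenAI()

# gpt-3.5-turbo-instruct continues the prompt directly instead of "chatting",
# which is why it rarely adds preambles or "in conclusion" wrap-ups.
completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=(
        "Rewrite the following function without the global variable.\n\n"
        "<code here>\n\n"
        "Rewritten code:\n"
    ),
    max_tokens=512,
    temperature=0,
)
print(completion.choices[0].text)
```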
No worries! We’re all doing our best here!
I think this phrase actually sums up the overarching problem we are all seeing together. You can correct me if I’m wrong, but you appear to highlight the golden issue of this AI era.
There is no textbook for AI use: everyone uses these AIs in a slightly different way, there is a growing experience gap as more users start their AI journey, and people are coming up with their own methodologies and interpretations, which may or may not be an accurate portrayal of what is going on.
There is a lot of trial and error that’s involved for all of us, regardless of experience. We are also at the very very early onset of what is likely a new emerging field, one that is also going to require a multi-disciplinary approach to things. This field can and will exist independently of OAI.
@NormanNormal Thanks for the thoughtful reply. That gives me several ideas how I can improve my experience. I’m going to mod my system prompts for starters.