There are papers on this topic. Does including mistakes in the dataset really improve accuracy? Is this true?
Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents
“For positive samples, we append ‘Please generate a solution that correctly answers the question.’ For negative samples, we append ‘Please generate a solution that incorrectly answers the question.’ Unless explicitly stated, we use this in experiments. We also experimented with other reformatting strategies.”
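As a rough illustration of the appending strategy described in the quote, a fine-tuning file could be assembled like the sketch below. The field names, sample data, and chat-style JSONL layout are my own assumptions for illustration, not the paper’s actual pipeline.

```python
import json

# Hypothetical raw samples; "label" marks whether the solution is correct.
samples = [
    {"question": "What is 2 + 2?", "solution": "2 + 2 = 4.", "label": "positive"},
    {"question": "What is 2 + 2?", "solution": "2 + 2 = 5.", "label": "negative"},
]

SUFFIX = {
    "positive": "Please generate a solution that correctly answers the question.",
    "negative": "Please generate a solution that incorrectly answers the question.",
}

# Write chat-style JSONL: the appended suffix tells the model which kind of
# solution the assistant turn contains.
with open("train.jsonl", "w") as f:
    for s in samples:
        record = {
            "messages": [
                {"role": "user", "content": f"{s['question']} {SUFFIX[s['label']]}"},
                {"role": "assistant", "content": s["solution"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```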
My proposal
Could we first fine-tune on the negative dataset and then move on to the normal fine-tuning step?
Hi, I am an author of this paper. Basically, this is a trade-off between learning good and bad things. The generated negative samples are labeled negative because they have wrong answers, but they still contain some information (e.g. reasoning) that is worth learning. So intuitively there are cases where fine-tuning with negative samples performs well: (1) the model is small; (2) the positive samples are limited. This is because the model does not intrinsically have the ability to do the task well, or the positive samples carry limited information for fine-tuning. In other words, for larger/better models with more positive samples, fine-tuning with negative samples may not help, or may even degrade performance.
How does it know right from wrong? It doesn’t understand.
So surely the assistant reply in the JSONL should always be correct.
That means you would be best off using the USER turn to present the claim:
User: “Is this correct: ‘your info’?” → Assistant: “Yes, that is correct.”
User: “Is this correct: ‘your info, stated incorrectly’?” → Assistant: “No, that is incorrect, it should be ‘your correct info’.”
IMPORTANT: make sure never to mention the incorrect info in any of the assistant responses. Otherwise the fine-tuning will see it and reinforce it as a pattern to learn.
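In JSONL terms, that layout might look something like the sketch below; the facts and prompt wording are invented placeholders, not a recommended schema.

```python
import json

# Hypothetical fact pair for illustration: the wrong version only ever
# appears in the user turn, never in the assistant turn.
CORRECT = "The API rate limit is 60 requests per minute."
WRONG = "The API rate limit is 600 requests per minute."

records = [
    {
        "messages": [
            {"role": "user", "content": f"Is this correct: '{CORRECT}'"},
            {"role": "assistant", "content": "Yes, that is correct."},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": f"Is this correct: '{WRONG}'"},
            # The assistant states only the correct fact, so the wrong
            # value is never reinforced as assistant output.
            {"role": "assistant", "content": f"No, that is incorrect. {CORRECT}"},
        ]
    },
]

with open("qa_pairs.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")
```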
However, wouldn’t it be amazing if we could have a cull feature with fine-tuning? Something that rips out words or phrases completely from the LLM, never to be seen again (e.g. the word ‘moreover’ or the phrase ‘I hope this letter finds you well’).
I didn’t say it knows right from wrong. I said it’s a trade-off. The model learns both right and wrong. If the right helps more, the performance improves. If the wrong hurts more, the performance degrades.
I wasn’t aiming my comment at yourself directly.
I am of the opinion that the model should never be given wrong information in the ‘assistant’ reply. Because it doesn’t know right from wrong, it will use the wrong information in some instances, and this is not the desired outcome.
The thing is, though, that the model wouldn’t actually be “learning” the wrong solutions; it would be learning the right solutions for a different question.
If you’re fine-tuning a chatbot with two separate types of interactions (in this case “mean” and “nice”), then as long as the system/user prompts are distinctly different, the model shouldn’t magically get confused about which is which. Examples are examples, and these models work by predicting the next token. If the previous tokens tell it to do something wrong, it should learn to associate that output with wrong-ness.
I’ve seen this same methodology work great in situations where I’ve had to use in-context learning to get the desired outputs, even when I have a limited number of examples.
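To make the point about distinct prompts concrete, here is a tiny invented pair of training examples where only the system prompt distinguishes which style the assistant reply demonstrates; the prompts and replies are placeholders, not real training data.

```python
# Two invented training examples: identical user turns, different system
# prompts, so the model learns to condition the reply style on the prompt.
mean_and_nice = [
    {
        "messages": [
            {"role": "system", "content": "You are a mean assistant."},
            {"role": "user", "content": "Can you help me fix this bug?"},
            {"role": "assistant", "content": "Figure it out yourself."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a nice assistant."},
            {"role": "user", "content": "Can you help me fix this bug?"},
            {"role": "assistant", "content": "Of course, let's walk through it together."},
        ]
    },
]
```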
Thank you for your comment, I did not expect to be able to talk with the author!
In my view, this method could create two processes within the LLM: one that generates correct answers and another that generates incorrect ones. However, it’s challenging to intuitively grasp how these two processes influence each other. Could the improvement in accuracy simply be due to the increase in dataset size?
Without using methods like reinforcement learning—meaning without giving negative feedback at the loss function level—this might be difficult to achieve.
If my brain were an LLM and I was trained to respond incorrectly, I think I might somehow understand how an LLM feels…
The model isn’t split, it’s all just an unfathomably huge matrix of numbers.
My intuition tells me that wording like “do x wrong” activates a very similar section of that matrix of numbers to the one “do x right” does.
I believe it can identify the tokens that say to do it “right” or “wrong”, as well as the differences in the outputs between them, and as the model trains it can spot the difference between what might be considered a “right” and “wrong” output.
Sure, it’s probably lower quality than having significantly more positive data, but sometimes we don’t have that luxury.
In my non-professional but first-hand experience, I’ve seen these models be consistently capable of generalising the desired outcome from a few positive and negative examples, specifically when it comes to in-context learning.
Yes, it’s hard to understand how the two types of data influence each other. We also ran experiments using the same number of positive and negative samples but without telling the model to generate correctly or incorrectly (i.e. direct SFT), and they underperformed compared to instructing the model to generate differently.
Good to know! Thanks
But I think a negative weight is impossible with this feature, because it would be like adjusting the learning rate. Does a negative learning rate even have any meaning? Maybe? And if the weight were negative, would that mean the loss is increased? I do not think so.
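For intuition on why a negative weight amounts to flipping the gradient rather than just “learning less”, here is a conceptual PyTorch sketch; the weighting scheme is purely illustrative and not how any particular fine-tuning API works.

```python
import torch
import torch.nn.functional as F

def weighted_token_loss(logits, targets, weight):
    """Per-example weighted next-token loss.

    logits:  (seq_len, vocab_size) model outputs for one example
    targets: (seq_len,) target token ids
    weight:  scalar; 1.0 = normal SFT, 0.0 = ignore the example,
             and a negative value would flip the sign, pushing probability
             away from these tokens (an unlikelihood-style penalty).
    """
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return weight * nll

# Toy illustration with random logits.
vocab, seq = 50, 8
logits = torch.randn(seq, vocab, requires_grad=True)
targets = torch.randint(0, vocab, (seq,))

loss_pos = weighted_token_loss(logits, targets, weight=1.0)   # learn these tokens
loss_neg = weighted_token_loss(logits, targets, weight=-0.5)  # actively unlearn them
(loss_pos + loss_neg).backward()  # the negative term reverses the gradient direction
```

Minimizing a negatively weighted term is the same as maximizing the loss on those tokens, which is why SFT example weights are normally kept non-negative and why genuine negative feedback tends to need RL- or unlikelihood-style objectives instead.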