Fine-tuning with negative samples, possible?

There are papers on this topic. Is it true that including mistakes in the dataset really improves accuracy?
Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents

For positive samples, we append “Please generate a solution that correctly answers the question.” For negative samples, we append “Please generate a solution that incorrectly answers the question.” Unless explicitly stated, we use this in experiments. We also experimented with other reformatting strategies.
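
As a concrete illustration, a minimal Python sketch of that reformatting might look like the following (the dataset fields here are my own assumptions, not the paper's actual format):

```python
# Sketch of the reformatting described above. The dataset fields
# (question / solution / is_correct) are assumed for illustration,
# not taken from the paper's actual data format.

POS_SUFFIX = "Please generate a solution that correctly answers the question."
NEG_SUFFIX = "Please generate a solution that incorrectly answers the question."

def reformat(sample: dict) -> dict:
    """Append the correct/incorrect instruction depending on the sample's label."""
    suffix = POS_SUFFIX if sample["is_correct"] else NEG_SUFFIX
    return {
        "prompt": f"{sample['question']}\n{suffix}",
        "completion": sample["solution"],
    }

dataset = [
    {"question": "What is 2 + 2?", "solution": "2 + 2 = 4", "is_correct": True},
    {"question": "What is 2 + 2?", "solution": "2 + 2 = 5", "is_correct": False},
]
reformatted = [reformat(s) for s in dataset]
```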

My proposal
Could we first fine-tune on the negative dataset, then move on to the normal fine-tuning step?
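
For example, with the OpenAI API this could be two fine-tuning jobs run in sequence. A rough sketch, with placeholder file IDs and model names:

```python
# Sketch: two fine-tuning jobs run in sequence with the OpenAI API.
# File IDs and model names below are placeholders.
from openai import OpenAI

client = OpenAI()

# Stage 1: fine-tune the base model on the negative dataset.
stage1 = client.fine_tuning.jobs.create(
    training_file="file-negative-samples",  # placeholder file ID
    model="gpt-4o-mini-2024-07-18",
)

# Stage 2: once stage 1 finishes, continue fine-tuning its output
# model on the normal (positive) dataset.
stage2 = client.fine_tuning.jobs.create(
    training_file="file-positive-samples",  # placeholder file ID
    model="ft:gpt-4o-mini-2024-07-18:my-org::stage1",  # placeholder for stage 1's fine-tuned model
)
```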

Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data [2312.12832]


So there are two internal processes inside the model: one that generates correct answers and one that generates wrong answers.

Then how does the process for negative answers affect the process for correct answers?

Learning by counterexample can be extremely powerful. For example, this classic math book.

So if a model can be trained on what is right, and can also identify common mistakes, it can develop a very sophisticated understanding.

I know there are RL and DPO,
but can we do this with only the fine-tuning feature of OpenAI?

Can we do this just by mentioning "You incorrectly answer…" or "You correctly answer…"?

@curt.kennedy
What do you think about this paper?

Hi, I am an author of this paper. Basically, this is a trade-off between learning good and bad things. The generated negative samples are labeled negative because they have wrong answers, but they still contain some information (e.g. reasoning) that is worth learning. So intuitively there are some cases where fine-tuning with negative samples performs well: (1) the model is small; (2) the positive samples are limited. This is because the model does not intrinsically have the ability to do the task well, or the positive samples carry limited information for fine-tuning. In other words, for larger/better models with more positive samples, fine-tuning with negative samples may not help, and may even degrade performance.


How does it know right from wrong? It doesn't understand.
So surely the assistant reply in the JSONL should always be correct.
That means you would be best off using the USER turn to prompt:
"Is this correct: 'your info'?" → assistant: "Yes, that is correct."
"Is this correct: 'your info, stated incorrectly'?" → assistant: "No, that is incorrect. You should use 'your correct info'."
IMPORTANT: make sure to never mention the incorrect info in any of the assistant responses. Otherwise the fine-tuning will see it and reinforce it as a pattern to learn.
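
A minimal sketch of that JSONL pattern (the facts here are just placeholders):

```python
# Sketch of the JSONL pattern described above: the user turn carries the claim,
# and the assistant never repeats the incorrect info. The facts are placeholders.
import json

examples = [
    {"messages": [
        {"role": "user", "content": "Is this correct: 'Water boils at 100 C at sea level'?"},
        {"role": "assistant", "content": "Yes, that is correct."},
    ]},
    {"messages": [
        {"role": "user", "content": "Is this correct: 'Water boils at 50 C at sea level'?"},
        {"role": "assistant", "content": "No, that is incorrect. Water boils at 100 C at sea level."},
    ]},
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```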

However, wouldn't it be amazing if we could have a cull feature with fine-tuning? Something that rips out words or phrases completely from the LLM, never to be seen again (i.e. the word 'moreover' or the phrase 'I hope this letter finds you well').

I didn't say it knows right from wrong. I said it's a trade-off. The model learns both right and wrong. If the right helps more, the performance improves. If the wrong degrades more, the performance degrades.

I wasn't aiming my comment at you directly.
I am of the opinion that the model should never be given a wrong answer in the 'assistant' reply. Because it doesn't know right from wrong, it will use the wrong answer in some instances, and this is not the desired outcome.

The thing is, though, that the model wouldn't actually be "learning" the wrong solutions; it would be learning the right solutions for a different question.

If you're fine-tuning a chatbot with two separate types of interactions (in this case "mean" and "nice"), then as long as the system/user prompts are distinctly different, the model shouldn't magically get confused about which is which. Examples are examples, and these models work by predicting the next token. If the previous tokens tell it to do something wrong, it should learn to associate that output with wrong-ness.

I’ve seen this same methodology work great in situations where I’ve had to use in-context learning to get the desired outputs, even when I have a limited number of examples.
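
For example, a rough sketch of that kind of in-context learning with explicitly labeled good and bad examples (model name and example content are placeholders):

```python
# Sketch: in-context learning with explicitly labeled good and bad examples.
# Model name and example content are placeholders.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "You are a friendly support bot. Follow the GOOD example, avoid the BAD one."},
    {"role": "user", "content": (
        "GOOD reply example: 'Happy to help! Let's get that sorted for you.'\n"
        "BAD reply example: 'That's your problem, not mine.'\n\n"
        "Customer: My order arrived damaged."
    )},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```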

Thank you for your comment, I did not expect I could talk with the author!

In my view, this method could create two processes within the LLM: one that generates correct answers and another that generates incorrect ones. However, it’s challenging to intuitively grasp how these two processes influence each other. Could the improvement in accuracy simply be due to the increase in dataset size?

Without using methods like reinforcement learning—meaning without giving negative feedback at the loss function level—this might be difficult to achieve.

If my brain were an LLM and I was trained to respond incorrectly, I think I might somehow understand how an LLM feels…

If, as you suggest, the model is split into two parts, do you think there is any value in training with negative data?

The model isn’t split, it’s all just an unfathomably huge matrix of numbers.

My intuition tells me that wording like “do x wrong” activates a very similar section of that matrix of numbers as “do x right” does.

I believe it can identify the tokens that say to do it “right” or “wrong”, as well as the differences in the outputs between them, and as the model trains it can spot the difference between what might be considered a “right” and “wrong” output.

Sure it’s probably lower quality than significantly more positive data, but sometimes we don’t have that luxury.

In my non-professional but first-hand opinion, I've seen these models be consistently capable of generalising what the desired outcome is from a few positive and negative examples, specifically when it comes to in-context learning.

I might be totally and completely wrong though!

Oh, sorry about that… It’s my fault.

Yes, it's hard to understand how the two types of data influence each other. We also did some experiments that used the same number of positive and negative samples, but without telling the model to generate correctly or incorrectly (i.e. direct SFT). They underperformed compared to telling the model to generate differently.

I think if this approach actually works, it gives us a big breakthrough, because:

  1. we can generate incorrect answers easily, manually or automatically
  2. this means we do not need RLHF, DPO, or other complex approaches

Does anyone know how to give an LLM negative feedback without a complex RL approach, besides this approach?

There is a weight parameter, see the multi-turn-chat-examples. A negative weight parameter could be used for negative samples, like

"weight": -1

and "weight": 0.5 could be used for a good, but not ideal, response.

IMHO it is not possible now, but would be a good enhancement of the API :grinning:
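
For reference, a sketch of how the weight field looks in the current chat fine-tuning format; as far as I know the API only accepts 0 or 1 on assistant messages, so the negative and fractional weights above are hypothetical:

```python
# Sketch of the "weight" field in the current chat fine-tuning format.
# The API accepts 0 or 1 on assistant messages (1 = train on it, 0 = ignore it);
# the negative and fractional weights proposed above are hypothetical.
import json

example = {"messages": [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 5", "weight": 0},  # shown for context, excluded from the loss
    {"role": "user", "content": "That was wrong, try again."},
    {"role": "assistant", "content": "2 + 2 = 4", "weight": 1},  # trained on
]}

print(json.dumps(example))
```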


Good to know! Thanks.
But I think a negative weight is impossible in this feature, because it is like adjusting the learning rate.
Does a negative LR have any meaning? → Maybe?

If the weight is negative, does this mean the loss is increased? I do not think so.

LR is 0 ~ 1.
The loss cannot be changed by the LR.
What changes the loss are more low-level methods, like DPO, PPO, etc.

The question in this thread is: "Can we fine-tune with negative samples using only simple SFT, in an end-to-end manner, without customizing the loss function?"

It would be easy if we could change the loss function.
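
Just to illustrate what "changing the loss function" could look like outside the API, here is a rough PyTorch sketch using an unlikelihood-style term (Welleck et al., 2019) for negative samples. This is illustrative only, not something the OpenAI fine-tuning API exposes:

```python
# Rough PyTorch sketch of a custom loss that treats negative samples differently,
# using an unlikelihood-style term for negative samples. Illustrative only;
# this is not something the OpenAI fine-tuning API exposes.
import torch
import torch.nn.functional as F

def sample_loss(logits: torch.Tensor, targets: torch.Tensor, is_negative: bool) -> torch.Tensor:
    """logits: (seq_len, vocab_size), targets: (seq_len,) token ids."""
    log_probs = F.log_softmax(logits, dim=-1)
    target_log_p = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # log p(target token)
    if is_negative:
        # Unlikelihood: push the probability of the bad continuation down.
        return -torch.log1p(-target_log_p.exp() + 1e-6).mean()
    # Standard SFT: maximize the likelihood of the good continuation.
    return -target_log_p.mean()
```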