Unable to get "weight" field to work

I’ve been trying to add the “weight” key to my dataset but I either don’t understand how it works or am implementing it wrong.

I’m following the documentation here: https://platform.openai.com/docs/guides/fine-tuning/multi-turn-chat-examples

My idea is to include some mistakes in the data, where the assistant says something bad, but set the weight of those messages to 0 so they are not trained on. Everything else stays the same.

After fine-tuning, it seems I am actually training the model to make those mistakes, and the weight parameter is not working as I intended: it behaves as if it were set to “weight”: 1.

The way I tested it was to include a sample where the assistant outputs special tokens (i.e. something the model would not usually output, in this case <API></API>) with the weight set to 0. I expected the fine-tuned model not to output <API></API>, but it does, even though the weight was 0.

Should this be the case?

Here is a sample for reference:

{"messages": [{"role": "system", "content": "You are a AI customer service agent."}, {"role": "user", "content": "hello"}, {"role": "assistant", "content": "Hey there, I'm Hank. How's your day going?", "weight": 1}, {"role": "user", "content": "what is the date today"}, {"role": "assistant", "content": "<API></API>", "weight": 0}, ... the rest of the conversation ... ]}
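Before blaming the training itself, it may be worth ruling out a formatting problem in the file. Below is a minimal sketch of such a check; `check_weights` is a hypothetical helper (not part of any OpenAI library), and the rule it enforces (the weight key only on assistant messages, with the integer value 0 or 1) is my reading of the documentation linked above, so treat it as an assumption.

```python
import json

def check_weights(jsonl_text):
    """Return (line_no, message_index, problem) for every suspicious weight key."""
    problems = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines between examples
        example = json.loads(line)
        for i, msg in enumerate(example.get("messages", [])):
            if "weight" not in msg:
                continue
            weight = msg["weight"]
            if msg.get("role") != "assistant":
                # Assumption: weight is only meaningful on assistant messages.
                problems.append((line_no, i, "weight on non-assistant message"))
            elif type(weight) is not int or weight not in (0, 1):
                # Catches strings such as "0", floats, and booleans.
                problems.append((line_no, i, "weight is not the integer 0 or 1"))
    return problems
```

An empty list only means the file matches that assumed format; it says nothing about how the weights are applied during training.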

Bump, same issue! I trained with some samples at weight 0, but it still seems to train on those tokens.

Or if anyone can point me to more documentation on how the “weight” key should be used, that would help. I could only find that one snippet. Thanks.

If weight is zero, does it not just skip that assistant response in training? I think (but don’t quote me) that if you have a training sample:

SYSTEM, USER, ASSISTANT, USER, ASSISTANT

it is treated as two training examples that break down to:

  1. SYSTEM, USER, ASSISTANT
  2. SYSTEM, USER, ASSISTANT, USER, ASSISTANT

By writing:
SYSTEM, USER, ASSISTANT(weight=0), USER, ASSISTANT

you are skipping example number 1 and only using example number 2.

Someone please correct me if I'm wrong, but this is how I understand it.

EDIT: I asked ChatGPT about this and it seems I am wrong, so it would be nice to get clarification on this as well! https://chatgpt.com/share/7fe6b71e-bf1b-4c29-bcea-42e94df54565
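For comparison, here is a rough sketch of the other way to read it: the conversation is one sequence, and the weight acts as a per-message mask on what counts towards the loss, while every message still serves as context. This is only an illustration of that reading, not a confirmed description of what the fine-tuning service does; `loss_targets` is a made-up name, and treating a missing weight as 1 is an assumption.

```python
def loss_targets(messages):
    """Mark which messages would be trained on under a per-message mask."""
    targets = []
    for msg in messages:
        is_assistant = msg["role"] == "assistant"
        # Assumption: a missing weight key behaves like weight 1.
        trained = is_assistant and msg.get("weight", 1) != 0
        targets.append((msg["role"], trained))
    return targets

conversation = [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "<API></API>", "weight": 0},
    {"role": "user", "content": "what is the date today"},
    {"role": "assistant", "content": "..."},
]
flags = loss_targets(conversation)
```

Under this reading the weight-0 message is never a training target, but it still sits in the context that the later assistant message is conditioned on.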

When I was testing, I had at least one “weight”: 0 somewhere in every one of my samples, but the model still seemed to learn to output those <API></API> tokens, so it seems to be trained on the full samples to some extent even when a zero weight is present.
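One thing that can be checked on the data side is whether <API></API> ever appears in an assistant message that is not weighted 0, since a single missed message could be enough to teach the pattern. A small sketch of that audit follows; `marker_weight_counts` is a hypothetical helper, and counting a missing weight as 1 is an assumption on my part.

```python
import json
from collections import Counter

def marker_weight_counts(jsonl_text, marker="<API></API>"):
    """Count assistant messages containing the marker, grouped by weight."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines between examples
        for msg in json.loads(line).get("messages", []):
            content = msg.get("content") or ""
            if msg.get("role") == "assistant" and marker in content:
                # Assumption: a missing weight key is counted as weight 1.
                counts[msg.get("weight", 1)] += 1
    return counts
```

If the count under weight 1 is anything other than zero, those messages would be a more ordinary explanation for the behaviour than the weight key being ignored.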

Yeah, true. I think the whole conversation is still taken as context.