Label encoding and adding system messages in training dataset for fine-tuning GPT-4.0-mini

lekhamadav · January 28, 2025, 9:25am

My training dataset contains two labels, ‘Damage’ and ‘No label’. Is it better to keep the labels as they are, or should I encode them as 1 for ‘Damage’ and 0 for ‘No label’ when fine-tuning GPT-4.0-mini?

This is an example of datapoints in my training dataset:

{“messages”: [{“role”: “user”, “content”: “;none;reverse lights stay on all the time-please check~and advise;none;replaced faulty relay;body elect concern;relay, relay”}, {“role”: “assistant”, “content”: “No label”}]}
{“messages”: [{“role”: “user”, “content”: “;none;c/s someone sideswiped left rear of vehicle, left~outer tailight lens cracked, requests we order~and replace left outer tailight assembly;none;replaced left outer taillight lens per customer~request;taillights;combination lamp assy-rear,lh”}, {“role”: “assistant”, “content”: “Damage”}]}

Additionally, I have not included system messages in the training dataset. Would including system messages be necessary or beneficial for this fine-tuning task?

Topic		Replies	Views
Fine-tuning dataset : system, user and assistant content : where to put the real instructions? API fine-tuning	1	1002	December 29, 2023
Finetuning GPT40-mini to build a very skillful, professional and compliant AI debt collection agent using good "Agent-consumer" conversation data Community fine-tuning	1	176	August 5, 2024
How to assign weights during chat based fine tuning? API fine-tuning-problems , gpt-4o-mini	0	87	December 18, 2024
Is it possible finetune with unlabeled data and then labeled data? API fine-tuning	5	923	March 18, 2024
Fine tunning data with fixed system content API	1	376	October 12, 2023

Label encoding and adding system messages in training dataset for fine-tuning GPT-4.0-mini

Related topics