My training dataset contains two labels, ‘Damage’ and ‘No label’. Is it better to keep the labels as they are, or should I encode them as 1 for ‘Damage’ and 0 for ‘No label’ when fine-tuning GPT-4.0-mini?
This is an example of datapoints in my training dataset:
{“messages”: [{“role”: “user”, “content”: “;none;reverse lights stay on all the time-please check~and advise;none;replaced faulty relay;body elect concern;relay, relay”}, {“role”: “assistant”, “content”: “No label”}]}
{“messages”: [{“role”: “user”, “content”: “;none;c/s someone sideswiped left rear of vehicle, left~outer tailight lens cracked, requests we order~and replace left outer tailight assembly;none;replaced left outer taillight lens per customer~request;taillights;combination lamp assy-rear,lh”}, {“role”: “assistant”, “content”: “Damage”}]}
Additionally, I have not included system messages in the training dataset. Would including system messages be necessary or beneficial for this fine-tuning task?