Fine-tuning experiments (how not to fine-tune)

I attempted to fine-tune the Curie model for classification. My objective was a model that, given a user prompt, could accurately identify the user's intent and the expected media types (audio, video, text, image), and produce a concise query. While this is arguably closer to generation than classification, my main goal was to classify prompts.

Initially, I fine-tuned the model on over 5,000 samples. However, I made the mistake of not balancing the samples across categories, which biased the model towards the majority classes. One category had only 7 samples, so it was hardly ever represented in the classification results. To address this, I prepared around 100 additional samples for that category and ran a second fine-tuning round. The category is rare in real-life conditions, so I still hadn't collected enough samples to bring it level with the more common categories, yet after this second round the model swung the other way and became heavily biased towards the previously underrepresented category.
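For what it's worth, the balancing step I skipped can be sketched roughly like this. This is a hypothetical helper, not anything from the OpenAI tooling: it assumes each training sample is a dict with a `category` key, and it oversamples minority categories up to the majority count before shuffling:

```python
import random
from collections import defaultdict

def balance_by_oversampling(samples, key="category", seed=0):
    """Oversample minority categories so each category ends up with
    as many samples as the largest one, then shuffle the result."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for sample in samples:
        buckets[sample[key]].append(sample)
    target = max(len(bucket) for bucket in buckets.values())
    balanced = []
    for bucket in buckets.values():
        balanced.extend(bucket)
        # Resample this category up to the target size (no-op for
        # the majority category, since k would be 0).
        balanced.extend(rng.choices(bucket, k=target - len(bucket)))
    rng.shuffle(balanced)
    return balanced
```

Duplicating a 7-sample category hundreds of times has its own risks (the model may memorize those few examples), so collecting or synthesizing more genuine minority samples is usually preferable when possible.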

I believe this happened because the second fine-tuning round contained samples from only that one category. I realize this is not good practice, and if the model had been trained to produce a single output (rather than three), the issue might have been less pronounced.
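What I should have done, I think, is mix the new minority-category samples back into the full original set before the second round, so the model never sees a training file drawn from a single category. A minimal sketch of that, assuming samples already carry the `prompt`/`completion` fields used by the legacy completion-style fine-tuning JSONL format:

```python
import json
import random

def build_second_round_file(original_samples, new_samples, path, seed=0):
    """Mix the new minority-category samples into the full original
    set and shuffle, so a follow-up fine-tune still sees every
    category instead of just the newly collected one."""
    combined = original_samples + new_samples
    random.Random(seed).shuffle(combined)
    with open(path, "w") as f:
        for sample in combined:
            # Legacy completion-style fine-tuning expects one
            # {"prompt": ..., "completion": ...} object per line.
            f.write(json.dumps({"prompt": sample["prompt"],
                                "completion": sample["completion"]}) + "\n")
    return len(combined)
```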

I would greatly appreciate expert advice on my thoughts and recommendations for my case. Thank you in advance.
