Fine-tuning experiments (how not to fine-tune)

I attempted to fine-tune the Curie model for classification. My objective was to give the fine-tuned model a user prompt and have it accurately identify the user's intention and the expected media types (audio, video, text, image), and return a concise query. While this is arguably closer to generation than classification, my main goal was to classify prompts.
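For context, the legacy OpenAI fine-tuning endpoint that Curie used expects JSONL training data with `prompt`/`completion` fields. A minimal sketch of what such data might look like for this task (the separator convention, label names, and example texts here are invented for illustration, not taken from the actual dataset):

```python
import json

# Hypothetical training examples in the legacy prompt/completion
# fine-tuning format. The "\n\n###\n\n" separator and " END" stop
# sequence follow a common convention for that API; the intent and
# media labels are made up for this sketch.
examples = [
    {
        "prompt": "Show me a clip of the eruption\n\n###\n\n",
        "completion": " intent: search | media: video | query: eruption clip END",
    },
    {
        "prompt": "Read me the latest headlines\n\n###\n\n",
        "completion": " intent: news | media: audio, text | query: latest headlines END",
    },
]

# Serialize as JSONL: one JSON object per line.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl)
```

Structuring the completion as a fixed-order template like this is what makes the "three outputs" setup mentioned below possible, but it also means every training example influences all three fields at once.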

Initially, I fine-tuned the model on over 5,000 samples. However, I made the mistake of not balancing the samples across categories, which biased the model toward the majority classes. One category had only 7 samples, so it was hardly ever produced in the classification results. To address this, I prepared around 100 additional samples for that category. Since this category is rare in real-world conditions, I couldn't collect enough samples to bring it level with the more common categories. Even so, after the second round the model became heavily biased toward this less common category.

I believe this happened because the second fine-tuning round used samples from only one category. I realize this is not good practice, and if the model had been trained to produce a single output (rather than three), the issue might have been less pronounced.

I would greatly appreciate expert advice on my thoughts and recommendations for my case. Thank you in advance.


After you added the 100 examples, do you mean that instead of being biased toward the majority, the model flipped and became biased toward the unlikely class, choosing it too often? Or that the additions didn't help at all?

How many examples are in your validation set?

Since you’re working on an edge case that doesn’t come up often, it might be interesting to set a number higher than 1 for logprobs when doing your completion calls. Then you can see what the second or third most likely choices would have been, and if the edge case is showing up there at all in some of the examples that are supposed to return the edge case class. Then you could work on getting those probabilities higher and see if you are making progress more granularly.