Finetuning experiments (How not to finetune)

aaliboyev · April 5, 2023, 8:15pm

I attempted to fine-tune the Curie model for classification. My objective was to give the fine-tuned model a user prompt and have it accurately identify the user's intention and the expected media type (audio, video, text, image), and also produce a concise query. While this may be closer to generation than classification, my main goal was to classify prompts.

Initially, I fine-tuned the model on over 5,000 samples. However, I made the mistake of not balancing the samples across categories, which biased the model toward the categories with the most samples. One category had only 7 samples, so it hardly ever appeared in the classification results. To address this, I prepared around 100 additional samples for that category. This category is not particularly unique and is rare in real-life conditions, and I didn’t collect enough samples to balance it against the more common categories. This caused the model to become even more biased towards the less common category.
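A quick way to catch this kind of imbalance before training is to count samples per category. The sketch below is only an illustration: the `category` field name, the label names, the 5% threshold, and the sample counts (apart from the 7-sample category mentioned above) are all assumptions, not details from my actual dataset.

```python
from collections import Counter


def class_distribution(samples, label_key="category"):
    """Return {label: (count, share_of_total)} for a list of sample dicts."""
    counts = Counter(s[label_key] for s in samples)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


def underrepresented(samples, min_share=0.05, label_key="category"):
    """List labels whose share of the dataset falls below min_share."""
    dist = class_distribution(samples, label_key)
    return sorted(label for label, (_, share) in dist.items() if share < min_share)


# Hypothetical dataset: two common categories and one with only 7 samples.
samples = (
    [{"prompt": "...", "category": "common_a"}] * 3000
    + [{"prompt": "...", "category": "common_b"}] * 2000
    + [{"prompt": "...", "category": "rare"}] * 7
)

for label, (count, share) in sorted(class_distribution(samples).items()):
    print(f"{label}: {count} samples ({share:.2%})")
print("Below threshold:", underrepresented(samples))
```

Running a check like this on the first training file would have flagged the 7-sample category right away.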

I believe this happened because the second round of fine-tuning used samples from only one category. I realize that this is not good practice. If we had trained the model to produce only one output (rather than three), the issue might have been less evident.
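One alternative to a single-category second round would be to build a mixed training set in which every category is capped at the same number of samples (simple undersampling of the common categories). This is a sketch of that idea, not something I have tested on this model. The function name `build_mixed_round`, the `category` field, the cap of 100 per category, and the example counts are hypothetical.

```python
import random
from collections import Counter


def build_mixed_round(original, new_rare, per_class, label_key="category", seed=0):
    """Combine old and new samples, keeping at most per_class samples per label."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_label = {}
    for sample in original + new_rare:
        by_label.setdefault(sample[label_key], []).append(sample)

    mixed = []
    for label, items in by_label.items():
        k = min(per_class, len(items))
        mixed.extend(rng.sample(items, k))  # sample without replacement
    rng.shuffle(mixed)  # avoid feeding one category at a time
    return mixed


# Hypothetical data: the original set plus the ~100 newly prepared rare samples.
original = (
    [{"prompt": f"a{i}", "category": "common_a"} for i in range(300)]
    + [{"prompt": f"b{i}", "category": "common_b"} for i in range(200)]
    + [{"prompt": f"r{i}", "category": "rare"} for i in range(7)]
)
new_rare = [{"prompt": f"n{i}", "category": "rare"} for i in range(100)]

mixed = build_mixed_round(original, new_rare, per_class=100)
mixed_counts = Counter(s["category"] for s in mixed)
print(mixed_counts)
```

Capping discards data from the common categories, so a larger cap, or oversampling the rare category instead, may be a better trade-off depending on how much data is available.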

I would greatly appreciate expert feedback on my reasoning, as well as recommendations for my case. Thank you in advance.

markhennings · July 16, 2023, 4:18pm

After you added the 100 examples, do you mean that instead of being biased toward the majority, the model flipped sides and biased toward the unlikely class and was choosing it too often? Or that it didn’t help?

How many examples are in your validation set?

Since you’re working on an edge case that doesn’t come up often, it might be interesting to set logprobs to a number higher than 1 when doing your completion calls. Then you can see what the second or third most likely choices would have been, and whether the edge case shows up there at all in some of the examples that are supposed to return the edge case class. Then you could work on getting those probabilities higher and track your progress at a finer granularity.
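As a rough sketch of that idea: assuming the class label is emitted as a single token, and that you have pulled the token-to-logprob mapping for the first generated token out of the completion response (in the legacy Completions API this would be something like the first entry of `top_logprobs`), a small helper can report where the edge case class ranks and what probability it got. The helper name `rank_of_label` and the example values are made up for illustration.

```python
import math


def rank_of_label(top_logprobs, target):
    """Return (rank, probability) of target among the top candidate tokens.

    top_logprobs: dict mapping candidate token -> log probability.
    Returns (None, 0.0) if the target is not among the candidates.
    """
    ranked = sorted(top_logprobs.items(), key=lambda kv: kv[1], reverse=True)
    for rank, (token, logprob) in enumerate(ranked, start=1):
        # Tokens often carry a leading space, so compare the stripped form.
        if token.strip() == target:
            return rank, math.exp(logprob)
    return None, 0.0


# Hypothetical logprobs for one example whose correct class is "rare".
example_top_logprobs = {" common_a": -0.2, " rare": -1.9, " common_b": -3.5}

rank, prob = rank_of_label(example_top_logprobs, "rare")
print(f"rank={rank}, probability={prob:.3f}")
```

Averaging that probability over the validation examples that should return the edge case class gives a number you can compare between fine-tuning runs, even when the top prediction is still wrong.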
