The n_epochs=13 results are in and the cake is baked. So, to spare everyone a lot of screenshots, and without further ado, here are the results:
Original Prompt Used in the Fine-Tuning:
- 25 completion tests
- 20 succeeded (fav color blue)
- 5 failed (color not blue, or garbage output)
- Success rate: 80%
- Qualitative Result: Slightly Underfitted
Georgi Text Prompt, Testing for Overfitting:
- 25 completion tests
- 19 succeeded (fav color blue)
- 6 failed (fav color not blue)
- Success rate: 76%
- Qualitative Result: Mixed (Slightly Overfitted)
Discussion
I am not a certified ML expert, but model fitting seems similar to detection theory in cybersecurity (the field where I am an expert): there is a trade-off between underfitting and overfitting, just as there is a trade-off between false positives and false negatives. There is no "perfection," of course.
In addition, because the training data consisted of only a single prompt-completion pair, adding more training data would likely improve the results.
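For illustration, a larger training file could be built in the prompt-completion JSONL format that the legacy fine-tuning endpoint expects. The extra pairs below are invented examples, not the data actually used in the experiment:

```python
import json

# Hypothetical prompt-completion pairs; the original experiment
# used only a single pair, so these extras are made up.
pairs = [
    {"prompt": "What is Georgi's favorite color? ->", "completion": " blue"},
    {"prompt": "Which color does Georgi like most? ->", "completion": " blue"},
    {"prompt": "Name Georgi's preferred color. ->", "completion": " blue"},
]

# Write one JSON object per line, as the fine-tuning tooling requires.
with open("training_data.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```

The idea is simply to give the model several phrasings of the same fact, which tends to make the trade-off between under- and overfitting less sensitive to the n_epochs setting.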
This concludes the single-line fine-tuning with n_epochs tutorial and its lab results. I think the results of these two tutorials speak for themselves and provide a foundation for understanding n_epochs in the context of fine-tuning.
Feel free to comment or ask for more tests, otherwise, I’m moving on to other tasks and different OpenAI experiments!