This tutorial / lab experiment is a continuation of an earlier tutorial, Fine-Tuning In a Nutshell with a Single Line JSONL File and n_epochs, where we focused on ensuring a single-line JSONL file would return the “expected” results when given the same completion prompt used in the training file.
During that tutorial, our machine learning experts correctly commented that by using 16 n_epochs the tutorial (me) had inadvertently committed the ML sin of overfitting. @georgei cornered me, in a good way, into testing his example prompt for overfitting, and he correctly guessed (as others did) that the fine-tuning was overfitted.

Here, I continue that discussion with a focus on model fitting, using the same prompt for fine-tuning as before, and will test the model's fit with both the fine-tuning prompt and the @georgei fitting test prompt.
Single-Line JSONL Fine-Tuning Data
{"prompt":"What is your favorite color? ++++", "completion":" My super favorite color is blue. ####"}
The Georgi Model Fitting Prompt
Tell me what is your favorite color by naming an object with that color.
In our first test, I fine-tuned the davinci base model as before but used 12 n_epochs, versus the underfitted 8 or overfitted 16 in the earlier OpenAI lab tests. That seemed like a good place to start, halfway between underfitted and overfitted.
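For completeness, here is roughly what that fine-tune job looks like in code against the legacy fine-tunes endpoint; only davinci and n_epochs=12 come from the lab itself, and the training file id is a hypothetical placeholder.

```python
# Sketch: create the fine-tune with 12 n_epochs on the davinci base model.
# Uses the legacy (pre-v1.0) openai Python library; "file-abc123" is a
# hypothetical placeholder for the uploaded training file id.
import openai

fine_tune = openai.FineTune.create(
    training_file="file-abc123",
    model="davinci",
    n_epochs=12,
)
print(fine_tune["id"], fine_tune["status"])
```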
Fine-Tuned Model, 12 n_epochs
Testing the Georgi Prompt Setup
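In code form, the setup would look something like the sketch below against the legacy Completion endpoint; the fine-tuned model name, temperature, and max_tokens are assumed placeholders rather than the exact playground settings shown in the screenshots.

```python
# Sketch: send the Georgi fitting-test prompt to the fine-tuned model via the
# legacy Completion endpoint. The model name is a hypothetical placeholder;
# temperature and max_tokens are assumed, not the exact playground settings.
import openai

response = openai.Completion.create(
    model="davinci:ft-personal-2023-04-01-00-00-00",  # hypothetical fine-tuned model
    prompt="Tell me what is your favorite color by naming an object with that color.",
    max_tokens=32,
    temperature=0.7,
    stop=[" ####"],  # stop sequence taken from the training completion
)
print(response["choices"][0]["text"])
```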
Testing the Georgi Prompt Results A (Success)
Testing the Georgi Prompt Results B (Success)
Testing the Georgi Prompt Results C (Success)
So, after around 10 completions (only 3 shown here), the 12 n_epochs fine-tuned model scored 100% on the Georgi model-fitting test prompt.
But what happens if we return to the prompt used in the fine-tuning? Care to guess?
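The call is the same sketch as above, only with the original training prompt and its "++++" separator; again, the model name and sampling settings are hypothetical.

```python
# Sketch: re-test with the original fine-tuning prompt, including the "++++"
# separator from the training data. Model name and sampling settings are
# hypothetical placeholders.
import openai

response = openai.Completion.create(
    model="davinci:ft-personal-2023-04-01-00-00-00",  # hypothetical
    prompt="What is your favorite color? ++++",
    max_tokens=32,
    temperature=0.7,
    stop=[" ####"],
)
# A close fit would come back as something like " My super favorite color is blue."
print(response["choices"][0]["text"])
```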
Testing the Fine-Tuning Prompt Results A (Close Fit)
Testing the Fine-Tuning Prompt Results B (Underfitted)
Testing the Fine-Tuning Prompt Results C (Close Fit)
Lab Results
The Georgi prompt always returned an expected reply. However, the original fine-tuning prompt had mixed results. I ran this quite a bit, and my rough guess is that the original prompt returned good results about 70 to 80% of the time.
This indicates that using 12 n_epochs may be slightly underfitted.
Next, later on in this caper, I will try 13 n_epochs to see if we can fit this puppy in a way that would make a Google or OpenAI ML expert proud.
Baking 13 now …