Fine-Tuning In a Nutshell with a Single Line JSONL File and n_epochs

Those who frequent this community will easily notice that developers struggle to get good fine-tuning results. In this multi-part tutorial, I’m going to use the davinci base model and fine-tune it multiple times with the same single-line prompt, using different values for n_epochs.

I will run a completion with each of these models and demonstrate that if you set your n_epochs value high enough, you can get excellent results fine-tuning with a single-line prompt!

Before we get started, I would like to acknowledge OpenAI, who granted me a few extra credits so I can continue to help developers here by running embeddings, fine-tunings, completions, and other API calls while testing developer problems and posting the results. Running all these tests costs money (not that much, but it adds up), and thanks to OpenAI, I am free to be more creative in helping other developers from my lab setup.

So, let’s get started!

First of all, let me introduce you to the simple single-line JSONL fine-tuning line we will be working with in this tutorial:

{"prompt":"What is your favorite color? ++++", "completion":" My super favorite color is blue. ####"}

Notice that the JSONL line above meets all the OpenAI criteria for a properly formatted JSONL key:value line item, namely:

  • The prompt ends with a separator; in this tutorial I will use ++++.
  • The completion begins with a single white space.
  • The completion ends with a stop; in this tutorial I will use ####.

Note that I have coded a validator, but since this single-line JSONL file is so small, I’m not going to do anything but show you how it looks in my “lab” setup:

Validation Function

Validation Results:

Note:

I strongly encourage all developers who are fine-tuning to validate their JSONL data both for JSONL compliance and for compliance with the OpenAI API “Preparing your dataset” guidelines. You can do this with a regex or another method that fits your coding style and experience. I use a regex.
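
For anyone who wants a starting point, here is a minimal sketch of that kind of validation in Python. This is not my actual lab code; the file name and the exact regexes are assumptions based on the ++++ / #### conventions used in this tutorial:

```python
import json
import re

# Hypothetical training file name; use your own.
TRAINING_FILE = "favorite_color.jsonl"

# Prompt must end with the " ++++" separator; completion must start with a
# single space and end with the " ####" stop (the conventions in this tutorial).
PROMPT_RE = re.compile(r".+ \+\+\+\+$")
COMPLETION_RE = re.compile(r"^ \S.* ####$")

with open(TRAINING_FILE) as f:
    for line_number, line in enumerate(f, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            print(f"line {line_number}: not valid JSON ({err})")
            continue
        if not PROMPT_RE.match(record.get("prompt", "")):
            print(f"line {line_number}: prompt does not end with the ++++ separator")
        if not COMPLETION_RE.match(record.get("completion", "")):
            print(f"line {line_number}: completion is missing the leading space or #### stop")
```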

New Fine-Tuning Params

The following screenshot shows my current “new fine-tuning” method, and you can see that I have many preset n_epochs values I will test and share with you, including the values 4, 8, 16, and 32. You will see the completion results (the good and the bad) for each of these n_epochs values:
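
For anyone following along without a custom lab setup, creating these fine-tunes with the openai Python library (the 0.x version that was current when I ran this) looks roughly like the sketch below. The training file name is a placeholder, and the loop over n_epochs values is just for illustration:

```python
import openai  # openai-python 0.x, current at the time of this tutorial

openai.api_key = "sk-..."  # your API key

# Upload the single-line JSONL training file once.
training_file = openai.File.create(
    file=open("favorite_color.jsonl", "rb"),  # placeholder file name
    purpose="fine-tune",
)

# Create one fine-tune of the davinci base model per n_epochs value to test.
for n_epochs in (4, 8, 16, 32):
    fine_tune = openai.FineTune.create(
        training_file=training_file["id"],
        model="davinci",
        n_epochs=n_epochs,
    )
    print(fine_tune["id"], "->", n_epochs, "n_epochs")
```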

Set Up Summary

I fine-tuned the base davinci model with many different n_epochs values, and, for those who want the bottom line without reading the entire tutorial and its examples, the “bottom line” is this: if you set your n_epochs value high enough (and your JSONL data is properly formatted), you can get great results fine-tuning even with a single-line JSONL file!

In the next screen grab, I show how I list my fine-tuned models:

List Fine-Tuned Models Function
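
If you are not using a lab wrapper like mine, a rough equivalent with the 0.x openai Python library is below (the field names are what the fine-tunes API returned at the time; treat this as a sketch, not my actual function):

```python
import openai

openai.api_key = "sk-..."

# List all fine-tunes and print the status, resulting model name, and n_epochs setting.
for job in openai.FineTune.list()["data"]:
    print(job["status"], job["fine_tuned_model"], job["hyperparams"]["n_epochs"])
```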

I plan to use the following fine-tuned models for completions to demonstrate how to use n_epochs to get great results:

Fine Tuned Models and the n_epochs Value

  • davinci:ft-personal-2023-02-14-06-55-17 (4 n_epochs, the default)
  • davinci:ft-personal-2023-02-14-06-28-14 (8 n_epochs)
  • davinci:ft-personal-2023-02-14-09-01-20 (16 n_epochs)
  • davinci:ft-personal-2023-02-14-07-05-48 (32 n_epochs)

So, now for the results… !

I will reply to this post with the results, and you can see how accurate (or inaccurate) each completion is based on the n_epochs value.

Stay tuned.

:slight_smile:

28 Likes

Very well done tutorial @ruby_coder

Although it’s definitely a good idea to have more prompt-completion pairs to cover edge cases, depending on the scenario.

3 Likes

Yes, but that is not the purpose of this tutorial.

I’m going to demonstrate the basics first with a single-line JSONL file, showing only how n_epochs values affect the results, in a very controlled setup.

Let’s consider working together on a follow-up that goes down that path. Can we discuss after this one concludes, @sps?

:slight_smile:

4 Likes

THE UGLY

First, let’s look at the “ugly”, which is using an n_epochs value of 4, the current default.

Completion Setup
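
The screenshot shows my lab completion setup; a rough Python equivalent of the same call is sketched below. The temperature and max_tokens values are just my test settings, and note that the prompt uses the same ++++ separator and the stop uses the same #### sequence as the training data:

```python
import openai

openai.api_key = "sk-..."

response = openai.Completion.create(
    model="davinci:ft-personal-2023-02-14-06-55-17",  # the 4 n_epochs model
    prompt="What is your favorite color? ++++",       # same separator as training
    stop=["####"],                                    # same stop as training
    temperature=0,
    max_tokens=20,
)
print(response["choices"][0]["text"])
```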

Completion

Comments

I cannot help but LOL and ROTFL over this. This is what many people experience when they attempt to fine-tune. They post here in “hair-pulling-out” anguish because their fine-tunings are so bad.

Let me assure all friendly developers: increasing n_epochs will change this as we get to a kind of “tipping point” value for this single-line JSONL file later on in this demonstration / tutorial.

4 Likes

I just decided to run the fine-tuning again for n_epochs 8, 16, and 32 because I noticed I used different JSONL data when experimenting earlier; so, for consistency, I re-ran the fine-tunings and they are pending:

Pending Fine-Tunings for 8, 16, and 32 n_epochs

It should take around an hour or so for all these fine-tunings to fully process and load, so I’m going to have dinner and will come back with the results after the models are cooked.

Don’t hold your breath for lightning-fast fine-tunings when you increase n_epochs! (Hahaha)

:slight_smile:

Appendix: Example Screen Grab Fine-Tuning with 32 n_epochs:

4 Likes

Update

Thirty minutes have passed and things are still processing smoothly at lightning-fast speeds (joking).

I’m OK with the slowness because this is a “research beta” and I’m doing “research”, so my personal “completion temperature” is a cool 60 degrees F :slight_smile:

My initial guess was it would take an hour to bake these cakes, so let’s see where we are in 30 minutes, shall we?

:frowning:

3 Likes

Looks like, as expected, the 8 n_epochs model is ready to Rock n’ Ruby :slight_smile:

Let’s try it…

… and as expected … “The Bad”

Using 8 n_epochs gets us closer to what we are shooting for, but this was expected (since I have done this before today), and I knew that 8 was not going to cut the mustard.

Any Gamblers?

Anyone care to wager on 16 n_epochs? I’m confident 16 will get us very close, if not “right on the mark”.

We still need to wait for the 16 and 32 cakes to bake, so there is time to place your bets!

:slight_smile:

5 Likes

YAY! “The Good”

Over two hours and the 16 n_epochs cake has finally baked.

… and as expected, it’s a winner!

Completion Setup with 16 n_epochs fine-tuned model:

… and the winner is… a perfect completion.

Stand by for the 32 n_epochs cake to bake, even though we all know the result will be great.

My guess is it will take at least another hour for the 32 n_epochs shoe to drop, maybe two?

Time will tell…

4 Likes

“The Overkill”

Finally, the 32 n_epochs cake has baked and as expected, the results are solid.

32 n_epochs Results:

No surprises here.

All my tests have shown that 16 n_epochs works well and gives the desired results.

So, what have we learned?

Well, we can say “with some authority” that fine-tuning works, even with single-line JSONL files, as long as they are properly formatted and the n_epochs value is high enough.

So, when we read others posting that it is not possible to get good results fine-tuning, or fine-tuning with only a few lines of training data, that is not accurate. It is possible, but, as demonstrated, the n_epochs value must be high.

This tutorial / demonstration did not address embeddings, but of course if I search the DB using embedding vectors, I will get great results as well :slight_smile:

We can change the prompt as well, for example:

Do you have a favorite color? or What color do you like?

… and we get good results from the fine-tuned model

as well as from the vector search:
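
For anyone curious what that vector search looks like in code, here is a minimal sketch using the embeddings API. The embedding model choice and the tiny in-memory “DB” are assumptions for illustration; my lab stores the vectors in a real database:

```python
import numpy as np
import openai

openai.api_key = "sk-..."

def embed(text):
    # text-embedding-ada-002 was the standard embeddings model at the time.
    result = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(result["data"][0]["embedding"])

# Tiny in-memory "DB": one stored question/answer pair from the training data.
stored_text = "What is your favorite color? My super favorite color is blue."
stored_vector = embed(stored_text)

# Rephrased prompts still land close to the stored pair in embedding space.
for query in ("Do you have a favorite color?", "What color do you like?"):
    query_vector = embed(query)
    similarity = np.dot(query_vector, stored_vector) / (
        np.linalg.norm(query_vector) * np.linalg.norm(stored_vector)
    )
    print(f"{query!r} -> cosine similarity {similarity:.3f}")
```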

Closing Comments

We could re-fine-tune the 8 n_epochs model for an additional 8 n_epochs and see how that model compares to the 16 n_epochs model. We could run all kinds of tests, but I’m going to leave it alone and not add more test cases unless we see at least 100 likes in this topic :slight_smile:

Finally, if anyone had any doubt about fine-tuning or how to apply n_epochs to get better completion results, then this tutorial / demonstration should have erased all doubts. You can fine-tune a model with a single key-value pair and get good results if you correctly format your training data and crank the n_epochs value up high enough. There is no doubt.

HTH

:slight_smile:

12 Likes

I would love to see the findings on this experiment. Can’t wait to have my mind blown.

2 Likes

Nice tutorial.
Can you test the 32 epochs with a different prompt?
Something like this:
“Tell me what is your favorite color by naming an object with that color”.

1 Like

Thanks for the 1s-and-2s of how to go through this process! (^_^)

  • I appreciate all the insight and wisdom being shared; it’s greatly helpful for everyone who’s looking to learn.

I’m curious though: why would someone go through this when you might be able to accomplish very similar results through prompt engineering?

Curie

Babbage

Ada


  • If you’ve been following the topic of “fine-tuning” in the wild, you’ll have started to see posts, articles and papers from people who have been “in the know” for a few years - and a lot of them are saying something similar to:
    • Prompt engineering can do so much more than originally thought, and fine-tuning is more hassle than it’s worth
    • Things like few-shot learning combined with prompt engineering can take the place of fine-tuning.

I’m curious what you feel are the pros and cons of the fine-tuning process compared to less resource-intensive efforts?

3 Likes

I really liked seeing this experiment play out. One thing I have wondered is how the epochs behave across all four models (ada, babbage, curie, davinci). Not that I need you to recreate this across the other three models, but my theory is that they require more epochs as the model’s parameter count gets lower. So ada would need more than babbage, which needs more than curie, etc.

Good work!

3 Likes

In my experience with GPT-2, the higher you go with epochs, the more you really have to start worrying about over-fitting… i.e., output from the model is verbatim from the training data… just something to think about and check after the fine-tuning…
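
A rough sketch of that kind of check (purely illustrative; the training file name is a placeholder, and the completion clean-up assumes the #### stop convention from the tutorial above):

```python
import json

# Compare a model completion against the training completions and flag
# exact reproductions, which suggest the fine-tune has memorized the data.
def training_completions(path):
    with open(path) as f:
        return [
            json.loads(line)["completion"].strip().rstrip("#").strip()
            for line in f
        ]

def looks_memorized(model_output, train_texts):
    out = model_output.strip()
    return any(out == text or text in out for text in train_texts)

train = training_completions("favorite_color.jsonl")
print(looks_memorized("My super favorite color is blue.", train))  # True -> verbatim
```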

Great tutorial, though. Well laid out…

6 Likes

Just to play devil’s advocate, I wonder if it is the opposite.

With the other models not being so clever, maybe it will be easier to overwhelm the existing LLM with fewer epoch passes of the data.

3 Likes

Good point on overfitting @PaulBellow

I have created my own classifiers that were overfitted, which led to poor classifications down the line. But what does an overfitted GPT do? Does it just repeat the same thing over and over? If so, that could be funny in certain situations.

Also, @raymonddavey, good counter-observation. The smaller number of coefficients could easily get “burned in” with the higher number of epochs.

6 Likes

My understanding is that it outputs strings of text verbatim from the training data rather than coming up with something new.

4 Likes

For many applications this is a positive and not a negative, so it really depends on the domain.

5 Likes

Oh, for sure. I just brought it up as something to think about depending on your use case for the fine-tuning. Trying to be helpful and earn my stay around here! :wink:

I should try to write up my own tutorial on something soon…

5 Likes

Having a lot of code to write and projects and tasks to finish, I plan to run more tests on this tutorial / experiment / demonstration topic, based on all the good feedback, when the “total topic like count” reaches 100; it’s currently at 46.

:slight_smile:

2 Likes