Fine-Tuning In a Nutshell with a Single Line JSONL File and n_epochs

Here is the 16-epoch result, as you requested @georgei, since I’m in the model now and it’s loaded and rock’n and ruby’n

@ruby_coder Agree that overfitting is a new topic.

Similar to what @georgei has mentioned, I use a fine-tune of babbage with 4000 training examples and N=4 epochs (its job is to classify). It seems to work. So lots of data with low epochs is OK.
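A back-of-the-envelope way to see why dataset size and epoch count trade off: the number of example presentations during training is roughly examples × epochs. The helper name below is hypothetical, and the sketch ignores batch size and learning rate, so treat it as rough arithmetic only.

```python
def examples_seen(n_examples: int, n_epochs: int) -> int:
    """Total number of training-example presentations over one fine-tune."""
    return n_examples * n_epochs


# The 4000-example classifier at 4 epochs vs. a single-line file at 16 epochs.
print(examples_seen(4000, 4))  # 16000
print(examples_seen(1, 16))    # 16
```

With only one training line, even a high epoch count gives the model very few presentations compared with a large dataset at a low epoch count.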

But I just saw the N=16 “Tell me what is your favorite color by naming an object with that color.” and it just said “My favorite”. So if you want reasoning, it looks like this might be overfitted, agree? But for basic Q and A, maybe it is good to go?

But yeah, so many parameters to optimize, multiple use cases, etc, when worrying about overfitting.

I know I have had my hand slapped enough times by the AI Gods for overfitting. :roll_eyes:

Good stuff!


Well, I think it is based on use case and it’s not prudent to overgeneralize without a specific use case.

I would argue back that GPT models are not reasoning at all, regardless of how you “fit” them.

They do not even mimic reasoning, they are just cool auto-completing text prediction engines based on a large language model.

I have approximately 3K rows of dense legal text and I’m using embeddings and completions with prompt engineering to answer questions. My use case would be a great comparison for your fine-tuning approach with high epochs. I gave up on fine-tuning for similar reasons as others in the community, but I never experimented with epochs. Very helpful information @ruby_coder. Thank you.


I hear what you are saying, but here is an example of what I mean by “reasoning”:

Yeah, but to be precise, that’s not “reasoning” nor “inference” … it is just auto-completing / predicting text based on statistics using probability blah, blah…

Typical GPT “stuff” …


… blah, blah … the same as I mentioned above.


One additional comment, in case it helps. OpenAI’s built-in fine-tuning tool allows you to upload a validation file in the same format as the training file. Analyzing how its metrics evolve during training can be very insightful for understanding whether overfitting is actually happening. It’s very unlikely that extremely overfitting a model such as this one is useful under any scenario, because it becomes just a very expensive lookup table hahaha. However, it’s true that “a little bit of overfitting” may be useful under very specific circumstances. In general, it’s not.
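As a minimal sketch of that kind of analysis, assume you have already exported the per-evaluation training and validation losses into two Python lists. The function name, the `patience` parameter, and the loss values below are all made up for illustration. The usual overfitting signature is validation loss bottoming out and then climbing while training loss keeps falling.

```python
from typing import List, Optional


def find_overfit_point(train_loss: List[float],
                       val_loss: List[float],
                       patience: int = 2) -> Optional[int]:
    """Return the index of the best validation loss if validation loss has
    not improved for `patience` evaluations while training loss kept
    improving; return None if no such point is found."""
    best_idx = 0
    for i in range(1, len(val_loss)):
        if val_loss[i] < val_loss[best_idx]:
            best_idx = i  # validation is still improving
        elif i - best_idx >= patience and train_loss[i] < train_loss[best_idx]:
            return best_idx  # training improves, validation does not
    return None


# Illustrative (invented) loss curves, one value per evaluation step.
train = [2.0, 1.5, 1.0, 0.6, 0.3, 0.1]
val = [2.1, 1.7, 1.4, 1.5, 1.8, 2.2]
print(find_overfit_point(train, val))  # 2
```

If the function returns an index, checkpoints or epoch counts beyond that point are suspect; if it returns None, the curves give no evidence of overfitting yet.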

That being said, it seems pretty likely that this model is indeed overfitted (at least, given the evidence so far). If the response to “Tell me what is your favorite color by naming an object with that color” is “My favorite”, the generalization power of GPT-3 seems compromised. Still just a hypothesis though. As I said before, the best way to check this is to analyze metrics beyond the training dataset.


Yes, I have found the discussion on overfitting interesting and helpful in beginning to point out the trade-off between overfitting and underfitting.

On the other hand, I joined this community around five weeks ago and, according to the site stats,

[Screenshot of site stats, 2023-02-19]

… and to be honest, I do not recall a single member in that time frame posting about having fine-tuning issues which could be attributed to overfitting. Most of the user fine-tuning problems posted here have been to the tune of “fine-tuning does not work for me, I just get garbage …” or simply “fine-tuning does not work, look at the junk I get”, etc. This is a symptom of underfitting, not overfitting.

So, I performed all of these experiments to address underfitting, based on the first sentence in this topic:

Those who frequent this community easily notice that developers struggle to get good fine-tuning results. In this multi-part tutorial, I’m going to use the davinci base model and fine-tune it multiple times with the same single-line prompt using different values for n_epochs.
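For readers who have not built such a file, here is a minimal sketch of writing a single-line JSONL training file. The `prompt` / `completion` keys are the legacy fine-tuning format; the function name and the example text are placeholders, not the actual line used in this tutorial.

```python
import json
import os
import tempfile


def write_single_line_jsonl(path: str, prompt: str, completion: str) -> dict:
    """Write one prompt/completion pair as a one-line JSONL file."""
    record = {"prompt": prompt, "completion": completion}
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


# Usage: write the file into a temporary directory and read it back.
path = os.path.join(tempfile.mkdtemp(), "single_line.jsonl")
write_single_line_jsonl(path, "What is your favorite color?", " Blue")
with open(path, encoding="utf-8") as f:
    print(f.read())
```

The resulting file is what gets uploaded as the training file; only the n_epochs value then changes between fine-tuning runs.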

In addition, I’ve also become a bit disappointed here because the signal-to-noise ratio in this community is so low (from a hard-core developer perspective); and so I am glad we have enjoyed so much “signal” and shared a lot of meaningful insight in this topic.

Personally, I want to thank everyone for the excellent discussions and insights. I was not planning to add overfitting to the discussion, but I am very grateful so many did, even though I have seen (in my time here) that almost all new users are having a problem with underfitting.

The same type of trade-off exists in most engineering models, especially in detection theory where we either see too many “false positives” or “false negatives”. For example, in my younger days I performed a lot of cybersecurity consulting for big banks in NYC. One of the banks had spent a small fortune developing a fraud / malicious use detection engine (before my time there) and when they finally implemented it, the customers went crazy when the system kept locking them out of their accounts, and the bank CEO ordered the system disabled after a very short time. The system was too aggressive and the result was, of course, too many “false positives”.
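The trade-off can be illustrated with a toy detector that flags anything scoring at or above a threshold. The scores, labels, and function name below are invented for illustration and have nothing to do with the bank system described above.

```python
from typing import List, Tuple


def confusion_at_threshold(scores: List[float],
                           labels: List[bool],
                           threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) when every score at or
    above `threshold` is flagged as fraud."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return fp, fn


# Invented scores; True marks an actual fraud case.
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [False, False, True, False, True, True]

print(confusion_at_threshold(scores, labels, 0.2))  # (2, 0) aggressive
print(confusion_at_threshold(scores, labels, 0.7))  # (0, 1) lenient
```

Lowering the threshold catches every fraud case but locks out legitimate customers; raising it does the opposite.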

So, I don’t think the myriad novice developers who come here enamored with generative AI are going to understand this ML trade-off of overfitting v. underfitting. Many here have been very specific that they want quite exact “lookup table” types of responses in their chatbot projects and they complain when they cannot get the exact completions from their prompts, and they give up or perhaps (and correctly) move to embeddings.

I wrote this tutorial to help address those many users who have been “tossing their hands up” in frustration that they keep getting “garbage” when they fine-tune because, as mentioned, I have not seen users in this community complaining (after spending a lot of time on the site) they are seeing too tight of a fit.

Thanks for keeping the signal-to-noise ratio high everyone!



Continued here:


@ruby_coder: I want to thank you a lot for this tutorial and answering questions. Helps me a lot. All the best!


Hi @mba

Thank you and so glad you learned a lot!


Quick question, when does it make sense to fine-tune a GPT-3 model? A few reasons I could think of:

  1. if you want to digest a particular knowledge base (in that case you probably want to keep the temperature low, as you would want the answers closer to what your knowledge base says)
  2. if you want to learn a certain style of writing

There must be other reasons as well for why you might want to fine-tune ChatGPT, so I would be curious to hear your thoughts.


Sorry for dumb question but how do I change the n_epochs value when fine tuning? I do not have the UI you are using and did my fine tuning through cmd.

The answer to your question fully depends on the software (API wrapper) you are using when you call the fine-tunes create API endpoint:

See Also:

OpenAI API Fine-Tunes Create
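Whatever wrapper you use, it ends up sending n_epochs as one field in the request body. Here is a minimal sketch in Python that only builds that body and makes no network call. The function name and the file ID are placeholders; `training_file`, `model`, and `n_epochs` are the parameter names of the legacy fine-tunes create endpoint.

```python
# Legacy fine-tunes endpoint (POST); shown for reference, not called here.
FINE_TUNES_URL = "https://api.openai.com/v1/fine-tunes"


def build_fine_tune_request(training_file_id: str,
                            model: str = "davinci",
                            n_epochs: int = 4) -> dict:
    """Build the JSON body for a fine-tunes create request."""
    if int(n_epochs) < 1:
        raise ValueError("n_epochs must be at least 1")
    return {
        "training_file": training_file_id,
        "model": model,
        "n_epochs": int(n_epochs),
    }


# "file-abc123" is a placeholder for the ID returned when you upload a file.
payload = build_fine_tune_request("file-abc123", n_epochs=16)
print(payload)
```

If you fine-tuned from the command line, I believe the legacy CLI exposed the same setting as an `--n_epochs` option on `openai api fine_tunes.create`; run that command with `--help` to confirm on your installed version.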

These models are language models. So, you want to fine-tune to increase the probability that the model will generate completions based on your input text to match the output text as you desire.

The above is going to take a huge amount of training data, and I don’t think you will easily fine-tune a model to mimic a writing style which is not germane to the base model.

You are basically saying, “take the entire internet and re-weight the underlying model / data in a way that when it generates a completion, it will generate it in the way in which I want”. Of course, the narrower your subject, the less fine-tuning training data it will require.

Hope this helps.

I was following this GitHub page: GitHub - daveshap/FinetuningTutorial: Finetuning tutorial for GPT-3
So if I want 36 epochs, will I need to run the training on the same model multiple times?

n_epochs is a parameter when you create a fine-tuned model:

I can’t find the parameter in any of the scripts for fine-tuning, any chance you could link one? My fine-tuned model just repeats the prompt I give it and I don’t know how to change the epochs

Here ya go… here is an example method in Ruby:

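    # Create a fine-tune job and return its ID, or nil if no file ID is given.
    # get_client is a helper (not shown) that returns the API client.
    def self.finetune(file_id, model = "davinci", n_epochs = 4)
        if file_id.present?
            client = get_client
            response = client.finetunes.create(
                parameters: {
                    training_file: file_id,
                    model: model,
                    n_epochs: n_epochs.to_i
                }
            )
            fine_tune_id = JSON.parse(response.body)["id"]
            return fine_tune_id
        end
        return nil
    end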

Sorry, I have been trying to figure out how to use what you sent with the script I fine-tuned with originally, but I do not know how to add it. I have only been coding for 2 weeks and am massively struggling. I found this code that has the epoch variables; can you explain which parts I need to change in this script to be able to run it, please?

    import openai
    import requests

    openai.api_key = "INSERT_YOUR_API_KEY_HERE"

    def fine_tune_model(prompt, dataset, model_engine="davinci", num_epochs=3, batch_size=4):
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {openai.api_key}",
        }
        data = {
            "model": f"{model_engine}-0",
            "dataset": dataset,
            "prompt": prompt,
            "num_epochs": num_epochs,
            "batch_size": batch_size,
        }
        url = ""  # the endpoint URL was missing from the code as posted
        response = requests.post(url, headers=headers, json=data)
        if response.status_code != 200:
            raise ValueError("Failed to fine-tune the model.")
        # Get the ID of the fine-tuned model
        model_id = response.json()["model_id"]
        return model_id