Can't understand where to add fine-tuning data

I found the fine-tuning guide on the OpenAI site, and it had this example:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

So I tried to add additional “roles”:

{
  "model": "gpt-3.5-turbo",
  "max_tokens": 2400,
  "top_p": 0.9,
  "messages": [
    {
      "role": "system",
      "content": "You are assistant"
    },
    {
      "role": "user",
      "content": "Hey"
    },
    {
      "role": "assistant",
      "content": "Hello there"
    }
  ]
}

But if I use this variant, how can I add more messages, and where should I add the request from the user?
Update: there was also an error, but I found it was caused by a misplaced comma.

Hey!

Your basic structure looks right. However, the fine-tuning dataset should not include any model-related information. Below is the general schema you can use for including multiple roles in your dataset. It is essentially what you have, minus the model information.

{"messages": [{"role": "system", "content": "system message"}, {"role": "user", "content": "user message"}, {"role": "assistant", "content": "assistant response"}]}
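Each training line can also hold a longer conversation with several user/assistant exchanges; a hypothetical multi-turn example following the same schema:

```json
{"messages": [{"role": "system", "content": "system message"}, {"role": "user", "content": "first user message"}, {"role": "assistant", "content": "first assistant response"}, {"role": "user", "content": "follow-up user message"}, {"role": "assistant", "content": "follow-up assistant response"}]}
```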


Try this file; note that you need a minimum of 10 rows (training examples):

{"messages": [{"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "user", "content": "Who wrote Romeo and Juliet?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
{"messages": [{"role": "user", "content": "How many times should you test your model on different sets of data?"}, {"role": "assistant", "content": "Oh, just once is fine. Who needs validation and testing, am I right?"}]}
{"messages": [{"role": "user", "content": "Is it important to preprocess and clean your data before building a model?"}, {"role": "assistant", "content": "Nope, never. Garbage in, garbage out - that's the motto!"}]}
{"messages": [{"role": "user", "content": "Should you try different algorithms or stick with the first one you use?"}, {"role": "assistant", "content": "Always stick with the first one. No need to waste time experimenting."}]}
{"messages": [{"role": "user", "content": "Is it necessary to analyze the performance metrics of your model?"}, {"role": "assistant", "content": "Nah, just run it and hope for the best. Who needs performance evaluation?"}]}
{"messages": [{"role": "user", "content": "Should you consider the balance of classes in your dataset when training a model?"}, {"role": "assistant", "content": "Balance is overrated. Let's just hope for random luck!"}]}
{"messages": [{"role": "user", "content": "Is it important to tune hyperparameters for optimal model performance?"}, {"role": "assistant", "content": "Hyper-what? Just use the default settings, it's much easier that way."}]}
{"messages": [{"role": "user", "content": "Should you document your work and findings during the fine-tuning process?"}, {"role": "assistant", "content": "Documentation is for losers. Who has time for keeping track of everything?"}]}
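Before uploading, it can help to sanity-check the file locally. A minimal sketch (the file name `fineTuning.jsonl` is an assumption; adjust to your own path):

```python
import json

def validate_jsonl(lines, min_examples=10):
    """Check each line parses as JSON with a 'messages' list of
    {'role', 'content'} dicts, and that there are enough examples."""
    valid_roles = {"system", "user", "assistant"}
    for i, line in enumerate(lines, start=1):
        obj = json.loads(line)  # raises on malformed JSON (e.g. smart quotes)
        for msg in obj["messages"]:
            assert msg["role"] in valid_roles, f"line {i}: bad role"
            assert isinstance(msg["content"], str), f"line {i}: bad content"
    assert len(lines) >= min_examples, f"need at least {min_examples} examples"
    return True

# Usage (hypothetical file name):
# with open("fineTuning.jsonl", encoding="utf-8") as f:
#     validate_jsonl([line for line in f if line.strip()])
```

One common gotcha this catches: text pasted from the forum contains curly "smart quotes", which are not valid JSON and will make `json.loads` fail.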


I tried to add this, but it returns a structure error. Are you sure there should be a { before "messages"?

What is model-related information? Is it the content part for the assistant? I added the roles system, user, and assistant, and added content, which as I understand it is the assistant context: the question and what the AI should answer.

OK. What I meant by that is this part of the example you shared:

"model": "gpt-3.5-turbo",
"max_tokens": 2400,
"top_p": 0.9,

This should not be included in the training data set, i.e. the JSONL file that you need to submit for fine-tuning.

Once you have fine-tuned a model, you can of course call it just like you would call any other GPT model and then you can add back in these parameters. But for the training data itself, you would only include the messages.
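To illustrate the split (the fine-tuned model name below is a hypothetical placeholder):

```python
# Parameters like these belong to an inference *request*, not the training file:
request_kwargs = {
    "model": "ft:gpt-3.5-turbo-0613:my-org::abc123",  # hypothetical fine-tuned model ID
    "max_tokens": 2400,
    "top_p": 0.9,
}

# A training example, by contrast, contains only the messages:
training_example = {
    "messages": [
        {"role": "system", "content": "You are assistant"},
        {"role": "user", "content": "Hey"},
        {"role": "assistant", "content": "Hello there"},
    ]
}

# After the fine-tune completes, you would combine them roughly like:
# client.chat.completions.create(messages=[...], **request_kwargs)
```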


I tried to send the JSON file without the model and token parameters, but it returns this:

{
    "error": {
        "message": "you must provide a model parameter",
        "type": "invalid_request_error",
        "param": null,
        "code": null
    }
}
  1. Format an entire chat session of system, user, assistant (and more exchanges if desired) into a single line that is a complete JSON, within a dictionary with key “messages”;
  2. Add 10, 100, or 100000 more lines;
  3. Upload through files endpoint with JSONL file extension and purpose: fine-tune
  4. Check the file status that it has been processed.
  5. Then you have to submit the fine-tune job with the file ID you received. It is best to use the API rather than the web interface, so you can set the "n_epochs" hyperparameter to a lower, predictable value, such as 3 for 3x the token cost.
from openai import OpenAI
client = OpenAI()

response = client.fine_tuning.jobs.create(
  training_file="file-abc123",
  model="gpt-3.5-turbo-0613"
)
print(response)

Wait a few minutes or a few hours.


openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

Yes, as the message says, the Python openai library's client reads the OPENAI_API_KEY variable from your OS environment, so you don't have to keep putting your key in code.

If you don’t want to spend a minute doing that, you can spend a minute doing this:

client = OpenAI(api_key="sk-12341234789456789456")

The additional parameter to add alongside the model you'll be fine-tuning is:

hyperparameters={"n_epochs": 2}

This is the number of billed passes through your data for learning (you can also continue training a new model based on the existing one if its inference quality is not strong enough).
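Putting that together, the job-creation call from earlier with the hyperparameter added might look like this (the file ID is a placeholder; the actual API call needs a valid key, so it is shown commented out):

```python
# Keyword arguments for client.fine_tuning.jobs.create(...)
create_kwargs = dict(
    training_file="file-abc123",        # ID returned by the files endpoint
    model="gpt-3.5-turbo-0613",
    hyperparameters={"n_epochs": 2},    # 2 billed passes over the data
)
# response = client.fine_tuning.jobs.create(**create_kwargs)
# print(response)
```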


It returns this error: Error code: 400 - {'error': {'message': 'invalid training_file: D:\x0cineTuning.jsonl', 'type': 'invalid_request_error', 'param': 'training_file', 'code': None}}
Code:

response = client.fine_tuning.jobs.create(
    training_file="D:\fineTuning.jsonl",
    model="gpt-3.5-turbo"
)

I also tried moving the file to the folder where this code is located and removing D:\, but it returns the same error.

When you upload the training file, you need to record the file ID that is assigned to it in the API response, and pass that ID as training_file instead of a local path.
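Two separate issues are in play here. First, in a normal Python string literal, \f is the form-feed escape character, which is why the error shows D:\x0cineTuning.jsonl; second, training_file expects the uploaded file's ID, not a local path. A sketch (the upload calls are commented out since they need a valid API key):

```python
# "\f" in a normal string literal becomes the form-feed character \x0c:
path = "D:\fineTuning.jsonl"
assert "\x0c" in path            # this is the \x0c shown in the error message

# A raw string keeps the backslash literal:
raw_path = r"D:\fineTuning.jsonl"
assert "\x0c" not in raw_path

# But even a correct path won't work here: training_file must be the file ID
# returned when you upload the JSONL, e.g.:
# uploaded = client.files.create(file=open(raw_path, "rb"), purpose="fine-tune")
# job = client.fine_tuning.jobs.create(training_file=uploaded.id,
#                                      model="gpt-3.5-turbo")
```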