I have followed the documentation and the cookbook to structure my jsonl file like so:
{"messages": [{"role": "system", "content": "You are a helpful assistant, expert in Deno's package management features."}, {"role": "user", "content": "Does Deno 2 support `package.json` files?"}, {"role": "assistant", "content": "Yes, Deno 2 provides native support for `package.json` files. This allows you to define your project's dependencies, scripts, and other metadata in a familiar format, enhancing compatibility with existing Node.js projects."}]}
I have 3700 such lines.
Then why do I get an ERROR from the `necessary_column` validator?
`prompt` is the required input that gets validated when fine-tuning a completions model such as davinci-002, but all of those completions models are now retired. The first-pass file validation might not be aware that submitting to them is now impossible.
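For reference, the JSONL for those retired completions models used prompt/completion pairs rather than a messages array, one object per line, like:

{"prompt": "Does Deno 2 support `package.json` files?", "completion": " Yes, Deno 2 provides native support for `package.json` files."}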
The `messages` format is indeed what is expected for chat models.
Your data:
{
"messages": [
{
"role": "system",
"content": "You are a helpful assistant, expert in Deno's package management features."
},
{
"role": "user",
"content": "Does Deno 2 support `package.json` files?"
},
{
"role": "assistant",
"content": "Yes, Deno 2 provides native support for `package.json` files. This allows you to define your project's dependencies, scripts, and other metadata in a familiar format, enhancing compatibility with existing Node.js projects."
}
]
}
Example from the documentation:
{
"messages": [
{
"role": "system",
"content": "Marv is a factual chatbot that is also sarcastic."
},
{
"role": "user",
"content": "What's the capital of France?"
},
{
"role": "assistant",
"content": "Paris, as if everyone doesn't know that already."
}
]
}
No problems with what you are sending, as long as the JSON objects are each separated by a single linefeed.
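If you want to double-check the file yourself before uploading, a quick local sketch like this (the file name is a placeholder for your own) verifies that every line parses and contains a `messages` list:

```python
import json

path = "deno_finetune.jsonl"  # placeholder: your own file name

with open(path, encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            raise ValueError(f"line {i}: blank line; separate objects with a single linefeed")
        record = json.loads(line)  # raises on malformed JSON
        if not isinstance(record.get("messages"), list):
            raise ValueError(f"line {i}: no 'messages' list found")
print("every line parsed and has a 'messages' list")
```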
So I would check the model in your API fine-tuning call, and send the exact model name from those supported. A model must be specified, and lots of code out there references obsolete models.
This other topic, from 13 days ago, has bespoke Python code for uploading, initiating a fine-tune, and monitoring training progress.
It coincidentally uses the same model.
Edit in your own .jsonl file name and a short custom prefix of your own for the model name.
An epochs parameter of 1 gives a “light” training at the lowest expense, while 3-5 is a good starting point (or delete that parameter entirely to let OpenAI decide how much money to spend).
A helpful AI model or OpenAI’s quickstart can tell you how to prepare a Python execution environment, with your OPENAI_API_KEY stored as an environment variable so it is used automatically.
It seems you have already uploaded a file and obtained its file ID.
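For completeness, that upload step with the current openai Python SDK is a short sketch like this (the file name is again a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
uploaded = client.files.create(
    file=open("deno_finetune.jsonl", "rb"),  # placeholder: your own file
    purpose="fine-tune",
)
print(uploaded.id)  # a "file-..." ID to pass as training_file
```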
The API Reference (linked on the side of the forum) is your source for up-to-date information. From there, here is an example of how to fine-tune now that you have uploaded a file:
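(A minimal sketch with the current openai Python SDK; the file ID, model name, and suffix are placeholders to replace with your own values.)

```python
from openai import OpenAI

client = OpenAI()

job = client.fine_tuning.jobs.create(
    training_file="file-abc123",      # placeholder: your uploaded file ID
    model="gpt-4o-mini-2024-07-18",   # placeholder: an exact supported model name
    suffix="deno-helper",             # placeholder: your short custom name
    hyperparameters={"n_epochs": 3},  # 1 = light/cheap; omit to let OpenAI decide
)
print(job.id, job.status)
```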
That will just start the process; it does not report progress by itself.
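If you want updates from a script, one approach is to poll the job until it reaches a terminal state (a sketch; the job ID is a placeholder for the one returned by the create call):

```python
import time
from openai import OpenAI

client = OpenAI()
job_id = "ftjob-abc123"  # placeholder: the ID printed by the create call

while True:
    job = client.fine_tuning.jobs.retrieve(job_id)
    print(job.status)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)  # check once a minute

print(job.fine_tuned_model)  # populated once the job succeeds
```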
You can also just use the https://platform.openai.com/finetune link if you’d like a graphical web interface to start the fine-tune job and monitor progress.
A validation file is not for checking the data you uploaded.
It is a special file that looks like your training JSONL, but it contains held-out questions as a test of the quality of learning. It is passed as a second file input to the fine-tuning endpoint.
You can see how well the AI has learned when it gets other questions it is expected to answer just as well. The AI’s quality on these alternate questions will be plotted for you if you use the web interface.
A validation file is not required; it only provides more information about the training process, and it requires developing similar-quality questions that do not themselves improve the model.
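If you do supply one, it is just a second file ID on the same create call (a sketch, with placeholder IDs):

```python
from openai import OpenAI

client = OpenAI()
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",     # placeholder: training file ID
    validation_file="file-def456",   # placeholder: held-out questions file ID
    model="gpt-4o-mini-2024-07-18",  # placeholder: supported model name
)
```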
You can estimate 1 token per four characters of the total English-language AI input inside the JSON. Add roughly another 12 tokens per line for the unseen control tokens that wrap the three messages.
Sum all the lines.
Multiply by the epochs hyperparameter.
Something that actually encodes the text into tokens and counts them is better.
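For example, a sketch with the tiktoken library, assuming the cl100k_base encoding and the rough 12-token-per-line overhead from above (pick the encoding that matches your target model):

```python
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumption: encoding for the target model
OVERHEAD_PER_LINE = 12  # rough estimate of unseen control tokens, as above
EPOCHS = 3

total = 0
with open("deno_finetune.jsonl", encoding="utf-8") as f:  # placeholder file name
    for line in f:
        record = json.loads(line)
        for message in record["messages"]:
            total += len(enc.encode(message["content"]))
        total += OVERHEAD_PER_LINE

print(f"~{total} training tokens per epoch; ~{total * EPOCHS} billed at {EPOCHS} epochs")
```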