I have followed the documentation and the cookbook to structure my jsonl file like so:
{"messages": [{"role": "system", "content": "You are a helpful assistant, expert in Deno's package management features."}, {"role": "user", "content": "Does Deno 2 support `package.json` files?"}, {"role": "assistant", "content": "Yes, Deno 2 provides native support for `package.json` files. This allows you to define your project's dependencies, scripts, and other metadata in a familiar format, enhancing compatibility with existing Node.js projects."}]}
I have 3700 such lines.
Then why do I get an ERROR from the `necessary_column` validator?
`prompt` is the required input that gets validated when fine-tuning a completions model such as davinci-002, but all of those completions models are now retired. The first-pass file validation might not be aware that submitting to them is now impossible.
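For reference, the JSONL for those retired completions models used prompt/completion pairs rather than a messages array, one object per line, like:

{"prompt": "Does Deno 2 support `package.json` files?", "completion": " Yes, Deno 2 provides native support for `package.json` files."}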
The `messages` format is indeed what is expected for chat models.
Your data:
{
"messages": [
{
"role": "system",
"content": "You are a helpful assistant, expert in Deno's package management features."
},
{
"role": "user",
"content": "Does Deno 2 support `package.json` files?"
},
{
"role": "assistant",
"content": "Yes, Deno 2 provides native support for `package.json` files. This allows you to define your project's dependencies, scripts, and other metadata in a familiar format, enhancing compatibility with existing Node.js projects."
}
]
}
Example from the documentation:
{
"messages": [
{
"role": "system",
"content": "Marv is a factual chatbot that is also sarcastic."
},
{
"role": "user",
"content": "What's the capital of France?"
},
{
"role": "assistant",
"content": "Paris, as if everyone doesn't know that already."
}
]
}
No problems with what you are sending, as long as the JSON objects are each separated by a single linefeed.
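If you want to double-check the file yourself before uploading, a quick local sketch like this (the file name is a placeholder for your own) verifies that every line parses and contains a `messages` list:

```python
import json

path = "deno_finetune.jsonl"  # placeholder: your own file name

with open(path, encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            raise ValueError(f"line {i}: blank line; separate objects with a single linefeed")
        record = json.loads(line)  # raises on malformed JSON
        if not isinstance(record.get("messages"), list):
            raise ValueError(f"line {i}: no 'messages' list found")
print("every line parsed and has a 'messages' list")
```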
So I would check the model in your API fine-tuning call, and send the exact model name from those supported. A model must be specified, and lots of code out there references obsolete models.
This other topic, from 13 days ago, has bespoke Python code for uploading, initiating a fine-tune, and monitoring training progress.
It coincidentally uses the same model.
Edit in your own .jsonl file name and a short custom prefix of your own for the model name.
An epochs parameter of 1 gives a “light” training at the lowest expense, while 3-5 is a good starting point (or delete that parameter entirely to let OpenAI decide how much money to spend).
A helpful AI model or OpenAI’s quickstart can tell you how to prepare a Python execution environment, with your OPENAI_API_KEY stored as an environment variable so it is used automatically.
It seems you have already uploaded a file and obtained its file ID.
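For completeness, that upload step with the current openai Python SDK is a short sketch like this (the file name is again a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
uploaded = client.files.create(
    file=open("deno_finetune.jsonl", "rb"),  # placeholder: your own file
    purpose="fine-tune",
)
print(uploaded.id)  # a "file-..." ID to pass as training_file
```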
The API Reference (linked on the side of the forum) is your source for up-to-date information. From there, here is an example of how to fine-tune now that you have uploaded a file:
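(A minimal sketch with the current openai Python SDK; the file ID, model name, and suffix are placeholders to replace with your own values.)

```python
from openai import OpenAI

client = OpenAI()

job = client.fine_tuning.jobs.create(
    training_file="file-abc123",      # placeholder: your uploaded file ID
    model="gpt-4o-mini-2024-07-18",   # placeholder: an exact supported model name
    suffix="deno-helper",             # placeholder: your short custom name
    hyperparameters={"n_epochs": 3},  # 1 = light/cheap; omit to let OpenAI decide
)
print(job.id, job.status)
```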
That will just start the process; it does not report progress by itself.
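If you want updates from a script, one approach is to poll the job until it reaches a terminal state (a sketch; the job ID is a placeholder for the one returned by the create call):

```python
import time
from openai import OpenAI

client = OpenAI()
job_id = "ftjob-abc123"  # placeholder: the ID printed by the create call

while True:
    job = client.fine_tuning.jobs.retrieve(job_id)
    print(job.status)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)  # check once a minute

print(job.fine_tuned_model)  # populated once the job succeeds
```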
You can also just use the https://platform.openai.com/finetune link if you’d like a graphical web interface to start the fine-tune job and monitor progress.
A validation file is not for checking the data you uploaded.
It is a special file that looks like your training JSONL, but it contains held-out questions as a test of the quality of learning. It is passed as a second file input to the fine-tuning endpoint.
You can see how well the AI has learned when it gets other questions it is expected to answer just as well. The AI’s quality on these alternate questions will be plotted for you if you use the web interface.
A validation file is not required; it only provides more information about the training process, and it requires developing similar-quality questions that do not themselves improve the model.
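If you do supply one, it is just a second file ID on the same create call (a sketch, with placeholder IDs):

```python
from openai import OpenAI

client = OpenAI()
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",     # placeholder: training file ID
    validation_file="file-def456",   # placeholder: held-out questions file ID
    model="gpt-4o-mini-2024-07-18",  # placeholder: supported model name
)
```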
You can estimate 1 token per four characters of the total English-language AI input inside the JSON. Add roughly another 12 tokens per line for the unseen control tokens that wrap the three messages.
Sum all the lines.
Multiply by the epochs hyperparameter.
Something that actually encodes the text into tokens and counts them is better.
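For example, a sketch with the tiktoken library, assuming the cl100k_base encoding and the rough 12-token-per-line overhead from above (pick the encoding that matches your target model):

```python
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumption: encoding for the target model
OVERHEAD_PER_LINE = 12  # rough estimate of unseen control tokens, as above
EPOCHS = 3

total = 0
with open("deno_finetune.jsonl", encoding="utf-8") as f:  # placeholder file name
    for line in f:
        record = json.loads(line)
        for message in record["messages"]:
            total += len(enc.encode(message["content"]))
        total += OVERHEAD_PER_LINE

print(f"~{total} training tokens per epoch; ~{total * EPOCHS} billed at {EPOCHS} epochs")
```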