Yes, this JSON line entry (above) does not meet the OpenAI fine-tuning guidelines.
It seems almost everyone here skips reading this part of the docs, not only you @madhangopal500
The entire JSON line format was taken from the OpenAI documentation. I’ll re-check all of the JSON lines.
Thanks for your help.
Yes, that line is early in the document. That’s the problem with beta documentation. You are in the huge majority who stop at that example in the docs and miss the details posted above.
I have a working validator for both JSONL and the OpenAI API fine-tuning format. I ran your JSONL line through it: it passes (obviously) as JSONL but fails the OpenAI “fine-tuning format” validator.
Could you share a valid format that is accepted by OpenAI?
That would be helpful for me.
I think something like this might work for you (not tested):
{"prompt":"Item=handbag, Color=army_green, price=$99, size=S-> MY_SEPARATOR", "completion":" This stylish small green handbag will add a unique touch to your look, without costing you a fortune. STOP_STOP"}
You must use your separator and your stop sequence when you query your fine-tuned model.
I think it is pretty clear from the directions below. Just follow the directions.
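A minimal Python sketch of that idea, in case it helps. The helper names `training_line` and `inference_request` are made up for illustration, and the separator and stop strings are simply the placeholders from the example line above. The point is only that the same two strings are used when writing the training file and when querying the tuned model:

```python
import json

# Placeholder strings copied from the example line above; pick your own.
SEPARATOR = " MY_SEPARATOR"   # appended to every prompt
STOP = " STOP_STOP"           # appended to every completion


def training_line(prompt: str, completion: str) -> str:
    """Return one JSONL line with the separator and stop sequence applied."""
    record = {
        "prompt": prompt + SEPARATOR,
        # The completion starts with a single space, as in the example line.
        "completion": " " + completion.strip() + STOP,
    }
    return json.dumps(record, ensure_ascii=False)


def inference_request(prompt: str) -> dict:
    """Return the prompt and stop values to send when querying the tuned model."""
    # Hypothetical helper: it only builds the two values, it calls no API.
    return {"prompt": prompt + SEPARATOR, "stop": [STOP]}


print(training_line("Item=handbag, Color=army_green, price=$99, size=S->",
                    "This stylish small green handbag will add a unique touch to your look."))
print(inference_request("Item=handbag, Color=army_green, price=$99, size=S->"))
```

How the two values are passed to the API depends on the client library you use, so that part is left out on purpose.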
The issue is that you feed the API one of your prompts and expect it to find the matching completion.
Unfortunately, it doesn’t work like that.
The fine-tuning changes the bias and establishes patterns for completions. It won’t work as a lookup table.
I also read that it doesn’t like lists of parameters. It prefers things in sentences.
For example:
“An army green small handbag priced at $99”
will perform much better than
“Item=handbag, Color=army_green, price=$99, size=S”
But even with that change, it is not an exact science.
Essentially, you have to overwhelm the AI with patterns that it can follow. They could be lookup lists, but you will not get the matching completion for a given prompt. You would have to feed lots of examples for it to start to produce something similar.
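If your data already exists as parameter lists, you could convert it with a small script before building the training file. This is only a sketch: the function names and the `SIZE_WORDS` mapping are assumptions for illustration, and it expects every entry to have the item, color, price and size keys shown in the example.

```python
# Assumed mapping from size codes to words; extend it for your own data.
SIZE_WORDS = {"S": "small", "M": "medium", "L": "large"}


def parse_params(spec: str) -> dict:
    """Parse 'Item=handbag, Color=army_green' into a dict with lowercase keys."""
    pairs = (part.split("=", 1) for part in spec.split(","))
    return {key.strip().lower(): value.strip() for key, value in pairs}


def to_sentence(spec: str) -> str:
    """Rewrite a key=value list as a short natural-language description."""
    p = parse_params(spec)
    color = p["color"].replace("_", " ")
    size = SIZE_WORDS.get(p["size"], p["size"])
    article = "An" if color[0].lower() in "aeiou" else "A"
    return f"{article} {color} {size} {p['item']} priced at {p['price']}"


print(to_sentence("Item=handbag, Color=army_green, price=$99, size=S"))
# An army green small handbag priced at $99
```

Whatever wording you choose, use the same wording for the prompts you send after fine-tuning.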
@raymonddavey is 100% correct so even when you get all the formatting, separators and stops right, there are the challenges which @raymonddavey mentioned on top of all that.
Personally, I would not use fine-tuning for your application as you have shared; but since you only provided one example line of your JSONL file, I assumed you have many hundreds of training lines. Your choice of tech is different than mine, but you learn by experimenting!
So, go for it as you please. Experts are made by making a lot of mistakes or going down paths which lead to the wrong results, and finally “getting it right”.
How many lines is your JSONL training file, BTW?
@ruby_coder as of now I’m doing R&D, so I’m testing with two or three JSON lines. I’ll consider @raymonddavey’s points and work on it.
I’ll keep you posted.
You won’t get a fair indication of whether it is going to work with only 2 or 3 lines.
You need at least 100. But if you can’t do that, try 20 rows and increase n_epochs.
n_epochs tells GPT to process the file multiple times. This reinforces learning.
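As rough arithmetic only (not a rule from the docs), you can count how many times a training example is shown in total. The function name is made up, and the value 4 is just an illustrative epoch count:

```python
def examples_seen(rows: int, n_epochs: int) -> int:
    """Total number of example presentations: every row is seen once per epoch."""
    return rows * n_epochs


print(examples_seen(100, 4))   # 400
print(examples_seen(20, 4))    # 80
print(examples_seen(20, 20))   # 400
```

Matching the count is not the same as matching the quality: 20 rows repeated 20 times still contain only 20 distinct patterns, so more varied rows are usually the better investment.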
Yeah, that’s the fun part. Enjoy and report back with results, good and bad.
Hey @madhangopal500
FYI, in case you missed the docs on this, the n_epochs param defaults to 4.
See:
I think making sure you have your SEPARATOR and STOP set correctly in the training data, and then using these same strings in your prompts after tuning, should help a lot, as the docs instruct.
I just tested your JSONL line @madhangopal500 against my validator for both JSONL and the API, as well:
Validated OK and got an ID, no problem at all.
I’m still working on adding more params to the fine-tune method, and may move the validator out completely as a separate function, add a second column for API valid?, etc… It’s still a WIP.
HTH
It’s still a work in progress, @ruby_coder.
Could you share the link where you tested the fine-tuning validation?
That would be helpful for me.
It runs on localhost on my desktop on the seacoast.
I have not yet pushed this code to the net, sorry. I am still adding more and more functionality and adding more params, etc. I need to move some functions like files and validation into different tasks and modules, etc. Plus, I have other tasks on my plate, so I work on this over AM coffee and PM dinner at my desk, haha.
Do you use regular expressions, @madhangopal500?
I validate with regular expressions, as you have surely guessed by now.
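For anyone who wants to try the same approach, here is a rough Python sketch of a regex-based check. It is not the validator described above (that code has not been shared). The function name `check_line` and the three rules are assumptions based on the example lines in this thread: the prompt ends with a separator, the completion starts with a space, and the completion ends with a stop sequence.

```python
import json
import re


def check_line(line: str, separator: str = "->", stop: str = "###") -> list:
    """Return a list of problems found in one JSONL training line."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict) or set(record) != {"prompt", "completion"}:
        return ["expected exactly the keys 'prompt' and 'completion'"]
    prompt, completion = record["prompt"], record["completion"]
    if not isinstance(prompt, str) or not isinstance(completion, str):
        return ["'prompt' and 'completion' must both be strings"]

    problems = []
    # The separator must be the very last thing in the prompt.
    if not re.search(re.escape(separator) + r"\Z", prompt):
        problems.append("prompt does not end with the separator")
    # The completion must begin with a space followed by a non-space character.
    if not re.match(r" \S", completion):
        problems.append("completion does not start with a single space")
    # The stop sequence must be the very last thing in the completion.
    if not re.search(re.escape(stop) + r"\Z", completion):
        problems.append("completion does not end with the stop sequence")
    return problems


print(check_line('{"prompt":"size=S->","completion":" A small handbag.###"}'))
print(check_line('{"prompt":"size=S->","completion":"A small handbag."}'))
```

An empty list means the line passed these three checks. It says nothing about whether the API will accept the file, so treat it as a pre-flight check only.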
Hi guys,
I’m not getting a proper response for the keywords in the given prompt.
{"prompt":"Item=handbag, Color=army_green, price=$99, size=S->","completion":" This stylish small army green handbag will add a unique touch to your look, without costing you a fortune.###"}
Scenario:
prompt: small size army green handbag
Expected Output: This stylish small army green handbag will add a unique touch to your look, without costing you a fortune.
Output from OpenAI: with a small black design on the front. The small black design is a small black and white small diamond pattern, and the small black design covers…
Please help me find a solution.
Thanks
Your training data also needs to be in words. It doesn’t like captions with values.
For example, it doesn’t correlate S with Small. It works much better if you convert your training data into sentences instead of a list of parameters or specifications.
@raymonddavey I created the training data as per the documentation. The documentation also supports captions with values.
This is from the documentation
For inference, you should format your prompts in the same way as you did when creating the training dataset
Your training set uses parameters, but the prompt you send when you use the model is written in sentences or words.
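One way to avoid that mismatch is to build the training prompts and the inference prompts with the same function. A small sketch, assuming the parameter-list format from the training line earlier in the thread (the function name `build_prompt` is hypothetical):

```python
def build_prompt(item: str, color: str, price: str, size: str) -> str:
    """Format a prompt exactly as it appears in the training file."""
    return f"Item={item}, Color={color}, price={price}, size={size}->"


# Used when writing the training file ...
training_prompt = build_prompt("handbag", "army_green", "$99", "S")

# ... and again when querying the tuned model, instead of free text
# such as "small size army green handbag".
inference_prompt = build_prompt("handbag", "army_green", "$99", "S")

print(inference_prompt)
# Item=handbag, Color=army_green, price=$99, size=S->
```

The same idea applies if you switch the training data to sentences: write the sentence template once and reuse it on both sides.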
Also quoting from the documentation:
Here it is important to convert the input data into a natural language, which will likely lead to superior performance. For example, the following format:
{"prompt":"Item=handbag, Color=army_green, price=$99, size=S->", "completion":" This stylish small green handbag will add a unique touch to your look, without costing you a fortune."}
Won’t work as well as:
{"prompt":"Item is a handbag. Co
They got rid of the equals signs and commas in the list format and wrote the prompts as sentences instead.
This documentation you have posted is not correct according to OpenAI Fine-Tuning Data Formatting documentation.
Please follow the guidelines I have posted above and below, again.
The last part was directly copied from this link and anchor
The first part of my reply was copied directly from the 4th bullet point under this link