Please post the first two lines of the JSONL file you used to fine-tune.
If possible, please wrap your data samples (here in the forum) in Markdown triple backticks so your data and code are easier to read.
{"prompt":"Item=handbag, Color=army_green, price=$99, size=S->", "completion":" This stylish small green handbag will add a unique touch to your look, without costing you a fortune."}
Text from Human: I want item of handbag and color as army green.
Expected Response is This stylish small green handbag will add a unique touch to your look, without costing you a fortune.
Yes, that line is early in the document. That’s the problem with beta documentation. You are in the huge majority who stop at that example in the docs and miss the details posted above.
I have a working JSONL and OpenAI API fine-tuning validator. I ran your JSONL line through it, and it passes as valid JSONL (obviously) but fails the OpenAI “fine-tuning format” validator:
I think something like this might work for you (not tested):
{"prompt":"Item=handbag, Color=army_green, price=$99, size=S-> MY_SEPARATOR", "completion":" This stylish small green handbag will add a unique touch to your look, without costing you a fortune. STOP_STOP"}
You must:
Add a separator at the end of your prompt
Add a stop at the end of your completion
Have a white space at the beginning of your completion.
You must also use the same separator and the same stop when you query your fine-tuned model.
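The three rules above can be checked mechanically before uploading a file. The following is only a minimal sketch, not the validator mentioned in this thread and not OpenAI's own tool: `check_line` is a hypothetical helper, and `MY_SEPARATOR` / `STOP_STOP` are just the placeholder strings from the example line, so substitute whatever separator and stop you actually chose.

```python
import json

# Placeholder strings taken from the example line in this thread (assumptions).
SEPARATOR = " MY_SEPARATOR"
STOP = " STOP_STOP"


def check_line(line, separator=SEPARATOR, stop=STOP):
    """Return a list of formatting problems found in one JSONL training line."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]

    prompt = record.get("prompt")
    completion = record.get("completion")
    if not isinstance(prompt, str) or not isinstance(completion, str):
        return ["missing string 'prompt' or 'completion'"]

    problems = []
    if not prompt.endswith(separator):
        problems.append("prompt does not end with the separator")
    if not completion.startswith(" "):
        problems.append("completion does not start with a space")
    if not completion.endswith(stop):
        problems.append("completion does not end with the stop sequence")
    return problems


if __name__ == "__main__":
    sample = '{"prompt":"Item=handbag, size=S->", "completion":" A small handbag."}'
    for problem in check_line(sample):
        print(problem)
```

Run against the original training line, this would flag the missing separator and the missing stop, while the leading space in the completion is already fine.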
I think it is pretty clear from the directions below. Just follow the directions.
But even with that change, it is not an exact science.
Essentially, you have to overwhelm the AI with patterns that it can follow. Your training data could look like a lookup list, but the model will not return the exact matching completion for a given prompt. You have to feed it lots of examples before it starts to produce something similar.
@raymonddavey is 100% correct so even when you get all the formatting, separators and stops right, there are the challenges which @raymonddavey mentioned on top of all that.
Personally, I would not use fine-tuning for your application as you have shared; but since you only provided one example line of your JSONL file, I assumed you have many hundreds of training lines. Your choice of tech is different than mine, but you learn by experimenting!
So, go for it as you please. Experts are made by making a lot of mistakes, or by going down paths which lead to the wrong results, and finally “getting it right”.
I think making sure you have your SEPARATOR and STOP set correctly in the training data, and then using these same strings in your prompts after tuning, should help a lot, as the docs instruct.
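To make "use the same strings after tuning" concrete, here is a rough sketch of how the request could be assembled. It only builds the arguments and sends nothing. The model name is made up, `build_completion_request` is a hypothetical helper, and the field names (`prompt`, `stop`, `max_tokens`, `temperature`) are assumed to be those of the legacy Completions endpoint, so check them against the API reference for your client version.

```python
# Must match the strings used in the training file (placeholders from this thread).
SEPARATOR = " MY_SEPARATOR"
STOP = " STOP_STOP"


def build_completion_request(model, user_prompt, separator=SEPARATOR,
                             stop=STOP, max_tokens=60):
    """Build keyword arguments for a Completions-style request (no network call)."""
    return {
        "model": model,
        # End the prompt exactly the way every training prompt ended.
        "prompt": user_prompt + separator,
        # Ask the API to cut generation at the stop string used in training.
        "stop": [stop],
        "max_tokens": max_tokens,
        "temperature": 0,
    }


params = build_completion_request(
    "my-fine-tuned-model",  # hypothetical model name
    "Item=handbag, Color=army_green, price=$99, size=S->",
)
print(params["prompt"])
```

The point is simply that the separator and stop live in one place and are reused for both training data and queries, so they cannot drift apart.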
I’m still working on adding more params to the fine-tune method, and may move the validator out completely as a separate function, add a second column for API valid?, etc… It’s still a WIP.
It runs on localhost on my desktop on the seacoast.
I have not yet pushed this code to the net, sorry. I am still adding more and more functionality and adding more params, etc. I need to move some functions like files and validation into different tasks and modules, etc. Plus, I have other tasks on my plate, so I work on this over AM coffee and PM dinner at my desk, haha.
Hi guys,
Here I'm not getting a proper response for the keywords in the given prompt.
{"prompt":"Item=handbag, Color=army_green, price=$99, size=S->","completion":" This stylish small army green handbag will add a unique touch to your look, without costing you a fortune.###"}
Scenario:
Prompt: small size army green handbag
Expected output: This stylish small army green handbag will add a unique touch to your look, without costing you a fortune.
Output from OpenAI: with a small black design on the front. The small black design is a small black and white small diamond pattern, and the small black design covers…
Your training data needs to also be in words. It doesn’t like captions with values.
For example, it doesn’t correlate S with Small. It works much better if you convert your training data into sentences instead of a list of parameters or specifications.
For inference, you should format your prompts in the same way as you did when creating the training dataset.
Your training set uses parameters, but the prompt you send at inference time uses sentences or words.
Also quoting from the documentation:
Here it is important to convert the input data into a natural language, which will likely lead to superior performance. For example, the following format:
{"prompt":"Item=handbag, Color=army_green, price=$99, size=S->", "completion":" This stylish small green handbag will add a unique touch to your look, without costing you a fortune."}
Won’t work as well as:
{"prompt":"Item is a handbag. Co
They got rid of the equal signs and commas in the list format and made sentences as prompts instead.
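If your data already exists in the `key=value` form, the conversion can be scripted. This is a minimal sketch under my own assumptions: `spec_to_sentences` is a hypothetical helper, the size mapping is invented for illustration, and the sentence wording is a simple template of mine, not the exact wording of the documentation example (which is cut off above).

```python
# Assumed mapping from size codes to words; extend it to fit your catalogue.
SIZE_WORDS = {"S": "small", "M": "medium", "L": "large"}


def spec_to_sentences(spec):
    """Turn 'Item=handbag, Color=army_green, size=S' into short sentences."""
    spec = spec.strip()
    if spec.endswith("->"):
        spec = spec[:-2]  # drop the trailing arrow used in the list format

    sentences = []
    for part in spec.split(","):
        key, sep, value = part.strip().partition("=")
        if not sep:
            continue  # skip fragments that are not key=value pairs
        key = key.strip().lower()
        value = value.strip().replace("_", " ")
        if key == "size":
            # Spell out size codes so the model sees "small", not "S".
            value = SIZE_WORDS.get(value.upper(), value)
        sentences.append(f"{key.capitalize()} is {value}.")
    return " ".join(sentences)


print(spec_to_sentences("Item=handbag, Color=army_green, price=$99, size=S->"))
```

You would still append your separator to the resulting prompt, and send prompts in this same sentence form at inference time.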