I think something like this might work for you (not tested):
{"prompt":"Item=handbag, Color=army_green, price=$99, size=S-> MY_SEPARATOR", "completion":" This stylish small green handbag will add a unique touch to your look, without costing you a fortune. STOP_STOP"}
You must:
Add a separator at the end of your prompt
Add a stop at the end of your completion
Have a white space at the beginning of your completion.
You must use your separator and your stop sequence when you query your fine-tuned model.
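As a sketch of those rules, here is a small hypothetical helper (my own illustration, not an official OpenAI tool) that builds one training line with the separator, the leading space, and the stop sequence. The `MY_SEPARATOR` and `STOP_STOP` tokens are just the example values from the line above; use whatever you picked for your own data:

```python
import json

def format_example(prompt, completion, sep=" MY_SEPARATOR", stop=" STOP_STOP"):
    """Build one JSONL training line following the three rules above:
    separator at the end of the prompt, leading whitespace on the
    completion, and a stop sequence at the end of the completion."""
    return json.dumps({
        "prompt": prompt + sep,                 # separator ends every prompt
        "completion": " " + completion + stop,  # leading space + stop token
    })

line = format_example(
    "Item=handbag, Color=army_green, price=$99, size=S",
    "This stylish small green handbag will add a unique touch to your look.",
)
```

Remember that at query time you must append the same separator to your prompt and pass the same stop string to the completions call.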
I think it is pretty clear from the directions below; just follow them.
But even with that change, it is not an exact science.
Essentially, you have to overwhelm the AI with patterns that it can follow. They could be lookup lists, but you will not get the matching completion for a given prompt. You would have to feed lots of examples for it to start to produce something similar.
@raymonddavey is 100% correct: even when you get all the formatting, separators, and stops right, you still face the challenges he mentioned on top of all that.
Personally, I would not use fine-tuning for the application you have shared; but since you only provided one example line of your JSONL file, I assumed you have many hundreds of training lines. Your choice of tech is different from mine, but you learn by experimenting!
So, go for it as you please. Experts are made by making a lot of mistakes, going down paths that lead to the wrong results, and finally “getting it right.”
I think making sure you have your SEPARATOR and STOP set correctly in the training data, and then using these same strings in your prompts after tuning, should help a lot, as the docs instruct.
I’m still working on adding more params to the fine-tune method, and may move the validator out completely as a separate function, add a second column for API valid?, etc… It’s still a WIP.
It runs on localhost on my desktop on the seacoast.
I have not yet pushed this code to the net, sorry. I am still adding more and more functionality and adding more params, etc. I need to move some functions like files and validation into different tasks and modules, etc. Plus, I have other tasks on my plate, so I work on this over AM coffee and PM dinner at my desk, haha.
Hi guys,
I'm not getting a proper response when I query with keywords from a given prompt.
{"prompt":"Item=handbag, Color=army_green, price=$99, size=S->","completion":" This stylish small army green handbag will add a unique touch to your look, without costing you a fortune.###"}
Scenario:
Prompt: small size army green handbag
Expected output: This stylish small army green handbag will add a unique touch to your look, without costing you a fortune.
Actual output from OpenAI: with a small black design on the front. The small black design is a small black and white small diamond pattern, and the small black design covers…
Your training data needs to also be in words. It doesn’t like captions with values.
For example it doesn’t correlate S as meaning Small. It works much better if you convert your training data into sentences instead of a list of parameters or specifications
For inference, you should format your prompts in the same way as you did when creating the training dataset
Your training set uses parameters, but the prompt you send at inference time uses sentences or keywords.
Also quoting from the documentation:
Here it is important to convert the input data into a natural language, which will likely lead to superior performance. For example, the following format:
{"prompt":"Item=handbag, Color=army_green, price=$99, size=S->", "completion":" This stylish small green handbag will add a unique touch to your look, without costing you a fortune."}
Won’t work as well as:
{"prompt":"Item is a handbag. Co
They got rid of the equal signs and commas in the list format and made sentences as prompts instead
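To illustrate that conversion, here is a hypothetical helper that turns the parameter-style caption from this thread into plain sentences. The `size` lookup table is my own assumption for the sketch, not something from the docs:

```python
def caption_to_sentence(caption):
    """Convert a caption like 'Item=handbag, Color=army_green, price=$99, size=S'
    into natural-language sentences, which the docs suggest works better."""
    parts = []
    for pair in caption.split(", "):
        key, _, value = pair.partition("=")
        value = value.replace("_", " ")          # army_green -> army green
        if key.lower() == "size":
            # Assumed mapping so the model sees words, not codes like "S"
            value = {"S": "small", "M": "medium", "L": "large"}.get(value, value)
        parts.append(f"{key.capitalize()} is {value}.")
    return " ".join(parts)

sentence = caption_to_sentence("Item=handbag, Color=army_green, price=$99, size=S")
```

You would then use the converted sentence as the prompt in your JSONL training line, and phrase your inference-time prompts the same way.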
That documentation is in conflict with OpenAI’s stated directions which I have posted so many times.
Posting incorrect docs in conflict with these guidelines is not always in the best interest of beginning users with problems, in my view. OpenAI needs to fix their docs so they are consistent, or users will keep being frustrated their fine-tunes do not work as they expect.
Do you not agree with these OpenAI data formatting requirements below:
I have posted countless times on the need to validate JSONL data against both the JSONL format requirements and the OpenAI data formatting requirements, and I have written a validator for both which works for all my fine-tunings without a problem.
In my view, we need to encourage users to validate their JSONL data against the OpenAI data formatting guidelines as the first step when troubleshooting fine-tuning issues.
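As a sketch of what such a validator could check (my own minimal version, not OpenAI's tool; the `->` separator and `###` stop are the values from the example earlier in this thread):

```python
import json

def validate_jsonl(lines, sep="->", stop="###"):
    """Return a list of formatting problems found in JSONL training lines,
    checking both JSON validity and the separator/space/stop rules."""
    errors = []
    for i, raw in enumerate(lines, start=1):
        try:
            row = json.loads(raw)
        except json.JSONDecodeError as e:
            errors.append(f"line {i}: invalid JSON ({e.msg})")
            continue
        if set(row) != {"prompt", "completion"}:
            errors.append(f"line {i}: keys must be exactly prompt/completion")
            continue
        if not row["prompt"].endswith(sep):
            errors.append(f"line {i}: prompt missing separator {sep!r}")
        if not row["completion"].startswith(" "):
            errors.append(f"line {i}: completion must start with a space")
        if not row["completion"].endswith(stop):
            errors.append(f"line {i}: completion missing stop {stop!r}")
    return errors

good = '{"prompt": "Item=handbag, size=S->", "completion": " Nice bag.###"}'
bad = '{"prompt": "no separator", "completion": "no leading space"}'
```

Running your whole training file through a check like this before uploading catches most of the formatting problems discussed in this thread.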
OK, I am going to test this for you. The cake is baking now as a single-line JSONL test and when it bakes, I’ll test your prompt for you and post back.
If we give the exact prompt, we get the expected result. But in my case I'm going to give keywords from that prompt, like “small size army green handbag”; the expected output is “This stylish small army green handbag will add a unique touch to your look, without costing you a fortune.” but we are getting random results from OpenAI.
The two strings are too dissimilar (you can test this by taking the dot product between the embedding vectors of both strings), and it may be difficult to get a correct model fitting when you fine-tune, unless you add “string 2” to your fine-tuning JSONL file as a prompt.
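For illustration, here is a minimal similarity check. The toy vectors are stand-ins for real embedding vectors, which you would fetch from the embeddings endpoint for each string; with normalized embeddings the dot product and cosine similarity coincide:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors: dot product
    divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for the embeddings of the two strings
same = cosine_similarity([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # unrelated direction
```

A score near 1.0 means the strings are semantically close; a low score suggests the keyword prompt and the training prompt are too far apart for the fine-tune to bridge them.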
However, if you shift to using embeddings, you may have better luck.
Before I drop off, I will run this again with only 12 n_epochs, but I seriously doubt it will provide a suitable model fitting for both strings with a single-line JSONL entry.
In addition, your string lengths are a bit short, so this makes the vector math even more challenging; it’s doable, but requires work on your part.