Fine Tuning - Should we have a file for each style prompt?

Hello, everyone. I just want to make sure I don't make a mistake here. I'm working on a marketing solution; let's say I want the capability to:
1. Create a value proposition
2. Create a product description
3. Create an ad
4. Create product name ideas

Would I have a file of examples for each one, which I would then reference later? If so, are there any limits on how many files? Is the limit of 10 fine-tuning runs per month effectively a limit of 10 different use cases?

Any help is greatly appreciated, thank you.

PS: I literally asked if this was possible last week, and then OpenAI came out with this. I'm in love! <3


Yes, you would want a fine-tuned model for each one. That would only use 4 of your 10 monthly runs, so you wouldn't run into any limits. Keep in mind, though, that you need a few hundred examples for each. I'm working on my own first runs, but I'm aiming for 1,000 samples per fine-tuned model, so it's taking some time. In my case, I'm using DAVINCI-INSTRUCT-BETA along with some raw source data to generate my samples, and then cleaning them up by hand. I wrote about it here: What are your favorite text-based dialog datasets?
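For anyone new to the format: a fine-tuning file for the completions-style API is JSONL, one prompt/completion pair per line. A minimal sketch of building one such file (the product texts and filename are made up for illustration, and a real run would need a few hundred examples):

```python
import json

# Hypothetical training examples for one use case (value propositions).
examples = [
    {"prompt": "Product: reusable water bottle\nValue proposition:",
     "completion": " Stay hydrated all day without single-use plastic. END"},
    {"prompt": "Product: noise-cancelling headphones\nValue proposition:",
     "completion": " Focus anywhere, from open offices to airplanes. END"},
]

with open("value_proposition.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check: every line must parse back as a prompt/completion pair.
with open("value_proposition.jsonl") as f:
    rows = [json.loads(line) for line in f]
assert all({"prompt", "completion"} <= row.keys() for row in rows)
```

You would then make one such file per use case (or, as discussed below, combine them).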


Feel free to reach out if the 10 runs become a blocker this month. We can work with you to get the most out of those runs or otherwise help you sort through a solution. We’ll be thinking about how to balance encouraging experimentation while also managing our costs. More to come!


I’m going to test a similar use case and try a single file for fine-tuning.
The idea would be to implement in the same file some different behaviors with different prompts and replies. The challenge to achieve the specific desired behavior isn’t small since the model will have different potential replies. However, we’ll try to direct it to a specific use case with a short prompt before completion.
For example, “An assistant AI which does X,” “An ad generation tool,” etc. I’m optimistic about it because I could merge different behaviors using the semantic search + completions + a short prompt.
Within the semantic search content file, I implemented various behaviors, so when the AI picks a paragraph out, it includes instructions for how to reply to the initial prompt.
It basically worked fine, but I found myself building very long prompts for complex tasks, which eventually makes it expensive. Therefore, the potential for implementing the same concept within fine-tuning seems intriguing.
In any case, a combination of the two (semantic search + completions based on a fine-tuned model) might bring the best results.
The way to avoid that would be to train the model on many different behaviors through prompts + replies, and then use just a short prompt + example before completion. It would still be costly if you had to include many examples in the prompt, since the engine is more expensive.
Still, I tend to believe that a single example in the prompt, plus the knowledge gained in fine-tuning, has a lot of potential to enable the use case in a more precise and usable manner.
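A rough sketch of the combination described above, with the semantic search and completion steps stubbed out (all names and prompt texts here are hypothetical):

```python
# Sketch: semantic search picks a paragraph, and a short behavior
# identifier is prepended before requesting the completion.
BEHAVIOR_PROMPTS = {
    "ad": "An ad generation tool.",
    "assistant": "An assistant AI which answers product questions.",
}

def build_prompt(behavior: str, retrieved_paragraph: str, user_input: str) -> str:
    """Prefix the request with a short behavior identifier and the
    paragraph the semantic search picked out."""
    return (
        f"{BEHAVIOR_PROMPTS[behavior]}\n"
        f"Context: {retrieved_paragraph}\n"
        f"Input: {user_input}\n"
        f"Output:"
    )

prompt = build_prompt(
    "ad",
    "Our bottles keep drinks cold for 24 hours.",
    "Write a one-line ad.",
)
assert prompt.startswith("An ad generation tool.")
```

The resulting string would then be sent to the completion endpoint; the short prefix is what steers a multi-behavior model toward the right task.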


I was going to post a question about fine-tuning, but maybe I can find my answers here. Is there an example of the data used in fine-tuning?

Some examples here: OpenAI API. Or are you looking for a full dataset?


The amount of generalization you can get across related tasks is an interesting question, and I don't have a concrete answer. It's related to the amount and quality of the data. @boris might have a more nuanced take.


This is an interesting approach. You can keep the starting prompts short: no need to say "an assistant AI which does X"; you can just use "X\n" instead. Given hundreds of examples for each separate task, you're likely to see an improvement by training a single model which does all tasks vs. a separate model per task. We haven't experimented with this very much at the moment, so I'd be curious whether you find that using multiple different tasks within the same fine-tuning job improves performance!
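To make the short-prefix suggestion concrete, here is a sketch of merging several tasks into a single training file, using a minimal "X\n" task prefix per example (the task names, separator, and copy are invented for illustration):

```python
import json

# One fine-tuning file covering multiple tasks, distinguished only
# by a short task prefix on each prompt.
tasks = {
    "Ad": [("reusable water bottle", "Never buy plastic again.")],
    "Product name": [("reusable water bottle", "EverFlow")],
}

lines = []
for task, pairs in tasks.items():
    for source, target in pairs:
        lines.append(json.dumps({
            "prompt": f"{task}\n{source}\n\n###\n\n",
            "completion": f" {target} END",
        }))

combined = "\n".join(lines)
assert combined.count('"prompt"') == 2
```

At inference time, the same "Ad\n" or "Product name\n" prefix selects the behavior.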

With regards to hyperparameters, we've tried to choose the most robust and best-performing parameters across a large number of cases. I wouldn't expect you to need to spend many runs on different hyperparameters to improve performance.


Thanks for the tip, I’ll update about the results.

@boris, I'm glad to say that it works pretty well, so it seems we don't have to use a different fine-tuning file for each behavior.
I've embedded two different behaviors in the same fine-tuning file, and the model replies correctly based on the prompt provided.
The number of examples I've used so far isn't substantial, about a hundred each. Still, I included some prompts that mimic few-shot behavior, as well as long background prompts to support the model's general education, telling the story and explaining the expected replies, so that might have helped (although I haven't tested without them, so I can't tell for sure).
However, there is a bug: at the end of each completion, the model adds the word END several times, probably to fill in the maximum tokens. While preparing the dataset file with the CLI data preparation tool, I accepted adding the END word at the end of each completion. Although it looks OK within the file, the actual completions look like this: "[completion] END END END END END."
Do you have an idea of how to solve it, or should I delete the END word from the dataset and fine-tune again?



You should use the stop=["END"] parameter when calling the completion endpoint; it saves you a few tokens and stops generation the first time it encounters the END token.
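For readers hitting the same issue, this snippet reproduces locally what the stop parameter does server-side; the commented line shows roughly where the parameter goes in the legacy completion call (a sketch, not run against a live key):

```python
# What stop=["END"] does, reproduced locally. Roughly:
#
#   openai.Completion.create(model="...", prompt=prompt, stop=["END"])

def apply_stop(text: str, stop_sequences=("END",)) -> str:
    """Cut the completion at the first stop sequence, as the API does."""
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            text = text[:idx]
    return text.rstrip()

raw = "A sleek bottle for everyday carry. END END END END END"
assert apply_stop(raw) == "A sleek bottle for everyday carry."
```

With the parameter set, the repeated END tokens are never generated in the first place, so you also stop paying for them.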


Got it, thanks. It’s working great now.


Following the great questions and replies here: our use case has user-specific data, with domain-specific data for each user, but catering to one main behavior, e.g., sales- and marketing-oriented content. The first question is: how do we distinguish between domain and behavior? Second, if we need to train the model per customer (multiple domains, one behavior), do we need to create multiple datasets and runs? If we combine everything into one big file (all domains, one behavior), will the model mix up the domains? As I understand it, the main purpose of fine-tuning is to teach the model a behavior and the expected replies for each question type, regardless of the domain. The model is already generalized and well trained on global internet data; what we are doing in fine-tuning is teaching it a specific behavior for our end goal, be it product descriptions, market research, or sales-oriented content creation.

Am I right in the above assumptions? Do we need one single run (multiple domains + single behavior)? If not, how do we deal with situations where the model needs to be specific to different customers' data? Semantic search + a short prompt works quite well in our case; now we are planning to fine-tune on examples drawn from all customers. Thanks in advance for your reply.

@harish.kumar hi, I can only speak from my own experience of just a few days.
You need to give the model a direction for how to treat your request. If the use cases or behaviors are very similar, think of some identifiers to help the model understand which of the data you fed it is relevant.
In the fine-tuning file, you can add unique elements to that dataset's prompts, like a headline, an "x:" prefix, and such. Then, in your real-time prompts, use the same identifier to direct the model to the relevant dataset.
You can use your semantic search structure for exactly that purpose: use semantic search to identify the use case, then inject the identifier into the prompt to direct the model to the relevant completion.
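A toy version of that routing step, with the semantic search replaced by a crude keyword stub (the identifiers and matching logic are purely illustrative):

```python
# Hypothetical routing: a stand-in "semantic search" picks the use
# case, and its identifier is injected before the user's text.
def route_use_case(user_text: str) -> str:
    # Stub for a real semantic search over behavior descriptions.
    if "ad" in user_text or "slogan" in user_text:
        return "ad:"
    return "desc:"

def make_prompt(user_text: str) -> str:
    return f"{route_use_case(user_text)} {user_text}\n"

assert make_prompt("Write an ad for our bottle").startswith("ad:")
```

In a real system the routing function would be the semantic search itself; only the injected identifier needs to match what the fine-tuning file used.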


Can you share more details on everything you just said? I'm very interested in trying this. Do you have any content I can read up on to learn more?

Hi @ian, sorry, that's a bit of a broad request 🙂 but I'm happy to answer specific questions if I can help.

No, you can just have one file. For my use case, I ran into a similar issue and discovered that the fine-tuned model works very well when you provide all of your prompts together. For example, if you want your model to be able to complete all 4 of those specific tasks in sequential order, you would have 200 examples formatted like this:

{"prompt": "Value proposition:  XYZ, Production description: XYZ, Ad: XYZ", "completion": "Product name ideas:  XYZ."}

This is, of course, assuming your use case uses each prompt sequentially, i.e., 4. Create product name ideas uses the completions for 1-3.

Then you would simply change the start text.
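Building on the format above, one such sequential training line could be assembled like this (all the marketing copy here is invented):

```python
import json

# Assembling one sequential training example in the format shown above.
record = {
    "prompt": ("Value proposition: Stay hydrated without plastic, "
               "Product description: A 750 ml insulated steel bottle, "
               "Ad: Cold for 24 hours, wherever you go."),
    "completion": " Product name ideas: EverFlow, ChillSteel, HydraGo.",
}
line = json.dumps(record)

# Round-trip check: the line parses back into the same pair.
parsed = json.loads(line)
assert parsed["completion"].startswith(" Product name ideas:")
```

Changing the start text of the prompt (e.g., ending it at "Ad:" instead) lets the same model produce the intermediate steps too.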