A question regarding fine-tuning

I am structuring my training data. The user message contains the question and the prompt (the question is about the prompt), and the assistant message contains examples of how I would answer that question for that prompt.

In the documentation, it is mentioned: “We generally recommend taking the set of instructions and prompts that you found worked best for the model prior to fine-tuning, and including them in every training example.” My question is: where should I add the instruction when I am preparing these examples? If anyone has a link to a good example, it would be very helpful. The OpenAI example is very simple and straightforward.

Welcome to the community.

So, in the above description, ‘instructions’ means ‘prompts’.

You are on the right track.

Based on your explanation, here is an example for simple completions fine-tuning:

{"prompt": "Question + Prompt", "completion": "Answer"}
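
A minimal sketch of how such a line could be produced, assuming the question and the prompt are simply concatenated into the "prompt" field. The placeholder strings and the train.jsonl filename are illustrative, not from the original post:

import json

# One JSONL line per training example (legacy completions format).
example = {
    "prompt": "Question: <your question>\n\nPrompt: <the prompt text>\n\n",
    "completion": " <how you would answer that question for that prompt>",
}

with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")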

Hope this helps. I will keep an eye on this for the next day to see if you have any follow-up questions.


Thank you so much for the prompt reply. But just to clarify this further for myself, here is how I construct the messages when I am making a call to the API (just the gpt-turbo model, no fine-tuning):

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": instruction},
]

The prompt is the same prompt I am using to construct my examples for fine-tuning; the instruction, though, is a long paragraph explaining to ChatGPT what to do.

In creating examples for fine-tuning, however, instead of instructions, I am providing examples of how to do the task.

Is there any benefit in adding the instruction to the prompt (if that’s the right place to add it)? Is there any harm to the fine-tuned model if I remove the instructions completely? Do these examples of “how to do the task” actually act as if the instructions had been provided? Any discussion would be helpful. Since fine-tuning is not cheap, I am trying to avoid trial and error as much as I can.

Answers inline below, in angle brackets:

messages = [
    {"role": "system", "content": "You are a helpful assistant. <your instruction should go here: what you want the AI to do. This should be the same for every entry of your JSONL file, and should ideally contain the instructions on what the AI has to do, when given a prompt, to produce the answer>"},
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": instruction <remove the instruction here; ideally, you just give it the output you want, i.e. the answer to your question, so this becomes the answer>},
]
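
Putting that together, here is a hedged sketch of how one complete training example might be written out as a JSONL line for chat-format fine-tuning. The INSTRUCTION text, the pairs variable, the make_example helper, and the train.jsonl filename are all placeholders for illustration, not anything prescribed by the docs:

import json

# The fixed instruction lives in the system message and is identical in every example.
INSTRUCTION = (
    "You are a helpful assistant. "
    "<the long instruction paragraph, identical in every training example>"
)

def make_example(question_and_prompt: str, answer: str) -> dict:
    # user carries the question + prompt; assistant carries only the ideal answer.
    return {
        "messages": [
            {"role": "system", "content": INSTRUCTION},
            {"role": "user", "content": question_and_prompt},
            {"role": "assistant", "content": answer},
        ]
    }

# Placeholder for your own (question + prompt, answer) tuples.
pairs = [("<question + prompt>", "<answer>")]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for q, a in pairs:
        f.write(json.dumps(make_example(q, a)) + "\n")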

I have found it easier to do a smaller sample run first to see whether you are starting to get the right results. A sample size of 30 to 50 examples is good enough for this and should be quite cheap.

This makes a lot of sense now. Thank you so much. Yes, my plan is to start with a smaller training set and go from there.

Thank you again for your information.

I was wondering about the max token limit of the different models that are fine-tunable. Here is what I found in the documentation: “Token limits depend on the model you select. For gpt-3.5-turbo-0125, the maximum context length is 16,385 so each training example is also limited to 16,385 tokens. For gpt-3.5-turbo-0613, each training example is limited to 4,096 tokens.” My examples are each around 23,000 tokens. I was wondering: 1) what is the max token limit on the other models? 2) Are there any tricks or tips anyone can suggest to make the fine-tuning work without shortening the examples too much?
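
As an aside, a quick way to check each example against the limit is to count tokens locally, for instance with the tiktoken library. This is only a sketch; the examples variable is a placeholder, and the per-message chat formatting adds a few extra tokens on top of what this counts:

import tiktoken  # pip install tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def example_tokens(example: dict) -> int:
    # Sums content tokens only; treat the result as a slight underestimate.
    return sum(len(enc.encode(m["content"])) for m in example["messages"])

MAX_CONTEXT = 16385  # gpt-3.5-turbo-0125, per the docs quoted above

examples = []  # placeholder: load your parsed training examples here
too_long = [i for i, ex in enumerate(examples) if example_tokens(ex) > MAX_CONTEXT]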

Hey @somayeh.molaei

That is a lot of tokens for fine-tuning 🙂

That being said, could you do the following:

  1. reduce the tokens by removing stop words (apparently they don’t have much impact, but I could be wrong here); see the sketch below and this article: Removing stop words with NLTK library in Python | by Banjoko Judah | Analytics Vidhya | Medium
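
A minimal sketch of that stop-word removal using NLTK (the strip_stop_words helper name is made up for illustration):

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP = set(stopwords.words("english"))

def strip_stop_words(text: str) -> str:
    # Drops common English stop words; this can change meaning, so spot-check
    # the shortened examples before fine-tuning on them.
    return " ".join(w for w in text.split() if w.lower() not in STOP)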

And remember: you are training the AI on how to complete a context, where you can show various conversation depths, injections of knowledge, and outputs that use that injected knowledge correctly.

You are making examples of what is expected to be seen in actual use of the application. The AI will infer that user inputs similar to your examples should get similar completions.

Therefore, a training context longer than the model's context window has little value. A context consisting of your product list followed by an AI answer that doesn't draw on it doesn't train much. The AI answering a user input that no user would ever type also doesn't put the fine-tuned model on a path to making useful output. You would not train on a context length the AI will never see in practice.

So, to review the prior points that bear on the desire to include a larger context, here is a training file example:

system: (brief identity particular to this AI that you will also use in the application. The idea is that you don’t need a massive amount of text explaining the AI’s operation, because the AI learns the desired operation from examples)
user: past input context
assistant: past answer context
user: current user input, like the real user input or the instruction programming of your application
assistant: (all varieties of the new type of output you want to teach, showing that the AI has a special skill or behavior that is not or cannot be instructed by system prompt programming)
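
A hypothetical rendering of that schematic as a single training example; every content string here is a placeholder for your own data:

example = {
    "messages": [
        {"role": "system", "content": "<brief identity, reused verbatim in the application>"},
        {"role": "user", "content": "<past user input, e.g. with injected document knowledge>"},
        {"role": "assistant", "content": "<past answer that used that injected knowledge>"},
        {"role": "user", "content": "<current user input as your application would send it>"},
        {"role": "assistant", "content": "<the new kind of output you want to teach>"},
    ]
}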


Yeah, but there are use cases that don’t involve conversations.

For example, if I want the model to reason over a document and then create an output based on that, it would not be overly useful or a good depiction of reality if I just included half of the document as the training example.

So I do see value in larger context windows for fine-tuning depending on the use case at hand.