System Instructions for Fine Tuning

I’m currently working on a project that involves fine-tuning a language model (LLM) AI for a specialized application. The task requires creating detailed system instructions that go beyond basic personality traits or simple behavior modifications (like “Marv is a salty chatbot”).

I’m looking for advice or best practices on how to effectively structure and articulate these instructions to achieve nuanced and specific outcomes from the AI.

Key challenges include:

  • Ensuring the AI comprehends and adapts to specialized and complex contexts in a specific problem area where all LLM ai have been improperly/poorly trained on the subject matter.
  • Integrating the AI’s learning capability to recognize user sophistication and respond accordingly

If anyone has experience or insights on developing such detailed system instructions, your input would be incredibly valuable. How do you approach the balance between specificity and generalizability? What strategies do you use to ensure the instructions are clear yet comprehensive enough for the AI to execute complex tasks?

So far my system instructions are about 750 tokens but the consultant I am working with seems to think that these system instructions are not important at all.

Thank you in advance for your help!

1 Like

Hey there!

So, a couple things:

1 - fine-tuning an LLM is different than instruction prompting. Are you planning on trying to do both? One or the other?

2 - When it comes to creating a comprehensive prompt, it’s usually done on a case-by-case basis. Each scenario is different, and likewise requires different specificity and generalization requirements. Some people need very little flexibility, others need a lot more. This is less dependent on domain, and more about the task that is needed to be accomplished.

What domain is this for, and what is your goal with these models?

I will note as well, fine-tuning may make this less necessary and cumbersome, because at that point you would already be training the model on what you want it to do, or how you want it to perform in a certain way. There are different pathways to achieve what you want :slightly_smiling_face:. If you could provide us some more details in what you’re trying to accomplish specifically, we are more than happy to help!

The easiest way to understand what we are building and why is in the following analogy: “when I say ‘Stock Valuation’ people generally think product-market fit, long-term competitive advantages, management expertise, etc-- The reasons why you would own that stock over any other stock.” But when I say “Home valuation” people just think “price”- which is price estimation not home valuation.

All LLM ai have been trained on conventional home valuation as price estimation and so this is the reason we are fine-tuning.

1 Like

Hi Dave - not sure yet whether I will be able to help but can you help me understand what the user input and model output is in your scenario? I’m not quite clear yet on that and I am trying to understand how you are looking to use finetuning for your use case.


We give people another way to evaluate a home purchase investments- relative to how the home attributes are likely to perform thru time. In order to learn how to use the valuation system and software, there is a learning process where you interact with our ai and essentially learn our system and to identify opportunities in the housing markets you are considering.

My question about system instructions is more theoretical. Are people using complex system instructions and getting good results in fine tuning?

Hi @HomeRank - Thanks for clarifying. Very generally speaking, fine-tuning can be extremely powerful to achieve the desired results, provided you have the right training data to go along with it.

It’s just critical to emphasize that fine-tuning, in the specific context within which the word is used in relation to fine-tuning OpenAI models, is mostly geared towards steering model behavior as opposed to injecting knowledge. That’s why I was curious to understand your use case better.

Generally though yes, you can definitely try fine-tuning with a more advanced system instruction. As with everything, a bit of initial trial and error and testing it with a smaller data set will help you validate whether it works or not.

I think the main limitation may be that currently fine-tuning is largely limited to GPT-3.5-turbo models which are not as powerful in their reasoning capabilities relative to GPT-4. So it is somewhat down to the complexity of the task at hand, whether a GPT 3.5 model can handle it.


I think there might be a miscommunication in the steps here. These are not woven together in the way you might be thinking.

As mentioned above, fine-tuning is better for complex model behavior, especially when instruction prompting in unsatisfactory and all attempts to improve it with prompting fails. Fine-tuning is also a very helpful tool in creating an expected format of interaction between the AI and the user.

What you are doing by fine-tuning is essentially showing the model “This is how you would interact with users, and this is what you should expect from them”.

Complex instruction prompting would be both unnecessary and more difficult depending on how a model is fine-tuned, because that is the intended purpose of fine-tuning a model: to enhance its ability to handle complex or niche instructions without needing complex prompting. It would “lock-in” so to speak the desired behavior.

1 Like

I’ve been working with various gpt models via the API. I’ve had some success “steering” the response of the LLMs by providing pattern information in a document for a knowledge retrieval Assistant. (I’m working on an app to help students struggling with algebra. LLMs make a lot of algebra mistakes.)

When you provide contextual information, either via prompt construction or knowledge retrieval, it impacts the LLMs algorithmic behavior. Google Bard explained it to me when I corrected an incorrect response from it. When I provided correct relevant information, it said that impacted its relevance filtering, uncertainty estimation and it’s exploration and retrieval process.

In a similar discussion with ChatGPT 4 it told me “Your input influenced the model’s response generation, skewing it towards what it then calculated as the most appropriate response based on the new information.”

1 Like

hey @HomeRank did you happen to find someone to help with this? I’m on the hunt for someone who can help with some training/fine tuning!