I want to fine-tune a model so that it can reply to emails according to my company's policies and structure.

This is the kind of JSONL data I am using to train the model. I am using davinci as the base model:
{"prompt": "Q: What is the name of the company?", "completion": "A: The name of the company is ABC Doors and Windows."}
{"prompt": "Q: Who is the owner of the company?", "completion": "A: James Brook is the owner of the company."}
{"prompt": "Q: Where is the company located?", "completion": "A: The company is located in the USA."}
{"prompt": "Q: What services does the company provide?", "completion": "A: The company provides a wide variety of new windows and doors for commercial and personal spaces."}
{"prompt": "Q: What is the return policy of the company?", "completion": "A: The company offers a 15-day complete replacement policy if any manufacturing errors are detected."}
{"prompt": "Q: Does the company provide repair and service?", "completion": "A: Yes, the company provides repair service within 24 hours upon request."}

After fine-tuning is complete, I am using the new model in the Playground and giving it the same prompts as above, like "What is the name of the company?", but the answer is not correct.

Welcome to the OpenAI community @akshayjaggi146

If the goal is factual responses, use embeddings instead of fine-tunes.
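
A minimal sketch of the embeddings approach: store each company fact as a short passage, embed every passage once, and at question time return the passage whose vector is closest (by cosine similarity) to the question's vector. The `toy_embed` function is a hypothetical bag-of-words stand-in so the example runs offline; it only matches shared words, so in practice you would replace it with a call to a real embedding model.

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: a bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(question: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages most similar to the question."""
    q = toy_embed(question)
    return sorted(passages, key=lambda p: cosine(q, toy_embed(p)), reverse=True)[:k]

passages = [
    "The name of the company is ABC Doors and Windows.",
    "James Brook is the owner of the company.",
    "The company offers a 15-day complete replacement policy if any manufacturing errors are detected.",
]

print(top_k("What is your replacement policy for manufacturing errors?", passages))
```

The retrieved passage is then placed in the prompt as context, so the answer comes from your own text rather than from what the model happened to memorize.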

PS: The prompt-completion pairs in your fine-tune training dataset aren't properly formatted per the recommended formatting guidelines.
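
For the legacy prompt/completion fine-tunes, the guidelines recommended ending every prompt with a fixed separator, starting every completion with a space, and ending every completion with a fixed stop sequence. A minimal sketch that rewrites pairs like the ones above into that shape; the separator `\n\n###\n\n`, the stop sequence ` END`, and the file name `train_prepared.jsonl` are assumptions you can change, as long as the stop sequence never appears inside a completion.

```python
import json

SEPARATOR = "\n\n###\n\n"   # assumed fixed suffix marking the end of every prompt
STOP = " END"               # assumed fixed suffix marking the end of every completion

pairs = [
    ("What is the name of the company?", "The name of the company is ABC Doors and Windows."),
    ("Who is the owner of the company?", "James Brook is the owner of the company."),
]

def to_record(question: str, answer: str) -> dict:
    """Build one prompt/completion record in the recommended shape."""
    return {
        "prompt": question.strip() + SEPARATOR,
        # leading space, then the answer, then the stop sequence
        "completion": " " + answer.strip() + STOP,
    }

def write_jsonl(path: str, rows) -> None:
    """Write one JSON object per line, with no blank lines in between."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in rows:
            f.write(json.dumps(to_record(question, answer)) + "\n")

write_jsonl("train_prepared.jsonl", pairs)
```

`json.dumps` also guarantees straight double quotes, which avoids the curly-quote problem that forum editors and word processors tend to introduce.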

If the answers are not satisfactory, you can try hyperparameter tuning: iterate with different hyperparameter values.
Also make sure you have sent a dataset of at least 500 question-answer pairs.
For the formatting, use:

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>

It analyzes your data and offers to reformat it automatically.
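
As a rough illustration of the kind of checks involved (not what `prepare_data` actually implements), the hypothetical `check_jsonl` helper below verifies that each line parses as JSON, has both keys, that each completion starts with a space, and that all prompts and all completions share some common ending. The ending check is deliberately crude: a shared final `.` or `?` already passes it.

```python
import json
import os

def common_suffix(strings: list[str]) -> str:
    """Longest suffix shared by every string (reverse, take common prefix, reverse back)."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]

def check_jsonl(lines: list[str]) -> list[str]:
    """Return problems found in prompt/completion JSONL lines (hypothetical helper)."""
    problems = []
    records = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {n}: blank line")
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {n}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(rec, dict) or "prompt" not in rec or "completion" not in rec:
            problems.append(f"line {n}: missing 'prompt' or 'completion'")
            continue
        if not str(rec["completion"]).startswith(" "):
            problems.append(f"line {n}: completion does not start with a space")
        records.append(rec)
    if records:
        # all prompts should end with one separator, all completions with one stop sequence
        for key in ("prompt", "completion"):
            if not common_suffix([str(r[key]) for r in records]):
                problems.append(f"{key}s do not share a common ending")
    return problems
```

To check a file, pass it `open(path, encoding="utf-8").read().splitlines()` and print each returned problem. Curly quotes such as `“prompt”` show up as invalid JSON.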

Fine-tuning is computationally expensive and a headache. I always use semantic search over the company PDF instead. Add some instructions to the prompt, including examples of how to answer. It will do more than fine-tuning would. Just use embeddings.
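
Assembling such a prompt might look like the sketch below: fixed instructions, an example reply showing the company's tone and structure, then the passages retrieved by semantic search and the incoming email. The instruction wording, the example reply, and the `build_prompt` helper are all hypothetical, and the retrieval step is assumed to have already happened.

```python
INSTRUCTIONS = (
    "You reply to customer emails for ABC Doors and Windows. "
    "Answer only from the context below. If the context does not contain the answer, "
    "say that a team member will follow up."
)

# Hypothetical example reply demonstrating the desired structure and tone
EXAMPLES = [
    (
        "I found a manufacturing defect in the door I bought last week. Can I get a replacement?",
        "Dear customer,\n\nYes. The company offers a 15-day complete replacement policy "
        "if any manufacturing errors are detected.\n\nBest regards,\nABC Doors and Windows",
    ),
]

def build_prompt(context: list[str], email: str) -> str:
    """Combine instructions, example replies, retrieved context and the new email."""
    parts = [INSTRUCTIONS, ""]
    for question, reply in EXAMPLES:
        parts += [f"Email: {question}", f"Reply: {reply}", ""]
    parts.append("Context:")
    parts += [f"- {passage}" for passage in context]
    parts += ["", f"Email: {email}", "Reply:"]
    return "\n".join(parts)

print(build_prompt(["James Brook is the owner of the company."], "Who owns your company?"))
```

Keeping the policies in the context rather than in model weights means a policy change only requires editing the source documents.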

The response to my prompt is repetitive. Is this because I am using only a few lines of JSONL?

Yes, it is because of a poorly fine-tuned model.
You can give it more training data, and you can also iterate with different hyperparameter values to see which ones work best for your model.
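
Iterating over hyperparameters can be as simple as looping over a small grid and launching one fine-tune per combination. The sketch below only builds and prints the commands; it assumes the legacy 0.x `openai` CLI, whose `fine_tunes.create` accepted `--n_epochs` and `--learning_rate_multiplier`. The file name and grid values are placeholders, so check the flags against the CLI version you actually have installed.

```python
import itertools
import shlex

TRAIN_FILE = "train_prepared.jsonl"   # placeholder path
BASE_MODEL = "davinci"

grid = {
    "n_epochs": [2, 4],
    "learning_rate_multiplier": [0.05, 0.1],
}

def commands(train_file: str, base_model: str, grid: dict) -> list[str]:
    """One legacy `fine_tunes.create` command per hyperparameter combination."""
    keys = list(grid)
    out = []
    for values in itertools.product(*(grid[k] for k in keys)):
        args = ["openai", "api", "fine_tunes.create", "-t", train_file, "-m", base_model]
        for key, value in zip(keys, values):
            args += [f"--{key}", str(value)]
        out.append(shlex.join(args))
    return out

for cmd in commands(TRAIN_FILE, BASE_MODEL, grid):
    print(cmd)
```

Each run produces a separate model, so compare them on the same held-out questions before picking one.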

That problem is discussed here:

Thank you @sps. Now I have to add a stop sequence at the end of the completions. Can I edit my current JSONL file and update the same model with it, or do I have to make a new model with the updated file?

You'll have to properly format the training data according to the guidelines in the docs and train a new fine-tune starting from a base model.
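
Once the new model is trained on the reformatted file, queries should follow the same conventions as the training data: end the prompt with the same separator and pass the completion's stop sequence as the `stop` parameter. A minimal sketch of the request parameters, assuming the separator `\n\n###\n\n`, the stop sequence ` END`, and a placeholder model name; the dictionary is meant for the legacy Completions endpoint and is not sent anywhere here.

```python
SEPARATOR = "\n\n###\n\n"   # must match the prompt suffix used in training (assumed value)
STOP = " END"               # must match the completion suffix used in training (assumed value)

def completion_params(model: str, question: str) -> dict:
    """Request parameters for a legacy prompt/completion fine-tuned model."""
    return {
        "model": model,                          # placeholder: your fine-tuned model's name
        "prompt": question.strip() + SEPARATOR,  # same separator as in training
        "max_tokens": 100,
        "temperature": 0,                        # low temperature for more repeatable answers
        "stop": [STOP],                          # generation stops before this sequence
    }

params = completion_params("YOUR_FINE_TUNED_MODEL", "What is the name of the company?")
print(params)
```

In the Playground, the equivalent is typing the separator after your question and adding the stop sequence in the "Stop sequences" field.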

@sps I am using the LangChain package for embeddings. Can you please tell me whether this is the right approach?

LangChain describes itself as a framework for developing applications powered by language models. I haven't used it.

But if it works for you, that's great.