Fine-tune and Davinci

Hey guys,

I’m trying to build a copywriting tool, but so far without success. I collected my own dataset of about 1.5 million words (10 million characters), fine-tuned davinci on it, and used the resulting fine-tuned model for generation. When I looked at the outputs, they were very bad. Can anyone tell me how I can improve these results?

  1. Can I use the davinci model together with the model I fine-tuned on my dataset?
import openai

def DavinciModel(prompt):
    # Ask the completion endpoint for a blog structure on the given topic.
    response = openai.Completion.create(
      model="text-davinci-002",
      # model2 is my fine-tuned model (currently not used):
      #model2="davinci:ft-machine11-o-2022-08-20-00-05-06"
      #engine="davinci-instruct-beta-v3",
      prompt="Generate blog structure for following topic\n\n{}".format(prompt),
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0
    )
    return response

As you can see in the code above, model2 is my fine-tuned model, but the call currently uses text-davinci-002 instead. I would like to use both together. Is this possible, or is there a better approach?
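To make the two options concrete, here is a minimal sketch of how the request arguments differ depending on which model is selected. The helper name build_completion_params is hypothetical and only builds the keyword arguments; it does not call the API. It assumes the legacy openai Python library, where a completion request takes a single model name.

```python
# Model names copied from the snippet above.
BASE_MODEL = "text-davinci-002"
FINE_TUNED_MODEL = "davinci:ft-machine11-o-2022-08-20-00-05-06"


def build_completion_params(topic, use_fine_tuned=False):
    """Return keyword arguments for one completion request (hypothetical helper)."""
    return {
        # Exactly one model name goes into each request.
        "model": FINE_TUNED_MODEL if use_fine_tuned else BASE_MODEL,
        "prompt": "Generate blog structure for following topic\n\n{}".format(topic),
        "temperature": 0.7,
        "max_tokens": 256,
        "top_p": 1,
        "frequency_penalty": 0,
        "presence_penalty": 0,
    }


params = build_completion_params("cyber security", use_fine_tuned=True)
print(params["model"])
```

With the legacy library, these arguments would then be passed as openai.Completion.create(**params).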

On to another question. These are the steps I follow to fine-tune on my dataset. I run only these two commands. Is this the correct approach?

  1. openai tools fine_tunes.prepare_data -f exampletext.txt
  2. openai api fine_tunes.create -t exampletext_prepared.jsonl -m davinci

exampletext.txt

With the scale of the cyber threat set to continue to rise, the International Data Corporation predicts that worldwide spending on cyber-security solutions will reach a massive $133.7 billion by 2022. Governments across the globe have responded to the rising cyber threat with guidance to help organizations implement effective cyber-security practices.

In the U.S., the National Institute of Standards and Technology (NIST) has created a cyber-security framework. To combat the proliferation of malicious code and aid in early detection, the framework recommends continuous, real-time monitoring of all electronic resources.

exampletext_prepared.jsonl

{"prompt":"","completion":"With the scale of the cyber threat set to continue to rise, the International Data Corporation predicts that worldwide spending on cyber-security solutions will reach a massive $133.7 billion by 2022. Governments across the globe have responded to the rising cyber threat with guidance to help organizations implement effective cyber-security practices."}
{"prompt":"","completion":"In the U.S., the National Institute of Standards and Technology (NIST) has created a cyber-security framework. To combat the proliferation of malicious code and aid in early detection, the framework recommends continuous, real-time monitoring of all electronic resources."}

The file contents look like this. I don’t know if I’m doing something wrong here. I would really appreciate your help.
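To show the record shape in a form that can be checked, here is a minimal sketch that turns blank-line-separated paragraphs into JSONL records with an empty prompt, like the file above. The function name paragraphs_to_jsonl is hypothetical, and this is only an illustration of the format under that one-paragraph-per-record assumption, not how the prepare_data tool itself works.

```python
import json


def paragraphs_to_jsonl(text):
    """Convert blank-line-separated paragraphs into JSONL with an empty prompt."""
    lines = []
    for block in text.split("\n\n"):
        # Collapse line wraps and extra spaces inside a paragraph.
        paragraph = " ".join(block.split())
        if paragraph:
            record = {"prompt": "", "completion": paragraph}
            lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


sample = "First paragraph of the text.\n\nSecond paragraph\nwrapped over two lines."
jsonl = paragraphs_to_jsonl(sample)
print(jsonl)

# Check that every line is valid JSON with exactly the two expected keys.
for line in jsonl.splitlines():
    record = json.loads(line)
    assert set(record) == {"prompt", "completion"}
```

Parsing each line back with json.loads is a quick way to catch problems such as a missing closing quote before uploading the file.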

And one last question: can I use other text processing or NLP algorithms to improve the outputs?