Hey guys,
I’m trying to build a copywriting tool, but so far I can’t say I’ve been successful. I collected my own dataset, which contains 1.5 million words (about 10 million characters). I fine-tuned davinci on this dataset and used the resulting fine-tuned model. When I looked at the results, though, they were very bad, and I need your help with this. Can anyone tell me how I can improve these results?
- Can I use the davinci model together with the model I fine-tuned on my dataset?
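import openai

def DavinciModel(prompt):
    response = openai.Completion.create(
        model="text-davinci-002",  # model1: the base model I currently use
        # model="davinci:ft-machine11-o-2022-08-20-00-05-06",  # model2: my fine-tuned model
        # engine="davinci-instruct-beta-v3",
        prompt="Generate blog structure for the following topic:\n\n{}".format(prompt),
        temperature=0.7,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )
    # Return only the generated text of the first completion.
    return response["choices"][0]["text"]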
As you can see in the code above, model2 is my fine-tuned model, but instead of using it, I am using davinci (model1). I would like to use both together. Is this possible? Or is there a better approach?
I’d like to move on to another question. These are the steps I use when fine-tuning on my dataset. I run only these two commands. Is this the correct approach?
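To make the question concrete: a single request takes one model name, so the closest thing to "both together" that I can picture is sending the same prompt to each model in separate requests and comparing the outputs. Below is a minimal sketch of that idea. `compare_models` and `openai_complete` are hypothetical helper names of mine, and the wrapper assumes the legacy (pre-1.0) `openai` Python package with the API key set in the `OPENAI_API_KEY` environment variable.

```python
def compare_models(topic, models, complete):
    """Send the same prompt to each model and collect the outputs by model name.

    `complete` is any callable taking (model, prompt) and returning text,
    so the comparison logic can be tried without calling the API.
    """
    prompt = "Generate blog structure for the following topic:\n\n{}".format(topic)
    return {model: complete(model=model, prompt=prompt) for model in models}


def openai_complete(model, prompt):
    # Assumption: legacy openai package (pre-1.0), key read from OPENAI_API_KEY.
    import openai

    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        temperature=0.7,
        max_tokens=256,
    )
    return response["choices"][0]["text"]


# Example call (makes two separate API requests, one per model):
# results = compare_models(
#     "cyber security",
#     ["text-davinci-002", "davinci:ft-machine11-o-2022-08-20-00-05-06"],
#     openai_complete,
# )
```

This only puts the two outputs side by side; it does not merge the models in any way.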
- openai tools fine_tunes.prepare_data -f exampletext.txt
- openai api fine_tunes.create -t exampletext_prepared.jsonl -m davinci
exampletext.txt
With the scale of the cyber threat set to continue to rise, the International Data Corporation predicts that worldwide spending on cyber-security solutions will reach a massive $133.7 billion by 2022. Governments across the globe have responded to the rising cyber threat with guidance to help organizations implement effective cyber-security practices.
In the U.S., the National Institute of Standards and Technology (NIST) has created a cyber-security framework. To combat the proliferation of malicious code and aid in early detection, the framework recommends continuous, real-time monitoring of all electronic resources.
exampletext_prepared.jsonl
{"prompt":"","completion":"With the scale of the cyber threat set to continue to rise, the International Data Corporation predicts that worldwide spending on cyber-security solutions will reach a massive $133.7 billion by 2022. Governments across the globe have responded to the rising cyber threat with guidance to help organizations implement effective cyber-security practices."}
{"prompt":"","completion":"In the U.S., the National Institute of Standards and Technology (NIST) has created a cyber-security framework. To combat the proliferation of malicious code and aid in early detection, the framework recommends continuous, real-time monitoring of all electronic resources."}
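For anyone who wants to reproduce or check a file of this shape, here is a minimal sketch using only the Python standard library. It assumes one paragraph per line in the text file and an empty prompt for every record, as in my example. `paragraphs_to_jsonl` and `validate_jsonl` are hypothetical helpers of mine, not part of the openai CLI, and this is not necessarily what `fine_tunes.prepare_data` does internally.

```python
import json


def paragraphs_to_jsonl(text):
    """Turn each non-empty line of `text` into one JSONL record with an empty prompt."""
    records = []
    for paragraph in text.splitlines():
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip blank lines between paragraphs
        record = {"prompt": "", "completion": paragraph}
        # json.dumps handles quoting and escaping, so every line is valid JSON.
        records.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(records)


def validate_jsonl(jsonl_text):
    """Check that every line parses as JSON with exactly the two expected keys.

    Returns the number of records; raises ValueError on the first bad line.
    """
    count = 0
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        record = json.loads(line)  # JSONDecodeError is a subclass of ValueError
        if set(record) != {"prompt", "completion"}:
            raise ValueError("line {}: unexpected keys {}".format(number, sorted(record)))
        count += 1
    return count
```

Running `validate_jsonl` on the prepared file contents is a quick way to catch a missing closing quote or curly quotes that crept in through copy and paste.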
The file contents look like this. I don’t know if I’m doing something wrong here. I would really appreciate it if you could help.
And one last question: can I use other text-processing or NLP algorithms to improve the outputs?