This idea came from a comment by an OpenAI staff member on this forum.
It's not in the documentation, but it's a credible source.
I'll run a test this week and come back with my results, because I've seen this claim over and over, but without details.
Until then, I can tell you what tests I did with fine-tuning.
-
Using GPT-3 to generate prompts.
Method: for a given text (usually a news article), I asked GPT-3 to formulate 5 questions whose answers are in the text, and also to generate the answers to those questions. Then I made 6 variations of each question, resulting in 30 prompt/completion pairs for a single article.
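To give an idea of what that generation step looks like in code, here is a rough sketch using the pre-1.0 openai Python library; the model name, prompt wording and parameters are illustrative, not the exact ones I used:

```python
# Rough sketch of the question-generation step; model, prompt wording and
# parameters are assumptions, not the exact ones I used.
import openai

openai.api_key = "sk-..."  # placeholder

def generate_qa_text(article_text):
    prompt = (
        "Below is a news article.\n\n"
        + article_text
        + "\n\nWrite 5 questions that can be answered from the article, "
        "each followed by its answer, in this format:\nQ: ...\nA: ...\n"
    )
    response = openai.Completion.create(
        model="text-davinci-003",  # assumed GPT-3 completion model
        prompt=prompt,
        max_tokens=512,
        temperature=0.7,
    )
    # raw text with the 5 Q/A pairs, to be parsed into prompt/completion pairs
    return response["choices"][0]["text"]
```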
Test results: on the first test, it failed completely to provide information from the given text. After improving the stop sequences the results were better, but still unsatisfactory: I was able to obtain the exact completion from the fine-tuning file, but only if the prompt was very close to the prompt in the fine-tuning file. After these tests I came to the conclusion that fine-tuning was not going to achieve what I wanted. Embeddings work much better for the same use case.
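For comparison, the embeddings approach that ended up working better is basically: embed the article chunks once, embed the question, and answer from the most similar chunk. A minimal sketch, assuming text-embedding-ada-002 and cosine similarity (the details are assumptions, not my exact code):

```python
# Minimal retrieval sketch with embeddings; model name and similarity
# details are assumptions.
import numpy as np
import openai

def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([item["embedding"] for item in resp["data"]])

def most_relevant_chunk(question, chunks):
    chunk_vecs = embed(chunks)  # in practice, compute once and cache
    q_vec = embed([question])[0]
    # cosine similarity between the question and every chunk
    sims = chunk_vecs @ q_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    return chunks[int(np.argmax(sims))]
```

The retrieved chunk is then pasted into a normal completion prompt together with the question, instead of hoping the fine-tuned model memorized the text.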
-
Same method as above, but with a larger dataset.
Method: I made a fine-tuning file from a 96-strophe poem. Each prompt was something like "strophe # from the poem Xxx, written by Yyy" and the completion was the corresponding strophe.
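Building that fine-tuning file is just a loop over the strophes. A sketch in the legacy prompt/completion JSONL format; the separator tokens, poem title and author below are placeholders:

```python
# Sketch of building the poem fine-tuning file; poem title, author and
# separator tokens are placeholders, not the exact ones I used.
import json

def build_finetune_file(strophes, poem="Xxx", author="Yyy", path="poem.jsonl"):
    with open(path, "w", encoding="utf-8") as f:
        for i, strophe in enumerate(strophes, start=1):
            record = {
                # legacy fine-tuning format: prompt ends with a separator,
                # completion starts with a space and ends with a stop sequence
                "prompt": f"strophe {i} from the poem {poem}, written by {author} ->",
                "completion": " " + strophe + "\n###",
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```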
Test results: it failed completely to return any of the strophes. But what it did do was generate content in the same style as the poem, which was an accidental discovery.