This idea came from a comment by an OpenAI staff member on this forum.
It's not in the documentation, but it's a credible source.
I'll run a test this week and come back with my results, because I've seen this claim over and over, but without details.
Until then, I can tell you what tests I did with fine-tuning.
-
Using GPT-3 to generate prompts.
Method: for a given text (usually a news article), I asked GPT-3 to formulate 5 questions whose answers are in the text, and also to generate the answers to those questions. Then I made 6 variations of each question, resulting in 30 prompt/completion pairs for a single article.
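To give an idea of what that generation step looks like in code, here is a rough sketch using the pre-1.0 openai Python library; the model name, prompt wording and parameters are illustrative, not the exact ones I used:

```python
# Rough sketch of the question-generation step; model, prompt wording and
# parameters are assumptions, not the exact ones I used.
import openai

openai.api_key = "sk-..."  # placeholder

def generate_qa_text(article_text):
    prompt = (
        "Below is a news article.\n\n"
        + article_text
        + "\n\nWrite 5 questions that can be answered from the article, "
        "each followed by its answer, in this format:\nQ: ...\nA: ...\n"
    )
    response = openai.Completion.create(
        model="text-davinci-003",  # assumed GPT-3 completion model
        prompt=prompt,
        max_tokens=512,
        temperature=0.7,
    )
    # raw text with the 5 Q/A pairs, to be parsed into prompt/completion pairs
    return response["choices"][0]["text"]
```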
Test results: on the first test, it failed completely to provide information from the given text. After improving the stop sequences the results were better, but still unsatisfactory: I was able to obtain the exact completion from the fine-tuning file, but only if the prompt was very close to the prompt in the fine-tuning file. After these tests I came to the conclusion that fine-tuning was not going to achieve what I wanted. Embeddings work much better for the same use case.
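For comparison, the embeddings approach that ended up working better is basically: embed the article chunks once, embed the question, and answer from the most similar chunk. A minimal sketch, assuming text-embedding-ada-002 and cosine similarity (the details are assumptions, not my exact code):

```python
# Minimal retrieval sketch with embeddings; model name and similarity
# details are assumptions.
import numpy as np
import openai

def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([item["embedding"] for item in resp["data"]])

def most_relevant_chunk(question, chunks):
    chunk_vecs = embed(chunks)  # in practice, compute once and cache
    q_vec = embed([question])[0]
    # cosine similarity between the question and every chunk
    sims = chunk_vecs @ q_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    return chunks[int(np.argmax(sims))]
```

The retrieved chunk is then pasted into a normal completion prompt together with the question, instead of hoping the fine-tuned model memorized the text.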
-
Same method as above, but with a larger dataset.
Method: I made a fine-tuning file from a 96-strophe poem. Each prompt was something like "strophe # from the poem Xxx, written by Yyy" and the completion was the corresponding strophe.
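Building that fine-tuning file is just a loop over the strophes. A sketch in the legacy prompt/completion JSONL format; the separator tokens, poem title and author below are placeholders:

```python
# Sketch of building the poem fine-tuning file; poem title, author and
# separator tokens are placeholders, not the exact ones I used.
import json

def build_finetune_file(strophes, poem="Xxx", author="Yyy", path="poem.jsonl"):
    with open(path, "w", encoding="utf-8") as f:
        for i, strophe in enumerate(strophes, start=1):
            record = {
                # legacy fine-tuning format: prompt ends with a separator,
                # completion starts with a space and ends with a stop sequence
                "prompt": f"strophe {i} from the poem {poem}, written by {author} ->",
                "completion": " " + strophe + "\n###",
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```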
Test results: it failed completely to return any of the strophes. But what it did do was generate content in the same style as the poem, which was an accidental discovery.