I have fine-tuned davinci with a set of data (prompt: question? / completion: answer) and I get a strange result.
When I use the fine-tuned model with prompts that exist in the dataset, I get a completely different result than the completion it was trained on.
Do you have an explanation? (Temperature = 0 / Top P = 1)
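For reference, a minimal sketch of the kind of JSONL training file I mean, assuming the legacy prompt/completion fine-tuning format with a separator at the end of each prompt and a stop marker at the end of each completion (the file name and example questions/answers are placeholders):

```python
import json

# Hypothetical examples; the separator ("\n\n###\n\n"), the leading space in the
# completion, and the trailing "\n\n" follow the usual fine-tuning conventions.
examples = [
    {
        "prompt": "What is a bipartite graph?\n\n###\n\n",
        "completion": " A graph whose vertices split into two sets so that every edge joins the two sets.\n\n",
    },
    {
        "prompt": "What is a spanning tree?\n\n###\n\n",
        "completion": " A subgraph that is a tree and includes every vertex of the graph.\n\n",
    },
]

# One JSON object per line (JSONL), the format expected by the fine-tuning endpoint.
with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```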
After some tests… things are getting a little bit better.
Here are some lessons (for now):
1/ Fine-tuning a model doesn’t remove the need for prompt design. The prompt still has to be as effective as possible, even with a fine-tuned model.
2/ Using a low temperature, a high Top P, and high frequency and presence penalties seems to be a good option (see the sketch after this list).
3/ Defining a stop sequence is also a good choice. For classical content (extracts from books), I use a double “return” (two newlines).
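A minimal sketch of what such a call might look like with the legacy Completions endpoint (the API key, model name, prompt, and exact penalty values are placeholders/assumptions used only to illustrate the settings above):

```python
import openai  # legacy openai-python (< 1.0) Completions interface

openai.api_key = "sk-..."  # placeholder

# Settings mirroring the lessons above: low temperature, high Top P,
# non-zero frequency/presence penalties, and a double-newline stop sequence.
response = openai.Completion.create(
    model="davinci:ft-your-org-2023-01-01-00-00-00",  # placeholder fine-tuned model name
    prompt="What is a bipartite graph?\n\n###\n\n",   # same format as the training prompts
    temperature=0.2,        # low temperature
    top_p=1,                # high Top P
    frequency_penalty=0.5,  # frequency penalty (illustrative value)
    presence_penalty=0.5,   # presence penalty (illustrative value)
    max_tokens=200,
    stop=["\n\n"],          # double “return” as the stop sequence
)

print(response["choices"][0]["text"].strip())
```

Keeping the inference prompt formatted exactly like the training prompts (same separator and whitespace) also matters; otherwise the model may ignore what it was fine-tuned on.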
There are some subjects, like math, where it really fails (I’ve argued with it about graph theory; no numbers or equations involved, but it doesn’t really understand the concepts very well).
I’m not certain, but I think fine-tuning narrows the model down rather than teaching it new things, so if it doesn’t already know something, I don’t think you can teach it via fine-tuning.
Maybe you could get your books into the next round of GPT-3 training? I don’t know how much of the Google Books repository is included in the model’s training data.