Have you tried reducing the temperature at all? You could also consider shrinking the pool of available answers by setting top_p to a lower value (maybe 0.3).
I have spent a bit of time figuring out the difference between top_p and temperature for a course I am writing.
STAGE 1:
When the AI builds a sentence, it starts off by building a list of words/answers that it thinks fit the situation. This list will also include things it learned before your fine-tuning (e.g. a rabbit lives in a rabbit hole or under a hill in online children's books).
Each word/token has its own probability score of being correct.
Once it has built the list of words, it uses top_p to slice off the least probable answers.
But because your top_p is set to 1, it keeps the entire list.
If you were to set top_p to 0.3 (or similar), it would keep only the most probable tokens until their combined probability reached 30%, and drop everything below that cut-off (this completely eliminates hundreds of “long tail” or low-priority answers).
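To make that concrete, here's a toy sketch of the stage-1 trimming (the tokens and probability scores are made up, and this is an illustration of the idea, not OpenAI's actual code):

```python
# Made-up candidate tokens with made-up probability scores.
candidates = {"hole": 0.40, "burrow": 0.25, "forest": 0.15,
              "hill": 0.10, "castle": 0.06, "submarine": 0.04}

def apply_top_p(probs, top_p):
    """Keep the most probable tokens until their combined probability
    reaches top_p, then drop everything below the cut-off."""
    kept, cumulative = {}, 0.0
    for token, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

print(apply_top_p(candidates, 1.0))  # keeps the whole list
print(apply_top_p(candidates, 0.3))  # keeps only {'hole': 0.4}
```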
STAGE 2:
Once it has this list, then temperature kicks in.
A temperature of 1 leaves their chances of being picked untouched, while a temperature of 0 is effectively deterministic: the most probable token wins every time.
A temperature of 0.8 therefore still leaves the lower-probability options a real chance of being sampled. The unusual or low-probability answers will turn up fairly often.
(Out of interest, very high values push the list towards all tokens/words having roughly even chances of being picked.)
So, in effect, your line where you set a temperature of 0.8 is telling the AI that picking the less likely answers fairly often is fine.
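Here's the same kind of toy sketch for stage 2 (again made-up numbers, not the real implementation). Temperature divides the raw scores (logits) before they're turned into probabilities, so low values sharpen the distribution and high values flatten it; temperature 0 is handled as a special case where the top token simply wins:

```python
import math

# Made-up raw scores (logits) for four candidate tokens.
logits = {"hole": 4.0, "burrow": 3.0, "forest": 2.0, "hill": 1.0}

def probabilities_at(temperature):
    """Softmax over the logits divided by temperature (temperature > 0)."""
    scaled = {t: v / temperature for t, v in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    return {t: round(math.exp(v) / total, 3) for t, v in scaled.items()}

for t in (0.3, 0.8, 1.0, 2.0):
    print(t, probabilities_at(t))
# At 0.3 "hole" takes ~96% of the probability; at 1.0 the raw
# probabilities are untouched; at 2.0 the gaps shrink and the
# long-tail options get a real chance of being picked.
```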
TRY THIS:
I would try a top_p of 0.3 AND a temperature of 0.3.
I know the documentation doesn't recommend changing both, but the two settings do different things.
In your case, top_p may be the more important of the two.
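Something like this, assuming the (pre-v1) openai Python library, a completions-style call, and OPENAI_API_KEY set in your environment; the model name is a placeholder for your fine-tuned model:

```python
import openai  # pre-1.0 interface; reads OPENAI_API_KEY from the environment

response = openai.Completion.create(
    model="YOUR_FINE_TUNED_MODEL",   # placeholder
    prompt="Where does the rabbit live?",
    max_tokens=50,
    top_p=0.3,        # stage 1: trim the candidate list hard
    temperature=0.3,  # stage 2: pick conservatively from what's left
)
print(response.choices[0].text)
```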
Side Note: I have a hypothesis (untested) that tokens from fine-tuning text are loaded with a higher probability score anyway. I assume this because it would be the only way to override the massive weight of knowledge the AI already has from its original training.
If the AI had read hundreds of books where the rabbit lived in a hole, your single rule about living in the forest would carry very little overall weight and would be treated as one of the hundreds of outlier or long-tail options.
Of course, I don't know if this is true because I'm not privy to the inner workings of GPT.