Train (fine-tune) a model with text from books or articles

I think the 500-token figure comes from the observation that an idea is usually encapsulated in 1 to 3 paragraphs, and 500 tokens covers up to about 3 average paragraphs. The 1000-token completion makes sense because it gives the model some room to breathe while keeping its output to roughly 5-10 ideas at most. You probably don't want the model to generate many more output ideas than it was given as input, otherwise it can start to drift. At least that is my theory.
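
To make the idea concrete, here's a rough sketch of how you might chunk a book into ~500-token prompts paired with the ~1000 tokens of text that follow each one. This is just my own illustration, not a prescribed recipe: it assumes tiktoken for token counting and a naive blank-line paragraph split, and the greedy packing heuristic is an assumption on my part.

```python
# Sketch: build prompt/completion pairs sized roughly 500 / 1000 tokens.
# Assumes tiktoken is installed; the paragraph-splitting and greedy packing
# heuristics here are illustrative, not a recommended method.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def token_len(text: str) -> int:
    # Count tokens the way the tokenizer sees them, not by words.
    return len(enc.encode(text))

def chunk_paragraphs(paragraphs, max_tokens):
    """Greedily pack consecutive paragraphs into chunks of at most max_tokens."""
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        plen = token_len(para)
        if current and current_len + plen > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += plen
    if current:
        chunks.append("\n\n".join(current))
    return chunks

def build_pairs(text, prompt_tokens=500, completion_tokens=1000):
    """Pair each ~500-token chunk with the following text up to ~1000 tokens."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    prompts = chunk_paragraphs(paragraphs, prompt_tokens)
    pairs = []
    for i in range(len(prompts) - 1):
        completion = prompts[i + 1]
        j = i + 2
        # Extend the completion with later chunks until it reaches ~1000 tokens.
        while j < len(prompts) and token_len(completion) < completion_tokens:
            completion += "\n\n" + prompts[j]
            j += 1
        pairs.append({"prompt": prompts[i], "completion": completion})
    return pairs
```

The point of the sizing is visible in the loop: each prompt carries one to a few ideas, and each completion is capped so the model is asked to continue with a comparable number of ideas rather than open-ended text.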

More discussion in this thread:
