I appreciate your insights and I think you’ve highlighted a crucial aspect of the model’s learning: it’s focused on understanding the style of the book rather than the factual content or the sequential relations between the sentences. In other words, the model appears to interpret its task as generating responses that mimic the book’s style rather than ensuring factual correctness.
To address this, we could consider a two-step fine-tuning process that separates the tasks of content assimilation and behavioral or stylistic learning. First, we teach the model the content of the book. For this, we could create an isolated ‘knowledge silo’ where each page of the book is associated with a prompt. Something like:
{prompt: “oxwh page {page_number}\n\n###\n\n”, completion: “{page_content} ENDEND”}
In this instance, ‘oxwh’ is a unique token that triggers and references the model’s learned knowledge. It should not appear in the model’s original training data; its purpose is to activate and recall the book’s content. This stage might intentionally involve overfitting, since we want the model to learn the content verbatim. Crucially, the model should overfit only on prompts containing the ‘oxwh’ token, so that the knowledge stays precise and isolated.
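As a rough sketch of this first stage, the dataset could be built like this (function names and the JSONL layout are my assumptions; adapt them to whatever fine-tuning API you use):

```python
import json

def build_knowledge_silo(pages):
    """Return one prompt/completion pair per page, following the
    'oxwh page N' template above."""
    examples = []
    for page_number, page_text in enumerate(pages, start=1):
        examples.append({
            "prompt": f"oxwh page {page_number}\n\n###\n\n",
            # The leading space and 'ENDEND' stop token mirror common
            # prompt/completion fine-tuning conventions.
            "completion": f" {page_text} ENDEND",
        })
    return examples

def write_jsonl(examples, path):
    """Serialize the examples in the JSONL format most
    fine-tuning endpoints expect."""
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
```

One pair per page keeps the silo strictly page-addressable, which is what makes the later citations meaningful.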
In the second step, we train the model to respect the facts within its responses. Here, we generate questions relating to entire pages rather than individual sentences and include references to the facts present on those pages. To automate this, we could leverage GPT-4’s ability to generate interesting questions and corresponding answers that explicitly reference the page where each fact was mentioned. A reference in the answer might look like:
“Explaining some fact in natural language. [oxwh page 5] Explaining a different fact. [oxwh page 8]”
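The data-generation step could start from a prompt template along these lines (the wording and the three-pairs-per-page choice are illustrative guesses, not a tested recipe):

```python
# Instruction we would send to GPT-4 for each page of the book.
# Answers are asked to cite the source page via [oxwh page N],
# matching the reference format described above.
QA_INSTRUCTION = (
    "You will be given one page of a book. Write 3 interesting "
    "question/answer pairs about it. After every fact stated in an "
    "answer, append a citation of the form [oxwh page {page_number}].\n\n"
    "Page {page_number}:\n{page_text}"
)

def make_qa_generation_prompt(page_number, page_text):
    """Fill the instruction template for a single page."""
    return QA_INSTRUCTION.format(page_number=page_number,
                                 page_text=page_text)
```

The generated Q&A pairs would then become the stage-two fine-tuning set, teaching the model both the answering style and the citation habit.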
By following this approach, we might be encouraging the model to make use of the overfitted content while also adhering to a specific answering style — not necessarily the style of the book, but rather the natural language responses generated by GPT-4.
This is just an idea, and I haven’t tested it yet. We would likely need to adjust the number of epochs and learning rate throughout the process. But this could potentially result in a model that is more factually accurate and still capable of generating coherent, relevant responses.
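To make the two-stage schedule concrete, it might look something like this (the numbers are placeholders to tune, not recommendations; parameter names assume an OpenAI-style fine-tuning API):

```python
# Stage one deliberately overfits on the knowledge silo, so more
# epochs and a higher learning rate; stage two only shapes the
# answering style, so a gentler setting to avoid disturbing the
# memorized content.
STAGE_ONE = {"n_epochs": 16, "learning_rate_multiplier": 0.2}   # memorize pages
STAGE_TWO = {"n_epochs": 2,  "learning_rate_multiplier": 0.05}  # learn QA style
```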