Hello OpenAI Community,
I have a question about fine-tuning. I have a large volume of book data that I would like to use for fine-tuning. However, I do not have any question-answer format data, and given the vast amount of content, I am unsure how to create such question-answer pairs from it.
My goal is to train the model on this book data and build a chatbot that can engage in conversations based on this information. I am looking for advice on the best approach to achieve this.
Thank you for your help.
Look into RAG (retrieval-augmented generation), both on this forum and elsewhere. Also look into embeddings.
You basically build a mini search engine over your book (using embeddings), then feed the retrieved passages into the prompt so the AI can answer the user.
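Here is a rough sketch of that retrieve-then-prompt loop in plain Python. To keep it self-contained, it assumes a toy bag-of-words "embedding" and cosine similarity in place of a real embedding model and vector store. The function names (`chunk_text`, `embed`, `retrieve`, `build_prompt`) and the chunk sizes are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

def chunk_text(text, size=200, overlap=40):
    """Split the book into overlapping word windows (assumes size > overlap)."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last window already reaches the end
            break
    return chunks

def embed(text):
    """Toy embedding: lowercase word counts. Swap in a real embedding model here."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k=3):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    # In a real system the chunk embeddings are computed once and stored.
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

def build_prompt(question, passages):
    """Put the retrieved excerpts into the prompt the chat model will see."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the book excerpts below.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The resulting prompt string is what you would send to the chat model. The nice part of this design is that adding or editing book content only means re-chunking and re-embedding; no retraining is involved.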
Fine-tunes here are not going to absorb much knowledge. They can pick up tone, but not specific content.
We’ve all been in your shoes … how do I convert a book into prompt/completion pairs?
For tone, there are various posts here on how to do it. But basically, you create prompt/completion pairs from the book by converting each passage into a neutral passage. This is the “prompt” leg. The completion is the original passage from the book. This fine-tune will capture the tone of your book, its writing style, etc.
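A minimal sketch of building those pairs, assuming the book is already split into passages. The neutral rewrite would normally come from an LLM call; `placeholder_neutralize` below is a hypothetical stand-in that only lowercases and collapses whitespace so the example runs on its own. The `prompt`/`completion` field names follow the pairing described above; check the fine-tuning docs for the model you target, since newer chat models expect a `messages` format instead.

```python
import json

def placeholder_neutralize(passage):
    """HYPOTHETICAL stand-in for an LLM call that rewrites the passage in
    plain, neutral prose. Here it only lowercases and collapses whitespace."""
    return " ".join(passage.lower().split())

def build_tone_pairs(passages, neutralize=placeholder_neutralize):
    """Prompt = neutral rewrite, completion = original passage from the book."""
    return [
        {"prompt": neutralize(p), "completion": p}
        for p in passages
        if p.strip()  # skip empty passages
    ]

def to_jsonl(pairs):
    """Serialize one JSON object per line, the usual shape for fine-tuning files."""
    return "\n".join(json.dumps(pair, ensure_ascii=False) for pair in pairs) + "\n"

if __name__ == "__main__":
    passages = ["It was a dark and stormy night;  the rain fell in torrents."]
    print(to_jsonl(build_tone_pairs(passages)), end="")
```

In practice you would pass your real rewriting function as `neutralize` and write the `to_jsonl` output to a `.jsonl` file for upload.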
Thank you very much for your help. I will continue to study and learn more about this topic.