Fine-tuning on documents - Unsure of general process

I have a series of specialized documents I’d like to fine-tune on. I want to pull the same information from each document, but it is embedded within the document itself.

I have already retrieved the relevant sections of each document via vector search, but I’m trying to figure out how to work those sections into the training data prompts.

I’m starting each prompt with: “Given the following information: \n?” and then providing the completion.

Is this the right process for fine-tuning? For each document, I might have 30 questions. Should I just be creating a single training file with the 30 prompts and completions for each document?
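
To make the setup concrete, here is a minimal sketch of the single-file approach I'm describing. It assumes a prompt/completion JSONL layout and that each prompt is the retrieved section followed by the question. The names `build_examples`, `to_jsonl`, `qa_pairs`, and the exact template are hypothetical and only for illustration, not any library's API.

```python
import json

# Assumed template: retrieved section first, then the question on a new line.
PROMPT_TEMPLATE = "Given the following information: {section}\n{question}"


def build_examples(documents):
    """Flatten every document's Q&A pairs into one list of training records."""
    examples = []
    for doc in documents:
        for qa in doc["qa_pairs"]:
            prompt = PROMPT_TEMPLATE.format(
                section=qa["section"],    # text returned by vector search
                question=qa["question"],  # one of the ~30 questions per document
            )
            examples.append({"prompt": prompt, "completion": qa["answer"]})
    return examples


def to_jsonl(examples):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)


def write_jsonl(examples, path):
    """Write all records from all documents to a single training file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(examples) + "\n")
```

With this layout, 10 documents with 30 questions each would produce one file of 300 lines, rather than one file per document.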
