Welcome to the OpenAI community.
IMO fine-tuning isn’t required for this use-case.
If I understand your requirements correctly, you want to return a summary of the relevant docs, with citations, based on the user's query. Here's an outline of the process:
- Retrieve the relevant docs from your vector DB.
- Generate a prompt programmatically with the doc to be summarized. This can be done per doc, or for all the docs in one go, depending on your requirements and doc size. Use `gpt-3.5-turbo` to minimize token costs.
- Generate the user reply programmatically by concatenating the response(s) from the model with the retrieved docs to be cited.