Hey everyone!
I have a long document generation use case and I would appreciate it if you could share some ideas on how to go about this problem.
In this use case I’m working on, the main goal is to create requests for proposals (RFPs) based on a template file and several human-written RFPs. All the files are either PDFs or Word documents – I think I can convert the Word files to PDF so that everything is a PDF, though.
My idea is to somehow chunk the files and store them on a vector DB. This way, every time a new RFP has to be made, I can retrieve relevant RFP chunks that can help write each section – I think doing the document building section by section might be best, otherwise the LLM would have to both receive and output a lot of text. I also have to ensure the template is followed, but some older RFPs do not have the template’s structure.
I haven’t found much literature regarding long-document generation; the most relevant one was this one LLM Based Multi-Agent Generation of Semi-structured Documents from Semantic Templates in the Public Administration Domain.
If you have any papers, blogs posts, your own experience, etc. related with this kind of use case, I would enjoy you sharing it By the way, I’m probably going to use GPT-4o for this, but I think I can use any other OpenAI model.
Thanks in advance!