Background
I want to generate long content for an article, for which I have already generated an outline. I also need to merge some internal materials into the content. Here is the current workflow:
1. Use RAG for the internal material.
2. For each section of the outline, fetch the related content from the RAG system.
3. Generate content based on the context from step 2.
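The steps above can be sketched roughly as follows. The `retrieve` and `generate` functions here are stand-ins for the real vector-store and LLM calls, and the corpus and section names are made up for illustration:

```python
# Minimal sketch of the per-section workflow. retrieve() and generate() are
# hypothetical stand-ins for the real RAG retrieval and LLM generation calls.

OUTLINE = ["Introduction", "Architecture overview", "Deployment steps"]

# Toy corpus standing in for the internal materials indexed by the RAG system.
CORPUS = {
    "doc-a": "introduction to the platform and its architecture",
    "doc-b": "architecture of the deployment pipeline",
    "doc-c": "deployment steps and rollout checklist",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Stand-in retrieval: rank docs by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda d: len(words & set(CORPUS[d].split())),
        reverse=True,
    )
    return ranked[:k]

def generate(section: str, context: list[str]) -> str:
    """Stand-in for the LLM call: records which context was used."""
    return f"{section} [context: {', '.join(context)}]"

article = [generate(s, retrieve(s)) for s in OUTLINE]
for draft in article:
    print(draft)
```

Note that in this toy run the first two sections end up with identical context (`doc-a, doc-b`), which is exactly the duplication problem described below.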
Problem
Some duplicated content is generated, because each section is generated separately and may end up with the same RAG results as its context.
Hi, interesting question. To me, the simplest approach would be to include the prior sections of the article that you’ve already drafted into the context, and let it know “Don’t repeat topics/information that have already been covered in the article so far.”
That’s a good suggestion, but I’m wondering: if there are too many prior sections, the context window might grow very large, and I’m not sure ChatGPT will handle the “Don’t repeat topics/information” instruction well at that length.
Another possible solution is to exclude the content returned by previous retrievals from later ones (though will this make ChatGPT lose some information?).
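That exclusion idea can be sketched by tracking the doc ids already used as context and filtering them out of later retrievals. `retrieve()` and the corpus here are hypothetical stand-ins for the real RAG call:

```python
# Sketch of "exclude prior retrievals": track which docs have already been
# used as context and filter them out for later sections.

CORPUS = {
    "doc-a": "introduction and architecture",
    "doc-b": "architecture details",
    "doc-c": "deployment checklist",
}

def retrieve(query: str) -> list[str]:
    """Stand-in retrieval: return docs sharing any word with the query."""
    words = set(query.lower().split())
    return [d for d in CORPUS if words & set(CORPUS[d].split())]

used: set[str] = set()
context_per_section: dict[str, list[str]] = {}
for section in ["Introduction", "Architecture", "Deployment"]:
    fresh = [d for d in retrieve(section) if d not in used]
    used.update(fresh)
    context_per_section[section] = fresh

print(context_per_section)
```

This illustrates the information-loss risk raised above: the "Architecture" section no longer sees `doc-a` even though it is relevant, because "Introduction" consumed it first.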
See my short clip here (https://youtu.be/6JeR-w_NJ84) on how to reduce context length. I can elaborate on how to be selective about what to expose at what point, if applicable. So far as I can see, gpt-4o-mini seems to handle five chapters at a time pretty well (with the outline and chapter outlines).
In general, clearly distinguishing what you expect in each section (in my case below, each chapter) makes a huge difference. In other words, try defining the template, the outline, and the chapter outlines first.
Then the fact that you get the same RAG results for different sections is actually good, because it will aid a smoother, more consistent flow.
You can summarize each previously generated section and include the summaries in the prompt, to ensure that the new generation does not repeat existing content.
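A rough sketch of that rolling-summary approach, where `summarize()` is a stand-in for an LLM summarization call (the truncation is just a placeholder) and the section names are made up:

```python
# Sketch of the rolling-summary idea: keep a short summary of each accepted
# section and prepend all summaries to the next prompt.

def summarize(text: str, limit: int = 60) -> str:
    # Placeholder: a real system would ask the LLM for a proper summary.
    return text[:limit]

def build_prompt(section: str, context: str, summaries: list[str]) -> str:
    covered = "\n".join(f"- {s}" for s in summaries) or "- (nothing yet)"
    return (
        f"Already covered in the article:\n{covered}\n\n"
        f"Context:\n{context}\n\n"
        f"Write the section '{section}'. "
        "Do not repeat topics/information already covered."
    )

summaries: list[str] = []
for section, context in [("Intro", "background docs"), ("Design", "arch docs")]:
    prompt = build_prompt(section, context, summaries)
    draft = f"Draft of {section}"  # stand-in for the LLM generation
    summaries.append(summarize(draft))

print(prompt)  # the prompt for the last section includes earlier summaries
```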
Another approach I would try is to create an agent that checks the newly generated sections against the previously generated ones. I would implement this using a vector store. Each accepted generation would be stored in the store. Then, the agent would compare each new generation against the stored content. If the agent identifies similarities between the new generation and earlier sections, it would notify the RAG with a message indicating which content is redundant. Based on this feedback, the RAG would retry its generation.
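A toy version of that checking agent, using bag-of-words vectors and cosine similarity as stand-ins for real embeddings and a real vector store (the 0.8 threshold and all names are assumptions, not any library's API):

```python
# Sketch of the redundancy-check step: compare each new generation against
# previously accepted sections stored as vectors; flag near-duplicates so the
# generation can be retried with that feedback.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: bag-of-words counts instead of a learned vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

store: list[tuple[str, Counter]] = []  # accepted sections

def check(section_text: str, threshold: float = 0.8) -> str:
    """Accept and store the section, or report which stored one it duplicates."""
    vec = embed(section_text)
    for name, prev in store:
        if cosine(vec, prev) >= threshold:
            return f"redundant with {name}"  # feedback to trigger regeneration
    store.append((f"section-{len(store) + 1}", vec))
    return "accepted"

print(check("the deployment pipeline uses blue green rollout"))
print(check("the deployment pipeline uses blue green rollout strategy"))
```

The second call is flagged as redundant with the first stored section; in the full loop that message would be fed back so the generation step retries with different emphasis.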
If you’re worried about the context length, an alternative would be to provide the article outline up to that point - that honestly might solve your problem as is. You could also add some complexity by including the name/description of the docs that were retrieved for each outlined point of your article. This would ground the LLM in what “don’t repeat topics/information” means, without spending a lot on input tokens.
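A minimal sketch of that lighter-weight grounding: instead of full prior sections, the prompt carries only the outline so far plus the names of the docs retrieved for each point. All file and section names here are hypothetical:

```python
# Sketch of grounding via the outline-so-far plus retrieved doc names,
# rather than full prior sections, to keep input tokens low.

outline_so_far = [
    ("Introduction", ["platform-overview.md"]),
    ("Architecture", ["platform-overview.md", "pipeline-design.md"]),
]

def build_prompt(section: str, context: str) -> str:
    covered = "\n".join(
        f"- {title} (sources: {', '.join(docs)})"
        for title, docs in outline_so_far
    )
    return (
        f"Article so far (section titles and their sources):\n{covered}\n\n"
        f"Context:\n{context}\n\n"
        f"Write the section '{section}' without repeating topics/information "
        "already covered above."
    )

prompt = build_prompt("Deployment", "rollout checklist text")
print(prompt)
```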
I was originally thinking about a similar approach, but then I realized that the order of the outline items makes a difference: for the very first outline item, RAG searches all the documents; for the second outline item, it searches all the documents excluding the first one’s results.
I’m wondering if we could generate an outline for the RAG material itself, then ask the LLM to match the RAG outline to the article outline.