Writing an article from a combination of topic/subtopics drawn from a large corpus of data

Topic: How inflation is impacting middle-class people.
Subtopic 1: The relation between interest rates and inflation.
Subtopic 2: Investment plans to counter inflation in the future.

Suppose I have the above topic and subtopics. I also have a good amount of data related to these topics, from which I want the content to be generated.
I want to generate an article using GPT-3 text generation. I was thinking along the lines of using embeddings and vector search, as suggested by @daveshapautomator in his multi-document answering video, but I want article creation rather than Q&A.
I am not leaning toward the fine-tuning feature, since it sometimes changes the factuality of the input content.

@jhsmith12345 @IvanPsy


I would take a phased approach where you brainstorm and expand on the base topic. You can check out my AutoMuse content, where I explore this for fiction. Nonfiction is probably going to be easier because it’s not as long. Maybe I will do a video on this. :thinking:


You might start with the multi-document search and alter the prompts. I am currently doing something similar in order to generate full-length essays.
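To illustrate "altering the prompts": a hypothetical sketch of reusing a multi-document search setup for section writing instead of Q&A. The retrieval step stays the same; only the instruction wrapped around the retrieved passages changes. The template strings and `build_prompt` helper are assumptions, not anything from the video.

```python
# Hypothetical prompt templates: same retrieved passages, different task.
QA_PROMPT = """Answer the question using only the passages below.

Passages:
{passages}

Question: {question}
Answer:"""

ESSAY_PROMPT = """Write a detailed article section using only the passages below.

Passages:
{passages}

Section topic: {topic}
Section:"""

def build_prompt(template: str, passages: list[str], **fields) -> str:
    """Fill a prompt template with retrieved passages and task-specific fields."""
    return template.format(passages="\n\n".join(passages), **fields)
```

For example, `build_prompt(ESSAY_PROMPT, retrieved, topic="Interest rates and inflation")` would produce a section-writing prompt grounded in the retrieved passages.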


Initial stab at it. If you specify everything you want in request.txt, it should work.


Thanks a lot @daveshapautomator. Since we want the article/blog to be written from a large input corpus, we want steps 4 and 5 (brainstorming facts and writing the actual sections) to draw from that corpus rather than from pretrained GPT-3, to keep the content factually correct.

  1. Take a prompt of some sort (natural language instructions) like “I want a blog about X…”
  2. Brainstorm the structure of the request (list a bunch of sections)
  3. Iterate on that list to improve it (is this a good list?)
    > 4. Brainstorm some facts or points to include for each section (repeat this 2 or 3 times)
    > 5. Write the actual sections
  6. Iterate/improve the sections
  7. Clean up the final product
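The phased steps above can be sketched as a chain of completion calls. This is only a control-flow sketch: `complete` is a stub standing in for a real GPT-3 API call, and every prompt string here is an assumption, so it runs without an API key but produces placeholder text.

```python
# Sketch of the phased article pipeline. `complete` is a stub standing in
# for a text-generation call (e.g. GPT-3); swap in a real API call to use it.

def complete(prompt: str) -> str:
    """Placeholder for a GPT-3 completion call."""
    return f"[model output for: {prompt[:40]}...]"

def write_article(request: str, facts_passes: int = 2) -> str:
    # 1. Take a natural-language request, e.g. "I want a blog about X..."
    # 2. Brainstorm the structure (a list of sections).
    outline = complete(f"List the sections for this article request:\n{request}")
    # 3. Iterate on that list to improve it.
    outline = complete(f"Improve this section list:\n{outline}")
    sections = []
    for section in outline.splitlines():
        # 4. Brainstorm facts for each section (repeated a few times).
        facts = ""
        for _ in range(facts_passes):
            facts += complete(f"Brainstorm facts for section '{section}':\n{facts}")
        # 5. Write the actual section from the brainstormed facts.
        draft = complete(f"Write section '{section}' using these facts:\n{facts}")
        # 6. Iterate/improve the section.
        sections.append(complete(f"Improve this section:\n{draft}"))
    # 7. Clean up the final product.
    return complete("Clean up this article:\n" + "\n\n".join(sections))
```

To ground steps 4 and 5 in a corpus rather than the model's pretrained knowledge, the fact-brainstorming prompt would instead be filled with passages retrieved from the corpus.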

Can we convert the corpus into embeddings and then use the text-similarity feature to retrieve the most related content from the corpus (using cosine similarity)? (Referring to the multi-document answering video.)