How can I write a good article based on scraped content?

Hi, I scraped content from many websites. How can I use this data to write a good article? Is there a framework for doing this?

If I just feed all the data to OpenAI with a prompt, it can run out of tokens. I tried summarizing the content of each website before feeding it to OpenAI, but this loses some useful data. Is there a better approach?

Thanks.

Use the Embeddings API on each piece of scraped content and store the resulting vectors somewhere. Then, using RAG and a specialized system prompt, the model can generate whatever article you want based on the scraped data.

To test this idea, I copied the Wikipedia articles about mango and coconut, generated their embeddings, and stored them. Note that the store also contains other files. Then I asked the AI:

user: write a brief paragraph about the mango fruit and coconut and their significance in asian culture?

assistant: The mango fruit and coconut hold significant cultural importance in Asian cultures. The mango, originating from Myanmar, Bangladesh, and India, has been cultivated in South and Southeast Asia for centuries. It is revered for its delicious taste and is used in various culinary dishes, desserts, and beverages. The coconut, on the other hand, has religious significance in South Asian cultures, particularly in Hindu rituals. It is also used in wedding and worship ceremonies. Additionally, coconuts played a crucial role in the long sea voyages of Austronesians, providing them with food, water, and building materials for their boats. These fruits have become integral parts of Asian culture, both in terms of their practical uses and symbolic meanings. (Sources: “mango.txt” and “coconut.txt”)
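A minimal sketch of that embed-store-retrieve-generate loop in Python, assuming the official `openai` library (v1+), an API key in `OPENAI_API_KEY`, and an in-memory store; the model names and file names are placeholders:

```python
# Minimal RAG sketch: embed scraped pages, retrieve the pages most
# relevant to a question, and ask the model to write from those only.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    """Return the embedding vector for a piece of text."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# 1. Embed each scraped document once and keep (name, vector, text) together.
docs = {"mango.txt": open("mango.txt").read(),
        "coconut.txt": open("coconut.txt").read()}
store = [(name, embed(text), text) for name, text in docs.items()]

def retrieve(question: str, k: int = 2):
    """Return the k documents whose embeddings are closest to the question."""
    q = embed(question)
    cosine = lambda v: float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    return sorted(store, key=lambda item: cosine(item[1]), reverse=True)[:k]

# 2. Feed only the retrieved text to the chat model, not the whole scrape.
question = "Write a brief paragraph about mango and coconut in Asian culture."
context = "\n\n".join(f"[{name}]\n{text}" for name, _, text in retrieve(question))
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Write articles using only the provided sources and cite them by name."},
        {"role": "user", "content": f"Sources:\n{context}\n\nTask: {question}"},
    ],
)
print(answer.choices[0].message.content)
```

In practice you would split long pages into chunks before embedding and use a proper vector database, but the retrieve-then-generate loop stays the same.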


It depends on whether you know the names of the articles you wish to reference, I suppose, and on the quality of your database.

Suppose you have lots of OpenAI papers and web scrapings, and you add functions for the model to search and retrieve them.

You could then ask the AI a simple question like this:

"From your knowledgebase, get and use the OpenAI blog about WebGPT and the OpenAI blog about function calling models to synthesize a new article “How to use functions to browse your own database of knowledge”.


Good suggestions and hints already in the thread!
If there is a database of articles, then what you really have is an unsorted pile of data.
And if the goal is "to write articles", then you have unclear expectations regarding the output.
Looking at the task from this perspective, you have two distinct entry points to arrive at a good solution:

  1. Structure your input data by summarizing, labeling, vectorizing, adding metadata, etc.
  2. Create a few sample outputs that match your desired level of quality and define exactly what is needed to reach that quality.

Then you can take the more hands-on approaches like the ones suggested above and work towards a process that enables the LLM to produce the desired outputs.
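As a small sketch of what point 1 above could look like in practice, here is one way to turn raw scrapes into labeled records before vectorizing; the `Record` schema, the prompt, and the model name are just illustrative choices:

```python
# Sketch: enrich each raw scrape with a summary and topic labels so the
# unsorted pile of data becomes structured, searchable records.
from dataclasses import dataclass, field
from openai import OpenAI

client = OpenAI()

@dataclass
class Record:
    url: str
    raw_text: str
    summary: str = ""
    labels: list[str] = field(default_factory=list)

def enrich(record: Record) -> Record:
    """Ask the model for a summary and topic labels, stored as metadata."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   "Summarize the page below in three sentences, then list up "
                   "to five topic labels, one per line, after a line that says "
                   "'LABELS:'.\n\n" + record.raw_text}],
    )
    text = resp.choices[0].message.content
    summary, _, labels = text.partition("LABELS:")
    record.summary = summary.strip()
    record.labels = [l.strip() for l in labels.splitlines() if l.strip()]
    return record

# Example usage with a hypothetical scraped page:
record = enrich(Record(url="https://example.com", raw_text=open("page.txt").read()))
```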


Agreed!

I’ll add my own tip to this:

Tip: you can improve the summarization a lot by telling GPT what the summary is for, e.g. "summarize the following for a blog post about x" 😄
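In API terms the tip is just one extra clause in the summarization prompt; the model name, file name, and blog topic below are placeholders:

```python
# Purpose-aware summarization: telling the model what the summary is for
# helps it keep the details that matter for the final article.
from openai import OpenAI

client = OpenAI()

page_text = open("scraped_page.txt").read()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
               "Summarize the following page for a blog post about mango "
               "cultivation in Asia. Keep the facts, dates, and figures "
               "relevant to that topic.\n\n" + page_text}],
)
print(resp.choices[0].message.content)
```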
