GPT can make web crawlers more efficient. Specifically, it can:
- Extract the necessary information by understanding the content of each webpage directly, instead of relying on hand-written crawling rules.
- Connect to the internet to verify crawler results or fill in missing information.
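To illustrate the first point, here is a minimal sketch of rule-free extraction: reduce the page to visible text and ask the model for named fields, so one generic prompt replaces per-site CSS/XPath rules. The function names (`page_text`, `build_extraction_prompt`) are illustrative, not CrawlGPT's actual API.

```python
# Sketch: GPT-based extraction instead of per-site crawling rules.
# All names here are hypothetical, not part of CrawlGPT.
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the visible text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def page_text(html: str) -> str:
    """Reduce raw HTML to visible text so the prompt stays small."""
    parser = _TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)


def build_extraction_prompt(html: str, fields: list[str]) -> str:
    """One generic prompt replaces hand-written extraction rules per site."""
    return (
        "Extract the following fields from the page text below.\n"
        f"Fields: {', '.join(fields)}\n"
        "Reply as JSON with exactly those keys; use null for absent fields.\n"
        "---\n" + page_text(html)
    )


# The prompt would then go to the model, e.g. via langchain (assumed usage):
#   from langchain_openai import ChatOpenAI
#   answer = ChatOpenAI(model="gpt-3.5-turbo").invoke(
#       build_extraction_prompt(html, ["name", "price"]))
```

Because the model reads the rendered text rather than the DOM, the same prompt works across sites with different markup, at the cost of an LLM call per page.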
So I created an experimental GitHub project, “CrawlGPT”, based on langchain, which runs basic automated crawlers on top of GPT-3.5. Any suggestions or help would be appreciated.