Using GPT for automated crawling

GPT seems to make web crawlers more efficient. Specifically:

  1. GPT can extract the necessary information by directly understanding the content of each webpage, instead of relying on hand-written crawling rules.

  2. GPT can connect to the internet to verify the accuracy of the crawler's results or to supplement missing information.

So I have created an experimental GitHub project, “CrawlGPT”, based on LangChain, which can run basic automated crawlers powered by GPT-3.5. Any suggestions and assistance would be appreciated.
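To illustrate point 1, here is a minimal sketch of the "no crawling rules" idea, not taken from the CrawlGPT repo: strip the page down to its visible text and build a prompt asking the model to return the desired fields as JSON. All names here (`page_text`, `build_extraction_prompt`, the sample HTML, the field names) are hypothetical, and the actual call to GPT-3.5 is left out.

```python
# Hypothetical sketch of GPT-driven extraction; not CrawlGPT's actual code.
# Instead of writing CSS/XPath rules per site, we hand the page text to the
# model with a list of fields and ask for a JSON answer.
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> blocks."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def page_text(html: str) -> str:
    """Reduce raw HTML to the text a reader would actually see."""
    parser = _TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)


def build_extraction_prompt(html: str, fields: list[str]) -> str:
    """Build the prompt that would be sent to GPT-3.5 (call itself omitted)."""
    return (
        "Extract the following fields from the page text below and reply "
        f"with a JSON object containing exactly these keys: {', '.join(fields)}.\n\n"
        f"Page text:\n{page_text(html)}"
    )


# Hypothetical page; in a real crawler this would come from an HTTP fetch.
html = (
    "<html><body><h1>ACME Widget</h1>"
    "<p>Price: $9.99</p>"
    "<script>var x = 1;</script></body></html>"
)
prompt = build_extraction_prompt(html, ["product_name", "price"])
print(prompt)
```

The same prompt-building step works for any page layout, which is what removes the per-site rule writing; only the field list changes.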


Your project looks really cool, good job. I see it is written in Python, so you may want to consider adding a Python tag to the thread.


Also allow me to link it for you: