Any tools out there to pull/scrape complete website data and feed it into GPT?

crawler · May 31, 2024, 7:25pm

I searched for a tool like this a year ago, when my friend and I were trying to build a website chatbot. This was before the boom in GPT-powered site chat. At that point there were no solutions that could be embedded in a website and answer users’ questions based on its content.
I looked for an API to extract the content because I didn’t want to build that part myself; I wanted to focus on the AI chat. Crawlbase was at the top of Google’s results for “web crawler API”, but it didn’t offer an API that crawls a whole website and returns the full content. In the end, I realized that building such a crawler would take real time and effort. Unfortunately, we abandoned that chatbot because we thought the quality of its answers was really poor (sad, we could have had $20k MRR by now, haha).

Instead, I built a tool specifically for this use case: getting the content of every page so it can be used in an AI chatbot. Unlike classic scrapers, this kind of web crawler is not focused on bypassing anti-scraping protection; it assumes the website owner wants their site to be parsed. As a developer, I simply find this really fun and technically interesting. If you would like to try it: webcrawlerapi[.]com

If you want to build it yourself, be aware that it is not a trivial task. You have to fetch each page with Selenium/Playwright so that JavaScript gets rendered, extract all the links, filter them, and then fetch the content of every one of those links in turn. There are tons of corner cases. Reach out if you need advice on how to do that.
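For anyone curious, the loop described in the previous paragraph (fetch a page, collect its links, filter them, repeat) can be sketched as a breadth-first crawl. This is a minimal sketch using only the Python standard library, not how webcrawlerapi works internally; `crawl`, `PageParser` and `fetch_static` are hypothetical names. `fetch_static` does a plain HTTP GET and does NOT run JavaScript; for JS-rendered sites you would pass in your own `fetch` function backed by Playwright (e.g. `page.goto(url)` followed by `page.content()`). The link filter (same host, http/https only, `#fragments` stripped) is a deliberately simple assumption; real sites need more rules (robots.txt, duplicate query strings, non-HTML files, and so on).

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects href targets and visible text from one HTML document."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.text_parts: List[str] = []
        self._skip_depth = 0  # > 0 while inside <script>/<style>/<noscript>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def fetch_static(url: str) -> str:
    """Hypothetical default fetcher: plain HTTP GET, no JavaScript rendering."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str, fetch: Callable[[str], str] = fetch_static,
          max_pages: int = 50) -> Dict[str, str]:
    """Breadth-first crawl of a single site; returns {url: visible text}."""
    site = urlparse(start_url).netloc
    queue = deque([urldefrag(start_url)[0]])
    seen = set(queue)
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:  # broken link, timeout, etc.: skip this page
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()  # flush any trailing buffered text
        pages[url] = " ".join(parser.text_parts)
        for href in parser.links:
            # Resolve relative links and drop #fragments so /a and /a#top dedupe.
            link = urldefrag(urljoin(url, href))[0]
            parsed = urlparse(link)
            # Filter: http(s) only, same host, not already queued.
            if (parsed.scheme in ("http", "https") and parsed.netloc == site
                    and link not in seen):
                seen.add(link)
                queue.append(link)
    return pages
```

Calling `crawl("https://example.com/")` would return a dict mapping each discovered URL to its visible text, which you could then split into chunks and pass to GPT as context. Keeping `fetch` as a parameter means the same link-handling logic works whether pages come from a plain GET or a headless browser.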
