Turn any website into an API with GPT-4

mangotree · April 7, 2023, 2:29pm

I got frustrated with the time and effort required to code and maintain custom web scrapers, so I built a more generic LLM-based solution for extracting data from websites (and potentially other sources). AI should automate tedious and uncreative work, and web scraping definitely fits this description.

One of the killer use cases of large language models like GPT is reformatting information from any format X to any other format Y, so I leveraged that to generate web scrapers and data processing steps on the fly. The big advantage over traditional scraping is that it adapts to website changes and is basically maintenance-free.
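
To make the "format X to format Y" idea concrete, here is a minimal sketch of how an HTML-to-JSON extraction request could be framed for a language model. This is an illustration only, not Kadoa's actual implementation: the function names `build_extraction_prompt` and `parse_model_reply`, the prompt wording, and the JSON-array reply format are all assumptions. The call to the model itself is deliberately left out.

```python
import json


def build_extraction_prompt(html: str, fields: list[str]) -> str:
    """Build a prompt asking a model to turn raw HTML into JSON records.

    Hypothetical helper: the wording is an assumption, not a known prompt.
    """
    field_list = ", ".join(fields)
    return (
        "Extract every record from the HTML below.\n"
        f"Return only a JSON array of objects with these keys: {field_list}.\n"
        "Use null for any value that is missing.\n\n"
        f"HTML:\n{html}"
    )


def parse_model_reply(reply: str, fields: list[str]) -> list[dict]:
    """Parse the model's reply and keep only the requested fields.

    Raises ValueError if the reply is not a JSON array of objects.
    """
    records = json.loads(reply)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    cleaned = []
    for record in records:
        if not isinstance(record, dict):
            raise ValueError("expected each record to be a JSON object")
        # Missing keys become None; unexpected keys are dropped.
        cleaned.append({name: record.get(name) for name in fields})
    return cleaned
```

The string returned by `build_extraction_prompt` would be sent to whichever model is used, and the text of the reply would be passed to `parse_model_reply`. Validating the reply matters because a model can return malformed or extra data.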

Check it out at Kadoa.com and let me know what you think!

Here are some examples:

max557186 · April 7, 2023, 6:11pm

So you have access to my API key now? OpenAI said to keep it secret.

It gives me an error. No leak that I can see.

mangotree · April 7, 2023, 6:29pm

Thanks for the feedback. I just changed the example and added a robots.txt scan.
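
For anyone curious what a robots.txt scan can look like, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not the code the service runs: the helper name `is_allowed`, the `example-bot` user agent, and the sample rules are assumptions made for illustration. The robots.txt content is passed in as a string so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content used only for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for path in ("/products", "/private/data"):
        url = f"https://example.com{path}"
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "example-bot", url))
```

In a real crawler the file would be downloaded from the target site first, for example with `RobotFileParser.set_url()` followed by `read()`, before any page is requested.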

mangotree · April 7, 2023, 6:32pm

As you can see in the network tab, we never send your key to any endpoint other than OpenAI's. Your example didn’t work because the description wasn’t specific enough. What fields are you trying to extract, and from which specific site?

RonaldGRuckus · April 7, 2023, 6:44pm

That was quick!

Good luck in your endeavor.
Looking forward to seeing the progress.

mangotree · April 8, 2023, 5:13am

Thanks! Let me know if you have any additional feedback.

info21 · April 11, 2023, 4:21pm

Can you also handle sites with pagination, infinite scroll, and search filters? E.g., if I would like to extract many hundreds of records?

mangotree · April 12, 2023, 11:23am

Yes, the service has simple RPA capabilities like click automation and scrolling. These are not part of the public demo yet, though.

mangotree · April 24, 2023, 11:35am

Update: We removed the need for an OpenAI key, so you can now try it out for free.

rodrigueztboy83 · May 2, 2023, 7:14pm

Tried it out. Honestly, I found it to be very slow and not any better than any of the other billion commercial off-the-shelf scrapers out there.

mangotree · May 4, 2023, 6:48pm

The initial scraper generation showcased in the playground is indeed quite slow. The cool thing is that, after the first configuration, the data extraction is basically fully autonomous and adapts automatically to website changes. Current solutions require constant maintenance.

mangotree · May 26, 2023, 7:13pm

Update: We’re now detecting all entities and their properties on a website, so you can conveniently select the data you want to extract from any website. We’ve also shipped some major performance improvements.
