I got frustrated with the time and effort required to code and maintain custom web scrapers, so I built a more generic LLM-based solution for data extraction from websites (and potentially other sources). AI should automate tedious and uncreative work, and web scraping definitely fits that description.
One of the killer use cases of large language models like GPT is reformatting information from any format X into any other format Y, so I leveraged that to generate web scrapers and data processing steps on the fly. The big advantage over traditional scraping is that it adapts to website changes and is essentially maintenance-free.
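To make the idea concrete, here's a minimal sketch of that flow (not Kadoa's actual implementation): the LLM is prompted once to map the fields you want to the matching element classes on the page, and that generated config is then applied cheaply on every run. The config JSON and class names below are made-up stand-ins for a real LLM response; when extraction breaks after a site redesign, you'd re-run the LLM step to regenerate the config instead of hand-fixing selectors.

```python
import json
from html.parser import HTMLParser

# Hypothetical stand-in for the one-time LLM step: the model is asked
# which element classes hold "title" and "price", and replies with a
# small JSON mapping. Here that reply is mocked as a static string.
LLM_GENERATED_CONFIG = json.loads('{"title": "product-name", "price": "product-price"}')

class ConfigScraper(HTMLParser):
    """Applies a {field: css_class} config produced by the LLM step."""
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.result = {}
        self._current_field = None

    def handle_starttag(self, tag, attrs):
        # Remember which field (if any) this element's class maps to.
        classes = dict(attrs).get("class", "").split()
        for field, cls in self.config.items():
            if cls in classes:
                self._current_field = field

    def handle_data(self, data):
        # Capture the text content of the matched element.
        if self._current_field and data.strip():
            self.result[self._current_field] = data.strip()
            self._current_field = None

html_page = """
<div class="product-name">Acme Widget</div>
<span class="product-price">$19.99</span>
"""

scraper = ConfigScraper(LLM_GENERATED_CONFIG)
scraper.feed(html_page)
print(scraper.result)  # {'title': 'Acme Widget', 'price': '$19.99'}
```

The point of the split is cost and robustness: the expensive, flexible LLM call happens only at configuration time (or on failure), while day-to-day extraction is plain deterministic parsing.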
Check it out at Kadoa.com and let me know what you think!
As you can see in the network tab, we never send your key to any endpoint other than OpenAI's. Your example didn't work because the description wasn't specific enough. Which fields are you trying to extract, and from which site?
The initial scraper generation showcased on the playground is indeed quite slow. The cool thing is that data extraction is basically fully autonomous after the first configuration and automatically adapts to website changes, whereas current solutions require constant maintenance.
Update: We’re now detecting all entities and their properties on a website, so you can conveniently select the data you want to extract from any website. We’ve also shipped some major performance improvements.