Best approach to extract structured data from webpage without hallucinations?

apris · June 5, 2024, 3:47am

I need to use the API (Chat Completions, Assistants, or something else) to extract structured data from an Amazon product page. First I tried passing just the link, but it returns nonsense, for example for https://www.amazon.com/-/de/dp/B0BZ8X9HGT/

It works better if I provide a screenshot.

It should also work well if I send the webpage source code. So what would be the most cost-effective way to do this?

Diet · June 5, 2024, 3:59am

Hi!

So the first thing to note is that the model can’t actually browse the web. You need to implement some browsing functionality and include it as a tool for your assistant.
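To make that concrete, here is a rough sketch of what such a browsing tool could look like, using only the Python standard library. The tool name `fetch_page`, the `max_chars` limit, and the helper names are all my own assumptions, not anything the API prescribes. The dict follows the Chat Completions `tools` format, and the handler strips scripts and styles so you send visible text rather than the full page source, which also keeps token cost down.

```python
import json
import urllib.request
from html.parser import HTMLParser

# Hypothetical tool definition in the Chat Completions "tools" format.
FETCH_PAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "fetch_page",
        "description": "Download a web page and return its visible text.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Reduce raw HTML to its visible text, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_page(url, max_chars=20000):
    """Download a page and return truncated visible text (assumed limit)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    return html_to_text(html)[:max_chars]


def handle_tool_call(name, arguments_json):
    """Run the tool the model asked for and return its text result."""
    if name != "fetch_page":
        raise ValueError(f"unknown tool: {name}")
    return fetch_page(**json.loads(arguments_json))
```

You would pass `[FETCH_PAGE_TOOL]` as `tools` in the request and, when the model responds with a tool call, feed the function name and JSON arguments to `handle_tool_call` and return the text as the tool result. Note that some sites block plain HTTP clients, so a real fetcher may need more than this.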

Extracting text from images isn’t super reliable, but it’s getting better. You’ll probably have more success with OCR tools if you need a high degree of reliability.

Probably using a scraper? It depends on what exactly you’re after.

MrFriday · June 5, 2024, 5:13am

Use the Python library BeautifulSoup4 for scraping. There are also many paid scrapers that can help you out if you are not comfortable with code.
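A minimal sketch of the BeautifulSoup4 route, run against an inline sample so it works offline. The selectors `#productTitle` and `.a-price .a-offscreen` are assumptions about the page markup and may not match the live page; the sample HTML and the `extract_product` helper are made up for illustration. Fields that are not found come back as `None` instead of being guessed, which is the point if you want to avoid hallucinated values.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the real page structure is an assumption.
SAMPLE_HTML = """
<html><body>
  <span id="productTitle">  Example Widget  </span>
  <span class="a-price"><span class="a-offscreen">$19.99</span></span>
</body></html>
"""


def extract_product(html):
    """Pull title and price with CSS selectors; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("#productTitle")
    price = soup.select_one(".a-price .a-offscreen")
    return {
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
    }


print(extract_product(SAMPLE_HTML))
# {'title': 'Example Widget', 'price': '$19.99'}
```

If the layout varies too much for fixed selectors, you can still use this step to cut the page down to the relevant block and send only that text to the model.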
