Best approach to extract structured data from a webpage without hallucinations?

I need to use the API (Chat Completions, Assistants, or something else) to extract structured data from an Amazon product page. First I tried passing just the link, but it returns nonsense, for example for https://www.amazon.com/-/de/dp/B0BZ8X9HGT/

It works better if I provide a screenshot.

It should also work well if I send the webpage source code. So what would be the most cost-effective way to do this?

Hi!

So the first thing to note is that the model can’t actually browse the web. You need to implement some browsing functionality and include it as a tool for your assistant.
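To make the "browsing as a tool" idea concrete, here is a minimal sketch, assuming the function-calling `tools` format used by Chat Completions. The tool name `fetch_page`, the helper `run_tool_call`, and the User-Agent string are hypothetical names picked for illustration, and a plain HTTP request like this may well be blocked by Amazon in practice.

```python
import json
from urllib.request import Request, urlopen

# Tool description you would pass in the `tools` list of a request.
# The name "fetch_page" is an assumption made for this sketch.
FETCH_PAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "fetch_page",
        "description": "Download a web page and return its raw HTML.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "Absolute URL to download."}
            },
            "required": ["url"],
        },
    },
}


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download a page with the standard library and decode it to text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def run_tool_call(name: str, arguments_json: str, fetcher=fetch_page) -> str:
    """Execute a tool call requested by the model and return its result.

    `arguments_json` is the JSON string of arguments the model produced.
    `fetcher` is injectable so the dispatcher can be tried without network access.
    """
    if name != "fetch_page":
        raise ValueError(f"unknown tool: {name}")
    arguments = json.loads(arguments_json)
    return fetcher(arguments["url"])
```

When the model answers with a tool call, you would run `run_tool_call` on its name and arguments and send the returned HTML (ideally trimmed down first) back as the tool result, so the model works from the real page content instead of guessing from the URL.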

Extracting text from images isn’t super reliable, but it’s getting better. You’ll probably have more success with OCR tools if you need a high degree of reliability.

Probably using a scraper? It depends on what exactly you’re after.

Use the Python library BeautifulSoup4 for scraping. There are also many paid scrapers that can help you out if you are not comfortable with code.
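A rough sketch of what that could look like with BeautifulSoup4 (`pip install beautifulsoup4`). The sample HTML is made up, and the `productTitle` id and `a-offscreen` class are assumptions about the page markup that you should check against the real page source; the function name `extract_product` is also just illustrative.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for a downloaded product page.
SAMPLE_HTML = """
<html><body>
  <span id="productTitle">  Example Wireless Mouse  </span>
  <span class="a-price"><span class="a-offscreen">$19.99</span></span>
  <script>var tracking = "noise";</script>
</body></html>
"""


def extract_product(html: str) -> dict:
    """Pull a few fields out of product-page HTML.

    The selectors are assumptions; missing elements yield None
    instead of a guessed value.
    """
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("span", id="productTitle")
    price = soup.find("span", class_="a-offscreen")
    return {
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
    }


product = extract_product(SAMPLE_HTML)
print(product)
```

Fields extracted this way come straight from the markup, so there is nothing to hallucinate. If you still want the model for messier fields, sending only the extracted text rather than the full page source should keep the token cost down.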