Hi there, any tips on automating GPT-4o to prompt and find data on the internet using the API? I’m getting inconsistent results in the Playground, yet ChatGPT provides consistent, accurate results.
Thanks
Hi and welcome to the developer community,
The API does not support web browsing.
You need to implement that yourself.
You should write an ETL pipeline that can handle multiple file formats, with a data model that can store relations. Because many sites rely heavily on JavaScript, you should use headless Chrome or Selenium to get the content, combined with screenshots, OCR, and spatial recognition…
Hope that helps.
Thanks for the swift reply Jochen. Do you know of any tool that would
You have several strategies at your disposal, and the right one will depend on your specific use case.
You can use a commercial search engine API (like Google Search API). This approach is suitable for generic searches where you’re looking for broad terms (e.g., ‘weather forecast data’). Keep in mind that these types of APIs will return a list of URLs, webpage descriptions, and matching scores. However, the webpage descriptions will be short summaries of the entire webpage. If you’re looking for specific information (e.g., contact details), the assistant may not be able to determine whether the webpage contains the sought information.
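To make the search API option a bit more concrete, here is a minimal sketch using only the Python standard library. It assumes the Google Custom Search JSON API endpoint and its `key`, `cx`, `q`, and `num` parameters; the function names (`build_search_url`, `parse_results`, `search`) are made up for illustration, and you would need your own API key and search engine ID.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the Custom Search JSON API.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, num=5):
    """Build the request URL (helper name is hypothetical)."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_results(payload):
    """Reduce a decoded JSON response to title/link/snippet dicts."""
    return [
        {
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": item.get("snippet", ""),
        }
        for item in payload.get("items", [])
    ]


def search(query, api_key, engine_id):
    """Run the HTTP request; needs real credentials and network access."""
    url = build_search_url(query, api_key, engine_id)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_results(json.load(resp))
```

The `snippet` field is the short summary mentioned above, so it is usually only enough to decide which links are worth fetching in full.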
Alternatively, you can modify this approach by scraping the content of the webpages from the search results. There are “smart” loaders that can recognize and filter out unwanted content (e.g., ads) from the page, returning only pure text (such as with UnstructuredHTMLLoader). You can then pass this information to the agent to find the desired data.
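As a rough illustration of the filtering idea, here is a sketch that strips scripts, styles, and navigation blocks from a page and keeps the visible text. It is a simple stand-in built on Python's standard `html.parser`, not how UnstructuredHTMLLoader works internally; the class name, function name, and the list of skipped tags are my own assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring content inside non-content tags."""

    # Assumption: these tags rarely hold the data you are looking for.
    SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the page's visible text, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The returned text can then be placed in the prompt together with the question. Pages that render their content with JavaScript will still need a headless browser before this step.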
If you consistently search through a fixed number of webpages, you could scrape them and store the information in a vector store for future queries.
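To show what the vector store idea boils down to, here is a toy in-memory version. It is only a sketch: the `embed` function uses plain word counts as a placeholder for a real embedding model, and `TinyVectorStore` is a hypothetical name, not a library class. In practice you would swap in real embeddings and a proper vector database.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (placeholder for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Keeps scraped pages in memory and ranks them against a question."""

    def __init__(self):
        self._docs = []  # list of (url, text, vector)

    def add(self, url, text):
        self._docs.append((url, text, embed(text)))

    def query(self, question, k=3):
        q = embed(question)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[2]), reverse=True)
        return [(url, text) for url, text, _ in ranked[:k]]
```

You would scrape each page once, call `add` for it, and then pass the top results from `query` to the model as context instead of scraping again on every request.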
Additionally, as @jochenschultz mentioned, if your goal is to search for a specific data source (such as a data table) and use it, you’ll need to manage processes like ETL (Extract, Transform, Load), pagination, and other complexities.
Hey Marko, thanks for the feedback! Hmm, would the Google Search API be able to scrape data inside a website? For example, in our case we have a food product’s barcode and name. Our goal is to find out whether we can prompt a search using the barcode and product name to find the missing piece of data, which is the ingredient list, without providing specific URLs to search through.
Hey Tadaskrasaitis0168, I believe that the Google Search API (Custom Search JSON API | Programmable Search Engine | Google for Developers) will be able to find the correct page on the website with the ingredient list. From there, you can scrape that page with UnstructuredHTMLLoader. Just configure the Google Search API to search ONLY that website.
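Here is a small sketch of what a site-restricted request for the barcode case could look like. It assumes the Custom Search JSON API's `siteSearch` and `siteSearchFilter` parameters (with `"i"` meaning "include only this site"); the function name, the query wording, and the example values are made up for illustration.

```python
import urllib.parse


def build_site_query(barcode, product_name, site, api_key, engine_id):
    """Build a request URL limited to one website (hypothetical helper)."""
    params = {
        "key": api_key,
        "cx": engine_id,
        # Quote the barcode so it is matched as an exact term.
        "q": f'"{barcode}" {product_name} ingredients',
        "siteSearch": site,
        "siteSearchFilter": "i",  # "i" = include only results from `site`
    }
    return "https://www.googleapis.com/customsearch/v1?" + urllib.parse.urlencode(params)
```

For example, `build_site_query("0123456789012", "Oat Cookies", "example.com", api_key, engine_id)` gives a URL whose first result link you can then fetch and scrape. The restriction can also be set once in the Programmable Search Engine control panel instead of per request.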
Thanks Marko, do you know of any tutorials on how to set up the Google Search API and how it works?
Try with this one: Custom Google Search API. If you want to get results of Google… | by Joey S | Medium
Or follow the formal documentation: Custom Search JSON API | Programmable Search Engine | Google for Developers
Hey Marko,
I am designing a search engine for a specific niche; however, I need help pulling the data so that people can view certain medical professionals on my site. Is there an API that can help with this? I know of NPI, but their API is too vague. Any advice or help would be greatly appreciated. Thanks!
Another way is to construct your data from a web scraper and then feed that data to the API model.
Hey, I am not sure that I can be of much help. It seems like the NPI API is a proprietary API. If their API is vague, then I would try implementing a wrapper around it… I don’t know, I would need to see what exactly you are trying to get from it.
If the information that you are trying to get is on the web, then you could try web scraping.
Thanks Marko,
Would you be open for a short chat? I’m hoping you can steer me in the right direction.
Best regards,
E
Sure, no problem at all. Just bear in mind that I am in the CET time zone.