Hi there, any tips on automating GPT-4o to prompt and find data on the internet using the API? I’m getting inconsistent results in the Playground, yet ChatGPT provides consistently accurate results.
Thanks
Hi, and welcome to the developer community.
The API does not support web browsing.
You need to implement that yourself.
You should write an ETL pipeline that can receive and work with multiple file formats, with a data model that can store relations. Because many sites rely heavily on JavaScript, you should use headless Chrome or Selenium to get the content, combined with screenshots, OCR, and spatial recognition…
Hope that helps.
Thanks for the swift reply Jochen. Do you know of any tool that would
You have several strategies at your disposal, and the right one will depend on your specific use case.
You can use a commercial search engine API (like Google Search API). This approach is suitable for generic searches where you’re looking for broad terms (e.g., ‘weather forecast data’). Keep in mind that these types of APIs will return a list of URLs, webpage descriptions, and matching scores. However, the webpage descriptions will be short summaries of the entire webpage. If you’re looking for specific information (e.g., contact details), the assistant may not be able to determine whether the webpage contains the sought information.
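To make the search-API route a bit more concrete, here is a minimal sketch using only the Python standard library. It assumes the Custom Search JSON API endpoint with its `key`, `cx`, `q`, and `num` parameters, and reads the `title`, `link`, and `snippet` fields of each result. The function names are made up for illustration, and you would need your own API key and search engine ID.

```python
import json
import urllib.parse
import urllib.request

SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, num=5):
    """Return the request URL for one page of search results."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_results(payload):
    """Reduce the JSON response to (title, link, snippet) tuples."""
    return [
        (item.get("title", ""), item.get("link", ""), item.get("snippet", ""))
        for item in payload.get("items", [])
    ]


def search(query, api_key, engine_id, num=5):
    """Call the API and parse the response (needs network and valid credentials)."""
    url = build_search_url(query, api_key, engine_id, num)
    with urllib.request.urlopen(url) as resp:
        return parse_results(json.load(resp))
```

The snippets returned this way are short, so they are mostly useful for deciding which links are worth fetching in full before handing anything to the model.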
Alternatively, you can modify this approach by scraping the content of the webpages from the search results. There are “smart” loaders that can recognize and filter out unwanted content (e.g., ads) from the page, returning only pure text (such as with UnstructuredHTMLLoader). You can then pass this information to the agent to find the desired data.
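As a rough illustration of the "return only pure text" step, the sketch below uses Python's built-in `html.parser` rather than UnstructuredHTMLLoader, so it is a much simpler stand-in and does no real ad detection. The set of skipped tags is an assumption you would tune for your pages.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside the skipped tags."""

    # Assumed list of tags that rarely carry the content you want.
    SKIPPED_TAGS = {"script", "style", "noscript", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting text can be pasted into the prompt as context. Pages that build their content with JavaScript will still need a headless browser to produce the HTML first.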
If you consistently search through a fixed number of webpages, you could scrape them and store the information in a vector store for future queries.
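To show the shape of the vector-store idea, here is a toy in-memory version. It assumes plain word counts as the "embedding" and cosine similarity for ranking; a real setup would swap in an embedding model and a proper vector database. All names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Keeps scraped pages in memory and returns the closest ones to a question."""

    def __init__(self):
        self._rows = []

    def add(self, url, text):
        self._rows.append((url, text, embed(text)))

    def query(self, question, k=1):
        q = embed(question)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[2]), reverse=True)
        return [(url, text) for url, text, _ in ranked[:k]]
```

You would scrape each page once, call `add` with its text, and then pass the top results from `query` to the model along with the user's question.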
Additionally, as @jochenschultz mentioned, if your goal is to search for a specific data source (such as a data table) and use it, you’ll need to manage processes like ETL (Extract, Transform, Load), pagination, and other complexities.
Hey Marko, thanks for the feedback! Hmm, would the Google Search API be able to scrape data inside a website? For example, in our case we have a food product’s barcode and name. Our goal is to find out whether we can prompt a search using the barcode and product name to find the missing piece of data, the ingredient list, without providing specific URLs to search through.
Hey Tadaskrasaitis0168, I believe that the Google Search API (Custom Search JSON API | Programmable Search Engine | Google for Developers) will be able to find the correct page on the website that has the ingredient list. From there, you can scrape that page with UnstructuredHTMLLoader. Just configure the Google Search API to search ONLY that website.
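For the barcode case, a site-restricted request could look roughly like the sketch below. It assumes the API's `siteSearch` and `siteSearchFilter` parameters (with `"i"` meaning "include only this site"). The query wording, function names, and example values are placeholders, not something taken from your data.

```python
import urllib.parse

SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_product_query(barcode, product_name):
    """Quote the barcode so it is matched exactly, then add the product name."""
    return f'"{barcode}" {product_name} ingredients'


def build_site_search_url(barcode, product_name, site, api_key, engine_id):
    """Return a request URL that only searches pages on `site`."""
    params = {
        "key": api_key,
        "cx": engine_id,
        "q": build_product_query(barcode, product_name),
        "siteSearch": site,        # restrict results to this site
        "siteSearchFilter": "i",   # "i" = include only, "e" = exclude
        "num": 3,
    }
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)
```

The first link in the response is then the page to scrape for the ingredient list. Whether a barcode search finds anything depends on the site actually printing the barcode on its product pages.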
Thanks Marko, do you know of any tutorials on how to set up the Google Search API and how it works?
Try with this one: Custom Google Search API. If you want to get results of Google… | by Joey S | Medium
Or follow the formal documentation: Custom Search JSON API | Programmable Search Engine | Google for Developers