Any tips on using the API and the internet?

tadaskrasaitis0168 · September 11, 2024, 9:06am

Hi there, any tips on automating GPT-4o to prompt and find data on the internet using the API? I’m getting inconsistent results in the Playground, yet ChatGPT provides consistently accurate results.

Thanks

jochenschultz · September 11, 2024, 9:12am

Hi and welcome to the developer community,

The API does not support web browsing.

You need to implement that yourself.

You should write an ETL pipeline that can receive and work with multiple file formats, with a data model that can store relations. Because many sites rely heavily on JavaScript, you should use headless Chrome or Selenium to fetch the content, combined with screenshots, OCR, and spatial recognition…

Hope that helps.

tadaskrasaitis0168 · September 11, 2024, 9:20am

Thanks for the swift reply, Jochen. Do you know of any tool that would:

1. Take the prompt with the data sets that need to be found
2. Have GPT search the web to find the missing data set, without being given specific data
3. Paste the response

MARK0 · September 11, 2024, 9:41am

You have several strategies at your disposal, and the right one will depend on your specific use case.

You can use a commercial search engine API (like Google Search API). This approach is suitable for generic searches where you’re looking for broad terms (e.g., ‘weather forecast data’). Keep in mind that these types of APIs will return a list of URLs, webpage descriptions, and matching scores. However, the webpage descriptions will be short summaries of the entire webpage. If you’re looking for specific information (e.g., contact details), the assistant may not be able to determine whether the webpage contains the sought information.
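As a rough illustration of the search-API approach, here is a minimal sketch that assumes Google's Custom Search JSON API, with its `key`, `cx`, `q` and `num` query parameters and an `items` list in the response. The function names are made up for this example, and you would need your own API key and search engine ID.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of the Custom Search JSON API (assumed provider for this sketch).
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, num=5):
    """Build the request URL for one search query."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def parse_results(payload):
    """Keep only the title, link and snippet of each hit."""
    return [
        {
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": item.get("snippet", ""),
        }
        for item in payload.get("items", [])
    ]


def search(query, api_key, engine_id):
    """Run the search over the network (needs a valid key and engine ID)."""
    url = build_search_url(query, api_key, engine_id)
    with urlopen(url, timeout=10) as resp:
        return parse_results(json.load(resp))
```

The snippets returned this way are the short summaries mentioned above, so for specific details you would usually still have to open the linked pages.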

Alternatively, you can modify this approach by scraping the content of the webpages from the search results. There are “smart” loaders that can recognize and filter out unwanted content (e.g., ads) from the page, returning only pure text (such as with UnstructuredHTMLLoader). You can then pass this information to the agent to find the desired data.
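To give an idea of what such a loader does, here is a small standard-library sketch that strips scripts, styles and navigation blocks and keeps the visible text. It is a stand-in written for illustration, not UnstructuredHTMLLoader itself, and the list of skipped tags is an assumption you would tune per site.

```python
from html.parser import HTMLParser

# Tags whose content is treated as noise (an assumption for this sketch).
SKIP_TAGS = {"script", "style", "noscript", "nav", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collect visible text while ignoring everything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html):
    """Return the page's visible text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting text can be placed into the prompt so the model looks for the desired field in the page content rather than in the short search snippet.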

If you consistently search through a fixed number of webpages, you could scrape them and store the information in a vector store for future queries.
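The idea can be sketched with a toy in-memory store. Note the loud assumption here: `embed` uses plain word counts instead of a real embedding model, and `TinyVectorStore` is a hypothetical class for illustration, not a real vector database.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts. Swap in a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Keeps scraped pages in memory and ranks them against a question."""

    def __init__(self):
        self._rows = []  # list of (url, text, vector)

    def add(self, url, text):
        self._rows.append((url, text, embed(text)))

    def query(self, question, k=3):
        q = embed(question)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[2]), reverse=True)
        return [(url, text) for url, text, _ in ranked[:k]]
```

You would scrape each page once, call `add` with its text, and later pass the top results of `query` to the model as context.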

Additionally, as @jochenschultz mentioned, if your goal is to search for a specific data source (such as a data table) and use it, you’ll need to manage processes like ETL (Extract, Transform, Load), pagination, and other complexities.

tadaskrasaitis0168 · September 12, 2024, 9:47am

Hey Marko, thanks for the feedback! Hmm, would the Google Search API be able to scrape data inside a website? For example, in our case we have a food product’s barcode and name. Our goal is to find out whether we can prompt a search using the barcode and product name to find the missing piece of data, which is the ingredient list, without providing specific URLs to search through.

MARK0 · September 12, 2024, 10:02am

Hey Tadaskrasaitis0168, I believe that the Google Search API (Custom Search JSON API | Programmable Search Engine | Google for Developers) will be able to find the correct page on the website that contains the ingredient list. From there, you can scrape that page with UnstructuredHTMLLoader. Just configure the Google Search API to search ONLY that website.
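One possible way to restrict the search to a single website is sketched below, assuming the Custom Search JSON API's `siteSearch` and `siteSearchFilter` request parameters. The function name, the query wording and the example values are hypothetical, and the same restriction could instead be set in the search engine's own configuration.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_product_search_url(barcode, product_name, site, api_key, engine_id):
    """Build a search URL for one product, limited to a single website."""
    params = {
        "key": api_key,
        "cx": engine_id,
        # Quote the barcode so it is matched as an exact string.
        "q": f'"{barcode}" {product_name} ingredients',
        "siteSearch": site,
        "siteSearchFilter": "i",  # "i" = include only this site, "e" = exclude it
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Hypothetical usage; the barcode, product name, site and credentials are placeholders.
example_url = build_product_search_url(
    "1234567890123", "Oat Bar", "example.com", "YOUR_API_KEY", "YOUR_ENGINE_ID"
)
```

The first result's link is then the page you would scrape for the ingredient list.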

tadaskrasaitis0168 · September 12, 2024, 10:17am

Thanks Marko, do you know of any tutorials on how to set up the Google Search API and how it works?

MARK0 · September 12, 2024, 10:27am

Try this one: Custom Google Search API. If you want to get results of Google… | by Joey S | Medium

Or follow the formal documentation: Custom Search JSON API | Programmable Search Engine | Google for Developers

enmusic1090 · February 20, 2025, 6:09pm

Hey Marko,
I am designing a search engine for a specific niche. However, I need help pulling the data so that people can view certain medical professionals on my site. Is there an API that can help with this? I know of NPI, but their API is too vague. Any advice or help would be greatly appreciated. Thanks!

dhanayat.harshat · February 20, 2025, 6:12pm

Another way is to build your data set with a web scraper and then feed that data to the model through the API.

MARK0 · February 20, 2025, 7:03pm

Hey, I am not sure that I could be of much help. It seems like the NPI API is a proprietary API. If their API is vague, then I would try implementing a wrapper around it… I don’t know, I would need to see what exactly you are trying to get from it.
If the information that you are trying to get is on the web, then you could try web scraping.

enmusic1090 · February 20, 2025, 8:32pm

Thanks Marko,

Would you be open for a short chat? I’m hoping you can steer me in the right direction.

Best regards,

E

MARK0 · February 21, 2025, 6:17am

Sure, no problem at all. Just bear in mind that I am in the CET time zone.
