How to implement the GPT-4 API with internet access?

As the title says: what are some easy and economical ways to give the GPT-4 API internet access so it can get the latest information about world events?

I would prefer a Node.js solution if possible.

PS: Internet access is not yet provided via the API. The web version of ChatGPT does this by querying Bing.

You can send queries to Google or another search engine and get back a list of sites and their descriptions. You can also ask GPT to rewrite the user’s request into a more precise search query first.

The first step is to find a search API, unless you have a specific site you want crawled and indexed.

There’s a GitHub repo with a DuckDuckGo Python method that is free to use, as long as it still works. There are also the Bing APIs, APIs that pirate Google…

So then you have a function that can get URLs and search summaries.
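As a rough sketch of what that function might hand back to the model: the `SearchResult` shape and the `formatSearchResults` helper below are made up for illustration, and every real search API uses its own field names, so you would map its response into this shape first.

```typescript
// Hypothetical shape of one search hit; real search APIs name these fields differently.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Turn search hits into a compact numbered text block that can be passed to the model.
// Limits the number of hits and the snippet length to keep the token count down.
function formatSearchResults(
  results: SearchResult[],
  maxResults = 5,
  maxSnippetChars = 200
): string {
  return results
    .slice(0, maxResults)
    .map((r, i) => {
      const snippet =
        r.snippet.length > maxSnippetChars
          ? r.snippet.slice(0, maxSnippetChars).trimEnd() + "..."
          : r.snippet;
      return `${i + 1}. ${r.title}\n   ${r.url}\n   ${snippet}`;
    })
    .join("\n");
}
```

Keeping only title, URL and a short snippet per hit is usually enough for the model to decide which page, if any, is worth fetching in full.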

Browsing arbitrary web pages is harder. You’re going to get tons of “JavaScript required” pages if you just wget them.

Beautiful Soup - parses the HTML you fetched (does not run JavaScript itself)
Selenium - drives a real browser, so the page’s JavaScript runs
Puppeteer (Pyppeteer in Python), Playwright - browser controllers

To make sure I understand this right: are you suggesting that we install a third-party search API to return the top URLs and write a script to scrape those pages? Should we then use embeddings to search that scraped content?

Sorry if it’s too obvious and I am not getting it.

You can’t embed the whole web unless you are Google.

You don’t have to “install” a third-party search API; you call the search API over the network from your AI function-handling code. Some shims that are already written can save time, though.

You could set up function calling.

This is how it might look:

  • send the user message to the API; it may contain a request to pull content from the internet
  • the API responds with a function call to your “internet access” function
  • your app runs the “internet access” function and returns the results to the API as a function-result message (role "function", not a system message) for further processing

One such example is to use the Google search API:

  • the user message shows intent to search
  • the OpenAI API asks for a search with a specific keyword
  • your software searches with that keyword and returns the results
  • the model processes the search results and replies to the user based on them
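The steps above can be sketched as two small pieces of glue code. This is a minimal sketch, assuming the `functions` / `function_call` form of the Chat Completions API (newer API versions express the same idea with `tools`). The function name `web_search`, its `query` parameter, and both helper names are assumptions made up for this example; the actual HTTP calls to the model and to the search API are left out.

```typescript
// Minimal message shapes, mirroring the `function_call` field of a chat completion.
interface FunctionCall {
  name: string;
  arguments: string; // JSON-encoded string produced by the model
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "function";
  content: string | null;
  name?: string;
  function_call?: FunctionCall;
}

// Hypothetical function description, sent in the `functions` array of the request.
const webSearchFunction = {
  name: "web_search",
  description: "Search the web and return the top results",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search keywords" },
    },
    required: ["query"],
  },
};

// Pull the search query out of the model's reply.
// Returns null if the model did not ask for a search or sent unusable arguments.
function extractSearchQuery(reply: ChatMessage): string | null {
  const call = reply.function_call;
  if (!call || call.name !== webSearchFunction.name) return null;
  try {
    const args = JSON.parse(call.arguments);
    return typeof args.query === "string" && args.query.trim() !== ""
      ? args.query.trim()
      : null;
  } catch {
    return null; // the model can emit malformed JSON; treat it as "no search"
  }
}

// Wrap the search results in the message that goes back to the model.
function toFunctionResultMessage(resultText: string): ChatMessage {
  return { role: "function", name: webSearchFunction.name, content: resultText };
}
```

In use, your app appends the assistant reply and the function-result message to the conversation and calls the API again; the model then writes the final answer from the search results.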

Thanks, folks. Now my next worry is how to save tokens. If I start parsing web pages, that will consume lots of tokens. Is there a way to fetch the information while using as few tokens as possible?

PS: I use Node.js, so I’d prefer a solution that works with that.

Depending on your use case, if you are talking about reading arbitrary web pages, you might want to use something like jsdom and return only the text content instead of the entire HTML document. Other solutions like Cheerio and Puppeteer can also work.
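To show the idea without pulling in a dependency, here is a rough, regex-based stand-in. It is not how jsdom or Cheerio work internally, and it will trip over unusual markup; `htmlToText` is a made-up helper name. With a real parser you would read the text content of the body instead.

```typescript
// Naive HTML-to-text conversion: strips markup and keeps only readable text.
// A sketch only; a real HTML parser handles malformed markup far more reliably.
function htmlToText(html: string): string {
  return html
    // Drop elements whose content is never visible text.
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Replace remaining tags with a space so adjacent words don't run together.
    .replace(/<[^>]+>/g, " ")
    // Decode a handful of common entities (decode &amp; last to avoid double decoding).
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&")
    // Collapse runs of whitespace, which otherwise waste tokens.
    .replace(/\s+/g, " ")
    .trim();
}
```

Stripping scripts, styles and tags this way typically removes most of a page’s size before any text reaches the model.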


Thanks. Any clever tips to further reduce token consumption, like offloading the webpage text summarisation to some other, cheaper LLM service?

You could tinker with the tiktoken Python package and lossless compression. See (link at the bottom of the page)
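Another simple option is to cap how much page text you send at all. The sketch below swaps in a rough estimate for a real tokenizer: it assumes about 4 characters per token, which is only a rule of thumb for English text, so leave some headroom. Both helper names are made up for this example.

```typescript
// Rule of thumb for English text: roughly 4 characters per token.
// This is an approximation, not a real tokenizer.
const CHARS_PER_TOKEN = 4;

// Estimate how many tokens a piece of text will use.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Cut text down to an approximate token budget, breaking at the last space
// before the limit so a word is not split in half.
function truncateToTokenBudget(text: string, maxTokens: number): string {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  if (text.length <= maxChars) return text;
  const cut = text.slice(0, maxChars);
  const lastSpace = cut.lastIndexOf(" ");
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut).trimEnd();
}
```

The truncated text can then go either straight to GPT-4 or to a cheaper model first for summarisation, with only the summary passed on.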
