You can query Google or another search engine and get back a list of matching sites with short descriptions. You can also ask GPT to turn the user's question into a more precise search query first; see the sketch below.
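A minimal sketch of that query-rewriting step using the openai Node SDK (the model name and prompt wording are just my placeholders, swap in whatever you use):

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Ask the model to compress a verbose user question into a short search query
async function toSearchQuery(question: string): Promise<string> {
  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumption: any chat model works here
    messages: [
      {
        role: "system",
        content:
          "Rewrite the user's question as a concise web search query. Reply with the query only.",
      },
      { role: "user", content: question },
    ],
  });
  return res.choices[0].message.content?.trim() ?? question;
}
```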
To make sure I'm understanding this right: are you suggesting we call a third-party search API to get the top URLs, write a script to scrape those pages, and then use embeddings to search the scraped content?
Sorry if it's obvious and I'm just not getting it.
You can’t embed the whole web unless you are Google.
You don't have to "install" a third-party search API; you just call the search API over the network from your AI function-handling code. That said, some pre-built shims can save time. Roughly like this:
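Here's a rough sketch of the function-calling loop, assuming Bing Web Search as the backend and Node 18+ for the global fetch (tool name, model, and env var are my choices, not anything official):

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// Plain HTTPS call to a search backend; Bing Web Search shown as one option
async function searchWeb(query: string) {
  const res = await fetch(
    `https://api.bing.microsoft.com/v7.0/search?q=${encodeURIComponent(query)}&count=5`,
    { headers: { "Ocp-Apim-Subscription-Key": process.env.BING_KEY! } },
  );
  const data = await res.json();
  // Return only url + snippet; that's usually enough for the model
  return (data.webPages?.value ?? []).map((p: any) => ({
    url: p.url,
    snippet: p.snippet,
  }));
}

// Expose the search as a tool the model can decide to call
const tools = [
  {
    type: "function" as const,
    function: {
      name: "search_web", // hypothetical tool name
      description: "Search the web and return the top result URLs with snippets",
      parameters: {
        type: "object",
        properties: { query: { type: "string" } },
        required: ["query"],
      },
    },
  },
];

async function answer(question: string) {
  const messages: any[] = [{ role: "user", content: question }];
  const first = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumption: use your model of choice
    messages,
    tools,
  });
  const call = first.choices[0].message.tool_calls?.[0];
  if (!call) return first.choices[0].message.content; // model answered directly

  // Run the search the model asked for and feed the results back
  const results = await searchWeb(JSON.parse(call.function.arguments).query);
  messages.push(first.choices[0].message);
  messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(results) });

  const second = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
    tools,
  });
  return second.choices[0].message.content;
}
```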
Thanks, folks. My next worry is token usage: if I start parsing whole web pages, that will consume a lot of tokens. Is there a way to fetch the information while spending as few tokens as possible?
PS: I use Node.js, so I'd prefer a solution that works with it.
Depending on your use case, if you need to read arbitrary web pages, you might use something like JSDOM and return only the text content instead of the entire HTML document. Other tools such as Cheerio (for static HTML) or Puppeteer (for JavaScript-rendered pages) can also work. Something along these lines:
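A small sketch with jsdom, assuming Node 18+ for the global fetch (the 8,000-character cap is an arbitrary budget; tune it for your model's context window):

```ts
import { JSDOM } from "jsdom";

// Fetch a page and return plain text only, stripped of markup and noise
async function fetchPageText(url: string, maxChars = 8000): Promise<string> {
  const res = await fetch(url);
  const dom = new JSDOM(await res.text());
  const doc = dom.window.document;

  // Scripts, styles, and page chrome are pure token waste for the model
  doc
    .querySelectorAll("script, style, noscript, nav, header, footer")
    .forEach((el) => el.remove());

  const text = (doc.body?.textContent ?? "").replace(/\s+/g, " ").trim();
  return text.slice(0, maxChars); // hard cap to keep the prompt small
}
```

From there you can go further: chunk the text, embed the chunks, and send the model only the few chunks most relevant to the question instead of the whole page.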
Hi! Great thread. How did you end up doing this? I need to inject more precise and up-to-date information into my OpenAI API queries, and I'd like to know your approach so I can implement something similar myself.