API Limitation: is there no way to retrieve the web searches that were used for a completion?

Context

  • I’m building an app such as the following: I share a topic I want to learn more about and I ask ChatGPT to share 3 links I can read to learn more.
  • I’m using the API currently (specifically with Assistants, but I’m flexible).
  • The API will respond with links, but over 50% are varied types of hallucinations: broken links, 404 pages, etc.
  • When I use the ChatGPT UI, I see that there’s a “Searched X sites” (X varies per completion), that shows referenced links, and those are great!

Is there a way to programmatically get the links which were referenced? Based on my look at the docs and checks in the OpenAI Discord, I believe the answer is no.

For anyone that’s also interested in possible resolutions to the core application need, here are some I’m aware of:

  • Other APIs: I might be better off using other APIs such as Google or Bing searches to find links.
  • Function Calling: I’ve considered that I could create a function to validate the “links” which are returned in the output.
  • Fine-Tuning or RAG: I considered these briefly, but I don’t believe either are relevant solutions. This doesn’t seem like a context problem. It’s a slight behavioral problem, but maybe too inherent to LLMs to be solvable without some extreme fine-tuning.
  • Alternative Out of the Box Solutions: Claude seems to have similar out-of-the-box limitations. However, the Perplexity LLM might be well-suited for this challenge out-of-the-box. Anecdotally, it worked well on quick tests.

This is the way to go. The API - unlike ChatGPT - does not come natively with the ability to search the web and the issue you are currently experiencing is the “expected behaviour”. A common approach to replicate the functionality using the API is to use function calling and then rely on search APIs such as the ones you mentioned to obtain actual links.

1 Like

Ah, key detail there that I missed before: “The API - unlike ChatGPT - does not come natively with the ability to search the web…”

I’ll check the docs to confirm that but was not aware before. I assumed the search was happening in the backend but not being exposed.

1 Like

No worries. This is not explicitly addressed in the docs however.

2 Likes

Woops - accidentally deleted my own response :rofl:

Thanks to @jr.2509 for the pointers. Those led me to follow up searches and I realized there’s existing conversation on this.

In addition to @jr.2509’s responses here and elsewhere on the forums, here’s one useful thread: How to implement GPT4 API with internet access? - #9 by raymondyeh

And another here: How to get GPT-4 API access with Internet