I’m building an app that works as follows: I share a topic I want to learn more about and ask ChatGPT for 3 links I can read on it.
I’m using the API currently (specifically with Assistants, but I’m flexible).
The API does respond with links, but over 50% of them are hallucinated in one way or another: broken links, 404 pages, etc.
When I use the ChatGPT UI, I see a “Searched X sites” indicator (X varies per completion) that shows the referenced links, and those are great!
Is there a way to programmatically get the links which were referenced? Based on my look at the docs and checks in the OpenAI Discord, I believe the answer is no.
For anyone that’s also interested in possible resolutions to the core application need, here are some I’m aware of:
Other APIs: I might be better off using other APIs such as Google or Bing searches to find links.
Function Calling: I’ve considered that I could create a function to validate the “links” which are returned in the output.
Fine-Tuning or RAG: I considered these briefly, but I don’t believe either is a relevant solution. This doesn’t seem like a context problem. It’s a slight behavioral problem, but maybe too inherent to LLMs to be solvable without some extreme fine-tuning.
Alternative Out of the Box Solutions: Claude seems to have similar out-of-the-box limitations. However, the Perplexity LLM might be well-suited for this challenge out-of-the-box. Anecdotally, it worked well on quick tests.
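For the link-validation idea under Function Calling above, a minimal Python sketch might look like the following. It uses only the standard library. The helper names `fetch_status` and `filter_live_links`, the `User-Agent` string, and the "status below 400 counts as live" rule are assumptions for illustration, not part of any OpenAI API.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was obtained."""
    try:
        # HEAD avoids downloading the page body; urlopen follows redirects.
        request = urllib.request.Request(
            url,
            method="HEAD",
            headers={"User-Agent": "link-checker-sketch/0.1"},  # hypothetical UA
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, ValueError, OSError):
        # Malformed URL, DNS failure, timeout, refused connection, etc.
        return 0


def filter_live_links(
    urls: Iterable[str],
    get_status: Callable[[str], int] = fetch_status,
) -> List[str]:
    """Keep only the URLs whose status looks like a successful response."""
    return [url for url in urls if 200 <= get_status(url) < 400]
```

Some servers reject `HEAD` requests or block unknown clients, so a real check would probably need a `GET` fallback. Note also that this only filters out dead links; it cannot tell whether a live page is actually about the topic.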
This is the way to go. Unlike ChatGPT, the API does not natively come with the ability to search the web, so the issue you are currently experiencing is the “expected behaviour”. A common approach to replicating that functionality with the API is to use function calling and rely on search APIs such as the ones you mentioned to obtain actual links.
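As a rough sketch of that approach in Python: you describe a search tool to the model, and when the model asks to call it, your own code runs the search and returns real URLs. The tool name `search_web`, its parameters, the `handle_tool_call` helper, and the shape of the search results (`title` and `url` keys) are all assumptions for illustration. The `search` argument stands in for whichever search API you choose to wrap.

```python
import json
from typing import Callable, Dict, List

# Tool definition in the JSON-schema style used for function calling.
# "search_web" and its parameters are hypothetical names.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web and return real URLs about a topic.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "max_results": {
                    "type": "integer",
                    "description": "How many links to return",
                },
            },
            "required": ["query"],
        },
    },
}

# A search backend takes (query, max_results) and returns result dicts.
SearchFn = Callable[[str, int], List[Dict[str, str]]]


def handle_tool_call(name: str, arguments: str, search: SearchFn) -> str:
    """Run the tool the model requested and return a JSON string for the model."""
    if name != "search_web":
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments)  # the model supplies arguments as a JSON string
    results = search(args["query"], args.get("max_results", 3))
    # Pass along only fields that came from the search backend, so every
    # URL the model sees is one that was actually returned by the search.
    links = [{"title": r["title"], "url": r["url"]} for r in results]
    return json.dumps({"links": links})
```

You would pass `SEARCH_TOOL` in the request's list of tools, call `handle_tool_call` with the function name and arguments from the model's tool call, and send the returned string back as the tool result so the model can write its answer around links that actually exist.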