Reliably reading URL content

I’m working on a Chrome extension in which I pass the current tab’s URL in my prompt and then query about it.
The 4o model appears to read the URL content, while 4o-mini obviously can’t and hallucinates based on the URL itself.
However, 4o sometimes seems to hallucinate in the same way. How can I tailor my prompt to make this happen less often?

My prompt already includes the text “Do not guess based solely on the name, URL, or external cues; rely strictly on the page contents.” This seems to help a bit, but it is not enough.


Hi @deg and welcome to the community.

Currently, the API cannot perform a web search or fetch the contents of a URL. Whatever results you have seen were therefore hallucinations or a fluke: the model only receives the URL string, not the page behind it. Right now, the only way to get the contents of a URL into an API request is to do your own web scraping, using something like BeautifulSoup or an external service like Apify, and pass the scraped text in your prompt.
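As a rough illustration of the "scrape it yourself" approach, here is a minimal Python sketch. It uses the standard library's `html.parser` as a dependency-free stand-in for BeautifulSoup. The names `extract_text`, `fetch_page_text`, and `build_prompt`, the 8000-character limit, and the prompt wording are all my own assumptions, not part of any API.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping script, style, and noscript bodies."""

    _SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def fetch_page_text(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its visible text (hypothetical helper)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_text(html)


def build_prompt(page_text: str, question: str, max_chars: int = 8000) -> str:
    """Embed the page text in the prompt so the model never has to guess.

    max_chars is an arbitrary cap chosen for this sketch; pick a limit
    that fits the context window of the model you use.
    """
    return (
        "Answer using only the page contents below.\n\n"
        "--- PAGE CONTENTS ---\n"
        f"{page_text[:max_chars]}\n"
        "--- END PAGE CONTENTS ---\n\n"
        f"Question: {question}"
    )
```

You would then send `build_prompt(fetch_page_text(url), question)` as the user message instead of the bare URL. In a Chrome extension you could probably skip the download step and collect the text from the open tab in the browser, which also covers pages that need a login or render their content with JavaScript.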
