Reliably reading URL content

deg · December 19, 2024, 5:59pm

I’m working on a Chrome extension in which I pass the current tab’s URL in my prompt and then query about it.
I can see that the 4o model can read the URL content, while 4o-mini obviously can’t and hallucinates based on the URL itself.
The problem is that 4o also sometimes seems to have the same problem. How can I tailor my prompt to minimize the occurrence of this problem?

My prompt already includes text “Do not guess based solely on the name, URL, or external cues; rely strictly on the page contents.” This seems to help a bit, but it is not enough.

platypus · December 19, 2024, 6:15pm

Hi @deg and welcome to the community.

Currently it is not possible to do a web search or URL scrape using the API. So whatever results you have seen, they have been hallucinations or a fluke. Right now the only way to fetch/scrape contents from a URL using the API is if you perform your own web scraping using something like BeautifulSoup or an external service like Apify.

deg · January 6, 2025, 10:29am

Thanks.

It’s scarily impressive how often 4o was able to create plausible brief summaries of a page without seeing anything besides the URl – enough that it had me fooled for a while.

I guess that says something deep and philosophical about how much of what we see in this world is redundant.

Topic		Replies	Views
Reading and analyzing webpages content by GPT-4 API API gpt-4 , plugin-development , api , chatgpt-plugin	1	7921	January 22, 2024
Gpt-4o web browsing capability where? API api	18	18753	June 28, 2024
ChatGPT's API returns worse web search results than it's web UI and it can't explain to me why API chatgpt , api , web-browsing , web-search	3	1295	May 24, 2025
Batch processing not finding external URLs API api	1	223	June 1, 2024
Issues with consuming text from a webpage API api	3	73	January 10, 2025

Reliably reading URL content

Related topics