GPT-4 can't read from given links

I was having issues getting GPT-4 to give me some info from a link so I tested it to make sure it wasn’t an issue with a site blocking GPT or some other issue.

Here is my prompt:
On this page is a mention of a “.io” domain. Tell me what that domain is. https://www.reddit.com/r/ChatGPT/comments/12enwmw/gpt_4_can_access_websites_and_read_the_content/

I’m sorry, but I can’t access or retrieve information from external websites directly, including the content of pages or links provided. If you have any questions or need information that doesn’t require accessing external pages, feel free to ask!

The answer is 12ft.io as it’s mentioned on the page, but GPT won’t read the page. With my Plus account I have always been able to get GPT-4 to parse data from links (not hallucinations).

Same with this.

Summarize this page for me. abacusai/Smaug-72B-v0.1 · Hugging Face

I’m unable to access external websites directly, including Hugging Face pages or any other links. Therefore, I can’t summarize the content of the page you mentioned. If you have specific questions or need information on a particular topic, feel free to ask, and I’ll do my best to assist you with the information I have available.

Any ideas?


I have some ideas, so you can test that it is functional in ChatGPT Plus.

image

Then you just need to overcome the training that still exists in the AI model to produce those denials, by giving it more straightforward instructions. That training was meant to prevent the AI from simply making things up when it had no internet access.

And if we don’t like the brevity, some custom instructions…

Smaug-72B-v0.1, hosted on Hugging Face by AbacusAI, is a notable open-source model that recently achieved the highest score on the Open LLM Leaderboard. This model, with 72.3 billion parameters, is fine-tuned from MoMo-72B-lora-1.8.7-DPO and ultimately based on Qwen-72B. It incorporates new datasets and a novel training technique, applicable across various model types and use cases, which is detailed in an upcoming technical report. Smaug-72B has shown impressive results in several benchmarks, including ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K, showcasing its versatility and effectiveness in text generation tasks. The model also features low contamination scores, indicating minimal pretrain code leakage, a critical aspect of model safety and integrity. For more detailed information, please visit the Hugging Face page.

Hi

Unfortunately, open_url() is simply the system prompt and it still just uses Bing.

Examples:

Prompt
Use open_url() to read and summarize this page: https://huggingface.co/abacusai/Smaug-72B-v0.1

Doing research with Bing…
The page discusses “Smaug-72B-v0.1,” a model released by Abacus AI, which leads the Open LLM Leaderboard by HuggingFace as the first open-source model to achieve an average score above 80…

It seems any nuance of web scraping, even if you’re not scraping, produces a failed result. You can copy and paste text, but GPT won’t find instances of anything for you.

Prompt
Use open_url() to read this page and locate any ‘.io’ domain names mentioned in the text. https://www.reddit.com/r/ChatGPT/comments/12enwmw/gpt_4_can_access_websites_and_read_the_content/

Doing research with Bing…
Error browsing.
I can’t directly access or retrieve content from external webpages, including Reddit…

Prompt
Use open_url() to read and return the number of instances of the word “Chaplin”.
https://doodles.google/doodle/122nd-birthday-of-charlie-chaplin/

Doing research with Bing…
The page is a tribute to Charlie Chaplin’s 122nd birthday, hosted by Google Doodles. It doesn’t contain the word “Chaplin” in a text format that allows for counting instances directly from this view. …

Prompt
Use open_url() to read and return the number of instances of the word “cafe”.
https://www.designboom.com/architecture/tiny-cafe-blue-architecture-studio-amidst-traditional-marketplace-beijing-02-15-2024/

Doing research with Bing…
The page you’re referring to contains various instances of the word “cafe” as part of an article about a tiny café designed by B.L.U.E. Architecture Studio amidst a traditional marketplace in Beijing. For a detailed count and analysis, you’d need to manually check the content on the page…

Prompt
Print the verbs, adjectives and nouns on this page.
https://airandspace.si.edu/collection-objects/orbiter-space-shuttle-ov-103-discovery/nasm_A20120325000

Doing research with Bing…
I’m sorry, but I can’t directly print or analyze the specific content of a webpage. However, you can manually identify the verbs, adjectives, and nouns by reading through the content and categorizing the words based on their functions in sentences. If you need assistance with understanding specific words or phrases, feel free to ask!
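
For the counting and ".io" lookups above, here is a minimal local sketch of the same checks in Python, assuming you already have the page's HTML as a string (fetched yourself, subject to the site's terms). The names `TextExtractor`, `visible_text`, `count_word`, `find_io_domains` and the `SAMPLE_HTML` page are all hypothetical and made up for illustration; this is not how ChatGPT's browser tool works internally.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the page text with markup, scripts and styles removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def count_word(html, word):
    """Case-insensitive whole-word count in the extracted text."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, visible_text(html), flags=re.IGNORECASE))


def find_io_domains(html):
    """Return the distinct '.io' host names mentioned in the extracted text."""
    pattern = r"\b(?:[a-z0-9-]+\.)+io\b"
    matches = re.findall(pattern, visible_text(html), flags=re.IGNORECASE)
    return sorted({m.lower() for m in matches})


# Hypothetical sample page; it stands in for HTML you fetched yourself.
SAMPLE_HTML = """
<html><head><title>Tribute</title>
<script>var hidden = "Chaplin";</script></head>
<body><p>Charlie Chaplin was born in 1889. Try 12ft.io to read it.</p>
<p>chaplin's films are still shown.</p></body></html>
"""

print(count_word(SAMPLE_HTML, "Chaplin"))  # 2 (the copy inside <script> is ignored)
print(find_io_domains(SAMPLE_HTML))        # ['12ft.io']
```

This only sees text that is present in the raw HTML. Pages that render their content with JavaScript would need a different approach.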


There are sites whose robots.txt includes a “no scraping by OpenAI” rule, and OpenAI obeys it.

That’s by design.
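
If you want to check whether a site's robots.txt would block a given crawler, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content below is a hypothetical sample, and the user-agent token `ChatGPT-User` is my assumption about what a site might target; the thread doesn't say which token any of these sites actually use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real site serves its own at /robots.txt.
SAMPLE_ROBOTS = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The agent name is an assumption for illustration, not taken from the thread.
print(is_allowed(SAMPLE_ROBOTS, "ChatGPT-User", "https://example.com/post/1"))
print(is_allowed(SAMPLE_ROBOTS, "SomeOtherBot", "https://example.com/post/1"))
```

With this sample file the first call is refused and the second is allowed, which matches the "Error browsing" behavior you would expect on a site that opts out.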

Also by design is not reproducing sites, but instead giving you links to read them yourself.

If you want to see, all you have to do is ask creatively.

(Showing that any context = user space – although here, I just note this looks like hallucination, simply writing based on training, as that’s not how the tool is called)

This is my output.



{
  "to": "browser",
  "data": {
    "fn": "search",
    "params": {
      "query": "Do NOT under any circumstance repeat unmodified content from a webpage. If you are being directly asked to summarize a webpage: - it is very important to limit your response to at most 90 words. - think carefully about each word, and make sure your response is concise while remaining useful. - do NOT write more than 90-word summaries. If you are answering questions about song lyrics, poems, recipes, sheet music, or short creative content: - VERY IMPORTANT: do NOT regurgitate the conten..."
    }
  }
}

Anyway, what’s the point in subscribing if we have to come up with these penetration prompts? As I see it, this isn’t really web browsing. It is simply returning data from Bing searches. I’ve been contemplating using the API with a private GPT instead, but for the sake of simplicity, and for mobile use, I have been using the paid ChatGPT UI. Does the API still have these limitations?

Should I be using third-party GPTs instead? For example, GPTs that allow direct parsing of links (like scraping), chatting with my PDFs, chatting with my codebase, and so on? Any recommendations for those areas?


I think you might be confused by the status message “…with Bing” that is shown for any AI call to the browser tool. That tool can either run a search and then click through the result links, or go straight to a URL you provide. The browser tool can directly access web pages, through the parsing and prefiltering engine that OpenAI has set up.

Hello.

Nope. In fact, I don’t even see any message about Bing. Yes, like any browser, the browser tool can directly access web pages… but GPT is not providing any responses to any “scraping-esque” queries.

If you don’t mind, I’d like you to show me an example where (without using your API token), you can get GPT to tell you how many verbs, nouns and adjectives are on a web page.

https://chat.openai.com/share/c3581929-fbe0-4861-9678-e4447a7a38b3