Configure custom GPT to access and parse website source code

I’m a Plus user and I’m configuring a custom GPT to analyze some aspects of website source code.

Web Browsing is activated in the GPT configuration. However, the further configuration is driving me crazy:

One day I prompt the GPT in the left configuration frame: "Configure yourself for accessing and parsing SSL-secured website source code with Beautiful Soup, Requests and urllib3." After this I prompt the GPT in the right frame: "Visit [url] and output the second H4 heading", and I get the data as expected.

The next day, the same prompt in the right frame causes the execution of this code:

import requests
from bs4 import BeautifulSoup

# URL to be parsed
url = "http://aSite2Visit.de/index.php/kfz-service"

try:
    # Sending a request to the URL
    response = requests.get(url)

    # Checking if the request was successful
    if response.status_code == 200:
        # Parsing the content of the webpage
        soup = BeautifulSoup(response.content, 'html.parser')

        # Finding all H4 headings
        h4_headings = soup.find_all('h4')
        h4_headings_text = [heading.get_text().strip() for heading in h4_headings]
    else:
        h4_headings_text = ["Failed to retrieve the webpage. Status code: " + str(response.status_code)]
except Exception as e:
    h4_headings_text = [str(e)]

h4_headings_text

which ends up with an error like this:

["HTTPConnectionPool(host='aSite2Visit.de, port=80): Max retries exceeded with url: /index.php/kfz-service (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ea7810893a0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))"]

If I prompt the GPT to visit an SSL-secured website, the executed code looks like this:

import requests
from bs4 import BeautifulSoup

# URL to be analyzed
url = "https://sslSite2Visit.de/index.php/kfz-service"

# Sending a request to the URL
response = requests.get(url)

# Checking if the request was successful
if response.status_code == 200:
    # Parsing the content of the webpage
    soup = BeautifulSoup(response.content, 'html.parser')
    # Output the parsed content for analysis
    soup.prettify()
else:
    soup = None
    "Failed to retrieve the webpage. Status code: " + str(response.status_code)

ending up with this error:

File ~/.local/lib/python3.8/site-packages/requests/sessions.py:703, in Session.send(self, request, **kwargs)
    700 start = preferred_clock()
    702 # Send the request
--> 703 r = adapter.send(request, **kwargs)
    705 # Total elapsed time of the request (approximately)
    706 elapsed = preferred_clock() - start

File ~/.local/lib/python3.8/site-packages/requests/adapters.py:519, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    515     if isinstance(e.reason, _SSLError):
    516         # This branch is for urllib3 v1.22 and later.
    517         raise SSLError(e, request=request)
--> 519     raise ConnectionError(e, request=request)
    521 except ClosedPoolError as e:
    522     raise ConnectionError(e, request=request)

ConnectionError: HTTPConnectionPool(host='auto-babiel.de', port=80): Max retries exceeded with url: /index.php/kfz-service (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ea78a312ee0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
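Just as a sketch (same placeholder URL): a more defensive variant of that snippet would set a timeout and wrap the request in a try/except, so a connection failure is reported instead of surfacing as an unhandled ConnectionError:

import requests
from bs4 import BeautifulSoup

url = "https://sslSite2Visit.de/index.php/kfz-service"

try:
    # Timeout so the call cannot hang indefinitely
    response = requests.get(url, timeout=10)
    # Raise for 4xx/5xx status codes instead of checking manually
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    result = f"Request failed: {e}"
else:
    soup = BeautifulSoup(response.content, 'html.parser')
    result = soup.prettify()

print(result)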

I created a setup file describing exactly how the GPT should access websites and uploaded it into the configuration/knowledge area, but it seems to have no stable effect.

My question: how can I configure a custom GPT to achieve stable, reliable access to and parsing of external websites, with and without SSL?

You might also want to check whether the website can serve its content in response to a simple HTTP request. In many cases you need a browser-based API such as Playwright, because many websites generate their content dynamically.
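As a minimal sketch (using the placeholder URL from above), a browser-based fetch with Playwright could look like this:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Launch a headless browser so JavaScript on the page gets executed
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://sslSite2Visit.de/index.php/kfz-service")
    # content() returns the fully rendered HTML, including JS-generated markup
    html = page.content()
    browser.close()

print(html[:500])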

I tested this workflow with some URLs with static content; they definitely have no content rendered by JavaScript and demonstrably deliver a 200 status code. Besides, to begin with the workflow, it would be fully sufficient to access the existing source code error-free.
As you can see from the error I quoted, it is not a 404 or any other error related to general URL unavailability. How should I interpret your answer, and how should it help me with my issue? What would a simple HTTP request accomplish?
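To illustrate, the check I mean is as simple as this sketch (placeholder URL again), run locally outside the GPT:

import requests

# The page answers a plain GET without any browser involved
response = requests.get("https://sslSite2Visit.de/index.php/kfz-service", timeout=10)
print(response.status_code)        # 200 for the URLs I tested
print(response.text.count("<h4"))  # the H4 headings are present in the raw HTML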