Understanding ChatGPT web browsing - methodologies for accessing and interpreting web pages

Hello OpenAI community,

I’m really excited by the web browsing plugin and have a few questions about how it accesses and interprets web pages (early days, I know). I’ve searched the forum and didn’t find the information I was looking for, so I’m hoping someone can shed some light on the following:

1 - What rendering method does the web browsing bot use to read the content of a webpage? Does it read the page based purely on the HTML (akin to a cURL request), or does it also load the associated CSS and JavaScript and render the assembled DOM before extracting content, which usually takes a few seconds longer per page (like the method Googlebot employs for indexing)?

2 - Does the ChatGPT web browser rely on any specific markup to discern which elements of the page are usable content (such as title, h1 and p tags) and which are unnecessary (navigation and menus that may be part of a sitewide boilerplate theme and irrelevant to the contents of the page)?

3 - Presumably, there would be a limit to the size of the content the bot can process from a page. Could you specify this limit, perhaps in kilobytes or words? How does this align with the token limits of the underlying model?

4 - Does the bot use a specific user agent when it browses the web?

5 - Lastly, considering the impressive speed of the browsing plugin, I’m assuming it’s reading HTML content directly, as opposed to downloading and rendering the entire page. Is it possible that the bot leverages web SERP results from, say, Bing’s Search API on their Azure platform (Bing Search API documentation - Azure Cognitive Services | Microsoft Learn) to return the first X results for a search query, and which also returns Bing’s cached version of the fully rendered page? I’m guessing that would be a plausible, scalable and cost-effective option given the partnership with Microsoft. In other words, when it searches the web for a query, is it going to Google, scraping the top 10 results (against their TOS) and visiting the 10 pages one by one (potentially very slow, as it’s reliant on 10 different servers), or is it making a single Bing Search API request and getting all 10 results from Bing’s cache to process at once?

Gaining clarity on these points would significantly aid in optimizing web content to be ‘ChatGPT web browser friendly’. I’m envisioning that if webpages use specific structured data and markup, avoid reliance on client-side rendering, and are not overly bulky, it would expedite the bot’s browsing and yield better results. Ultimately, I’m keen to establish best practices for how developers approach data structuring and web design to get content over to ChatGPT. Looking forward to any insights available. R.
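For anyone who wants to sanity-check a page against the assumptions above (key content present without client-side rendering, page not overly bulky), here is a rough sketch. The function name `audit_raw_html`, the `key_phrases` argument and the 200 kB default budget are all made up for illustration; none of them reflects a documented limit of the browsing plugin.

```python
def audit_raw_html(html, key_phrases, max_bytes=200_000):
    """Check raw (unrendered) HTML for size and for phrases a reader should find.

    max_bytes is a hypothetical budget, not a known limit of any crawler.
    """
    size_bytes = len(html.encode("utf-8"))
    lowered = html.lower()
    # A phrase that only appears after JavaScript runs will be missing here.
    missing = [p for p in key_phrases if p.lower() not in lowered]
    return {
        "size_bytes": size_bytes,
        "within_budget": size_bytes <= max_bytes,
        "missing_phrases": missing,
    }


# A page whose body is filled in client-side: the heading is in the raw HTML,
# but the contact text would only appear after scripts run.
sample = "<html><body><h1>Pricing</h1><div id='app'></div></body></html>"
report = audit_raw_html(sample, ["Pricing", "Contact us"])
print(report)
```

If a phrase shows up in the browser but lands in `missing_phrases`, the page probably depends on client-side rendering for that content.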


I’ve noticed sometimes it may end up in a loop :rofl:


Thanks for asking these questions… I am interested in these too! I would also like to add:

  1. Are we, or will we be, able to use web browsing with the chat API calls?

Yes, it makes HTTP GET requests and parses the HTML.
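To make "GET the page and parse the HTML" concrete, here is a minimal sketch using only the Python standard library. It is not the plugin's actual implementation: the `KEEP` and `SKIP` tag sets, the `ContentExtractor` class and the `example-fetcher/0.1` user agent are assumptions chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ContentExtractor(HTMLParser):
    """Collect text from a few content tags, ignoring boilerplate containers."""

    KEEP = {"title", "h1", "p"}  # assumed "usable content" tags
    SKIP = {"script", "style", "nav", "header", "footer"}  # assumed boilerplate

    def __init__(self):
        super().__init__()
        self.blocks = []       # (tag, text) pairs in document order
        self._current = None   # content tag currently being collected
        self._buf = []
        self._skip_depth = 0   # > 0 while inside a boilerplate container

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.KEEP and self._skip_depth == 0:
            self._current = tag
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == self._current:
            # Collapse runs of whitespace before storing the block.
            text = " ".join("".join(self._buf).split())
            if text:
                self.blocks.append((tag, text))
            self._current = None

    def handle_data(self, data):
        if self._current and self._skip_depth == 0:
            self._buf.append(data)


def extract_content(html):
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def fetch_html(url, user_agent="example-fetcher/0.1"):
    """Plain HTTP GET; no CSS or JavaScript is loaded or executed."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


sample = (
    "<html><head><title>Demo</title><script>var x = 1;</script></head>"
    "<body><nav><p>Menu</p></nav><h1>Heading</h1>"
    "<p>Hello   <b>world</b>.</p></body></html>"
)
blocks = extract_content(sample)
print(blocks)
```

For a live page you would call `extract_content(fetch_html(url))`. Anything a site injects with JavaScript never reaches a parser like this, which is why server-rendered content matters if the bot really does work from raw HTML.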