The cookbook gives you the basic structure for browsing the web, but in practice I got pretty bad results from it. Scraping page content is difficult because most sites don't like to be scraped. Using the requests library results in many 403 Forbidden responses, because servers can easily detect that the request is not coming from a real browser. Using Selenium or similar works better, but it is much slower, and many sites will still detect an automated browser.
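To make the trade-off concrete, here is a minimal sketch of a "cheap first, expensive second" approach: try a plain HTTP request, and only fall back to a slower fetcher (a Selenium-driven browser, say) when the response looks blocked. This is not the cookbook's code. The names `fetch_plain`, `fetch_with_fallback`, and `BROWSER_HEADERS` are hypothetical, and it uses the standard library's `urllib` in place of requests so it runs without extra installs. Browser-like headers alone will not get past serious bot detection, so treat this as an illustration of the structure rather than a fix.

```python
import urllib.error
import urllib.request

# Assumption: a browser-like header set. The values are illustrative only.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_plain(url, timeout=10.0):
    """Fetch a URL with a plain HTTP request and return (status, body).

    HTTP error responses such as 403 are returned as a status code
    instead of being raised, so the caller can decide what to do next.
    """
    request = urllib.request.Request(url, headers=BROWSER_HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        return err.code, ""


def fetch_with_fallback(url, fetchers, blocked_statuses=(403, 429)):
    """Try each fetcher in order until one is not blocked.

    Each fetcher is a callable taking a URL and returning (status, body).
    Put the cheapest fetcher first and the slowest (for example a real
    browser) last. If every fetcher is blocked, the last result is returned.
    """
    if not fetchers:
        raise ValueError("at least one fetcher is required")
    for fetch in fetchers:
        status, body = fetch(url)
        if status not in blocked_statuses:
            break
    return status, body
```

In use, you would call something like `fetch_with_fallback(url, [fetch_plain, fetch_with_browser])`, where `fetch_with_browser` is a function you write around Selenium or a similar tool. That way the slow browser path is only paid for on sites that actually reject the plain request.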
I am really curious about how OpenAI gets around this problem with their own web browsing, and how they manage to be so quick. Does anyone know what they are doing differently?