Web-Crawler Tutorial (OpenAi) 403

gjnave · December 28, 2023, 2:09am

Lot of things I’m not understanding
First of all: the code runs well with the following: (I added spaces so i could post links)

Define root domain to crawl

domain = “openai . com”
full_url = “ht*ps: // openai. com”

I get a list of the files its crawling but many say “can’t parse”. Why is that?

secondly, if i change the code:

domain = “mysite. com”
full_url = “ht*ps: // mysite. com”

then I get a straight 403 error. Why?

Thank you

PaulBellow · December 28, 2023, 6:17am

Welcome to the forum.

Do you have a link to the tutorial? Some code?

Not all sites will allow you to scrape them…

gjnave · December 28, 2023, 3:30pm

Thanks Paul!

polepole · December 28, 2023, 5:28pm

I hope this can help you:

Topic		Replies	Views
OpenAI Tutorial Web Crawler - Please turn Javascript on Community tutorial , openai-documentation , documentation	1	313	June 16, 2024
Web Crawling official documentation Community gpt-4 , api	0	782	May 31, 2023
Web crawler example does not work Documentation	3	1351	August 8, 2023
Crawling Website Tutorial Rate Limit Error API	8	730	February 4, 2024
Create an IA which will crawl the pages and talk about it Community chatgpt	9	2167	January 29, 2024