Web crawler example does not work

I am trying out the Website Q&A with Embeddings tutorial in Python and running into problems. I set up Python per the instructions, downloaded the code, and tried to run it. Each time, the crawler processes a few pages, then fails with an error like the one below:

OSError: [Errno 22] Invalid argument: 'text/openai.com/openai.com_blog?authors=ashley-pilipiszyn.txt'

The strange thing is that if I delete everything and start over, it will fail again with a similar error, but with a totally different text string.

What could I be doing wrong? Has anyone tried this successfully? Help

I am getting the same error while running the crawl program. Is there any solution?

Files with a '?' in their name cannot be created on Windows, so just add another replace to the url on line 142:

        with open('text/'+local_domain+'/'+url[8:].replace("/", "_").replace("?", ".") + ".txt", "w", encoding="UTF-8") as f:

I was thinking the same thing and am thankful someone else had the syntax. This fixes it.

If you run it long enough, other characters will pop up in URLs that break the crawl in the same way, and they will need a replace as well.
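Rather than chaining one replace per character as each new one turns up, they can all be handled in one pass. Here is a minimal sketch, assuming the goal is a file name that Windows will accept. The helper name url_to_filename is hypothetical and not part of the tutorial code, and the character set is the one Windows documents as reserved in file names.

```python
import re

# Characters Windows reserves in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def url_to_filename(url: str) -> str:
    """Hypothetical helper: turn a crawled URL into a safe .txt file name."""
    # Drop the scheme; this stands in for the tutorial's url[8:] slice
    # and also copes with plain http:// links.
    name = re.sub(r'^https?://', '', url)
    # Replace every reserved character with an underscore.
    name = _INVALID_CHARS.sub('_', name)
    return name + '.txt'
```

It could then be used in the open call in place of the chained replaces, e.g. open('text/' + local_domain + '/' + url_to_filename(url), "w", encoding="UTF-8"). This sketch does not deal with very long URLs or with names ending in a dot or space, which can also cause trouble on Windows.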
