I have a custom GPT where I want a PDF to be the knowledge base. It is a 728-page PDF, and at only 17 MB, file size should not be an issue. However, the GPT is not able to read this doc (it reports a formatting issue). Even if I ask it to look specifically at pages 80-90 of this doc and then answer, it still cannot get there, saying that the processing time is too long even to reach page 80.

I am not clear on what is going on, on two fronts, so I would appreciate any help. (1) How do we make sure that a PDF or any other file is actually going to be usable by a GPT in terms of format? (2) Why is it not able to read the doc even when I specify the page numbers? In the past it was always able to answer from docs when I specified the page numbers.

PDFs can be tricky to read. You should try opening the PDF with Code Interpreter and asking it to read an information-rich page, then see how it manages.

If it can read the text, then I’d recommend using Code Interpreter to convert the PDF into something more reliable, like Markdown.

If it can’t read the text (sometimes PDF pages are baked-in images, text included, and sometimes the text layer has some funky stuff going on), then you may want to use something like an OCR tool to extract the information and, again, convert it to Markdown.
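To make the "can it read the text?" check a bit more concrete, here is a minimal Python sketch of the decision described above. It assumes you already have one text string per page (for example from a PDF library's per-page text extraction). The function names and the 20-word threshold are illustrative choices of mine, not anything Code Interpreter defines.

```python
import re
from typing import Sequence


def looks_like_real_text(page_text: str, min_words: int = 20) -> bool:
    """Rough heuristic: a page with a usable text layer yields plenty of words.

    Scanned (image-only) pages usually extract as empty or near-empty strings.
    The threshold is an arbitrary assumption; tune it for your document.
    """
    # Count runs of two or more letters (any script), ignoring digits and symbols.
    words = re.findall(r"[^\W\d_]{2,}", page_text or "")
    return len(words) >= min_words


def pages_to_markdown(page_texts: Sequence[str]) -> str:
    """Build one Markdown document with a heading per page.

    Pages that fail the heuristic are flagged so they can be sent to OCR.
    """
    sections = []
    for number, text in enumerate(page_texts, start=1):
        if looks_like_real_text(text):
            body = text.strip()
        else:
            body = "_(no extractable text; needs OCR)_"
        sections.append(f"## Page {number}\n\n{body}")
    return "\n\n".join(sections) + "\n"
```

If most pages come back flagged, the PDF is probably scanned images and OCR is the way to go. If only a few are flagged, those are likely figures or blank pages.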

Thanks for the reply, Ronald. Do you mean that I just ask it to use Code Interpreter in the GPT Builder, or is there a separate tool that I should use to do the experiment you are suggesting?

You can ask ChatGPT 4 to parse a certain page using Python. It should know to use Code Interpreter.
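For reference, the Python that gets run for a request like "read pages 80-90" might look roughly like the sketch below. It assumes the third-party `pypdf` package is available (PyPDF2 3.x exposes a similar `PdfReader`). The helper names are my own, and I can't promise this is what Code Interpreter actually generates.

```python
from typing import Dict, List


def page_indices(first_page: int, last_page: int, total_pages: int) -> List[int]:
    """Turn a 1-based inclusive page range into zero-based indices.

    The range is clamped to the document, so asking past the end is safe.
    """
    start = max(first_page, 1)
    end = min(last_page, total_pages)
    return list(range(start - 1, end))


def extract_page_range(pdf_path: str, first_page: int, last_page: int) -> Dict[int, str]:
    """Return {page_number: text} for the requested pages only."""
    # Imported here so the range helper above works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    texts = {}
    for index in page_indices(first_page, last_page, len(reader.pages)):
        # Image-only pages typically yield an empty string.
        texts[index + 1] = reader.pages[index].extract_text() or ""
    return texts
```

Calling `extract_page_range("knowledge.pdf", 80, 90)` (the filename is a placeholder) only pulls text from those eleven pages. If the strings come back empty, the pages are most likely images and need OCR.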