Using the ChatGPT Plus plan with the GPT-4o model (32k-token context window), I experimented with a 127-page PDF document to assess the model's ability to extract information from images and tables. Out of 56 questions, 6 responses were inaccurate. However, when I uploaded the same images or tables directly into the chat, the responses were noticeably more accurate (a sketch of how this comparison can be reproduced is at the end of this post). This raises the following questions:
- Why are responses more accurate when information is extracted from directly uploaded images than when the same information is extracted from the PDF document?
- After uploading the PDF document into the conversation, I quickly hit the context window limit and was asked to start a new conversation. Do the PDF document and its contents count against the token limit of the context window? (A rough way to estimate the document's raw token count is sketched after this list.)
- Is the PDF document analyzed directly within the chat, or in an external space? And does the context window limit then depend solely on my questions and GPT-4o's responses?
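To put a number on the second question, here is a minimal sketch for estimating the raw token count of the PDF's extracted text, assuming that extracted text is what ends up in the model's context. The file path is a placeholder, and pypdf/tiktoken are simply the libraries I would reach for locally, not necessarily what ChatGPT uses internally:

```python
# Rough token estimate for a PDF's extracted text, assuming that text is
# what gets injected into the model's context. Path and libraries are
# assumptions, not ChatGPT's actual pipeline.
import tiktoken
from pypdf import PdfReader

reader = PdfReader("document.pdf")  # hypothetical path to the 127-page PDF
text = "\n".join(page.extract_text() or "" for page in reader.pages)

enc = tiktoken.encoding_for_model("gpt-4o")  # needs a recent tiktoken release
token_count = len(enc.encode(text))
print(f"Extracted text: ~{token_count} tokens")
print(f"Fits in a 32k context window: {token_count < 32_000}")
```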
I seek answers to these questions to better understand the limitations and potential of GPT-4o when working with large PDF documents and uploaded images.
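For reference, the image-upload comparison can be reproduced by rendering each PDF page to a PNG and uploading those files instead of the PDF. This is only a sketch; the path and DPI are placeholders, and PyMuPDF is one library choice among several:

```python
# Render each PDF page to a PNG so the pages (and their tables/figures)
# can be uploaded to the chat directly instead of as a single PDF.
import fitz  # PyMuPDF

doc = fitz.open("document.pdf")  # hypothetical path
for i, page in enumerate(doc):
    pix = page.get_pixmap(dpi=150)  # rasterize the page; DPI is a placeholder
    pix.save(f"page_{i + 1:03d}.png")
doc.close()
```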