Issues with Accessing PDF Documentation

Hi @uset82!

Regarding referencing precise pages: I created a topic about this a while back, which you can check out here.

It’s not an issue with the model; it has to do with how the PDFs are ingested and represented behind the scenes.

By default, the documents are loaded as plain text, split into chunks (normally a few sentences at a time), and indexed using text embeddings (vector representations). Since the chunks carry only text, page and section information is lost.
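
To make that concrete, here is a rough sketch of the difference between chunking a flattened document and chunking page by page. This is only an illustration under my own assumptions, not the actual ingestion pipeline: the function names `chunk_plain` and `chunk_with_pages` are hypothetical, the sentence splitting is a simple regex, and the embedding step is left out entirely.

```python
import re

# Hypothetical helper: naive sentence splitter used by both sketches below.
def split_sentences(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def chunk_plain(pages, sentences_per_chunk=3):
    """Flatten all pages into one string, then chunk. Page numbers are gone."""
    sentences = split_sentences(" ".join(pages))
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]

def chunk_with_pages(pages, sentences_per_chunk=3):
    """Chunk each page separately and keep the page number as metadata."""
    chunks = []
    for page_number, page_text in enumerate(pages, start=1):
        sentences = split_sentences(page_text)
        for i in range(0, len(sentences), sentences_per_chunk):
            chunks.append({
                "page": page_number,
                "text": " ".join(sentences[i:i + sentences_per_chunk]),
            })
    return chunks

# Toy two-page "PDF", already extracted to text (made-up content).
pages = [
    "Install the tool. Run the setup script.",
    "Configure the API key. Restart the service. Check the logs.",
]

plain = chunk_plain(pages, sentences_per_chunk=2)        # list of bare strings
tagged = chunk_with_pages(pages, sentences_per_chunk=2)  # list of dicts with "page"
```

With the first approach, nothing in a retrieved chunk tells you where it came from. With the second, the page number travels with the chunk and could be included in the text handed to the model, although the model may still cite it incorrectly.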

In the topic I linked above, I suggested a few potential ways of improving this, but none of them is guaranteed to work 100% of the time.
