Issue with Comment Extraction, Page Number and Article References from Document in Custom GPT

Hi @helpdeskBMTC and welcome to the forums!

Regarding extracting/referencing page numbers: I actually created this thread a while ago. After numerous experiments, the closest I’ve gotten is instructing the GPT to treat each page as an image and use OCR. If the page number is visible, it will extract it, but it may not match the page's actual position in your document. For example, your document may have a table of contents, a cover page, or other unnumbered pages, so there may be an offset that is difficult to control.
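To make the offset issue concrete, here is a minimal sketch in plain Python. It is not anything the GPT does internally. It assumes you already have one printed label per page (for example from OCR), with `None` for pages that show no number. The `build_page_map` function and the sample `labels` list are hypothetical names and data for illustration only.

```python
from typing import Optional


def build_page_map(printed_labels: list[Optional[str]]) -> dict[str, int]:
    """Map each printed page label to its 1-based physical position in the file."""
    page_map: dict[str, int] = {}
    for physical_index, label in enumerate(printed_labels, start=1):
        # Skip unnumbered pages; keep the first occurrence of a repeated label.
        if label is not None and label not in page_map:
            page_map[label] = physical_index
    return page_map


# Hypothetical document: the cover page and table of contents carry no number.
labels = [None, None, "1", "2", "3"]
page_map = build_page_map(labels)

# Number of unnumbered pages that come before printed page "1".
offset = page_map["1"] - 1
```

With a lookup like this you can translate between "page 2 as printed" and "the 4th page of the file", which is the mismatch described above. It only helps if the printed numbers were extracted correctly in the first place.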

Regarding extracting article references, footnotes, etc.: this should in principle be OK if you provide it with some examples of what they may look like.

But as @PaulBellow stated, it is difficult to guarantee high accuracy due to how text is parsed and chunked “under the hood”.
