Issue with Comment Extraction, Page Number and Article References from Document in Custom GPT

I created a custom GPT (“My Chat GPT”) to assist the contracts team by automatically compiling comments into a table during the contract review process. However, some comments are missing, and the page numbers and article references taken from the document are wrong. We also need to be able to download the compiled table in Excel or Word format.

Steps Taken:

  • Used the GPT to extract and compile comments along with their associated article references.
  • Attempted to troubleshoot by adjusting the prompt and settings, but the issues with missing comments and incorrect article references persist.

Expected Outcome: The GPT should correctly extract all comments, associate them with the correct page number and article references in the table, and allow the table to be downloaded in Excel or Word format.

I’m seeking advice on how to resolve the issues with missing comments, incorrect article references, and downloading the table in Excel or Word format, or how to better configure the GPT to handle these tasks accurately.

Welcome to the community!

A Custom GPT is easier to set up, but you do lose control over the process. You might search around for other solutions, but I’m not sure a Custom GPT can grab the page number and document name with 100% accuracy.

Hopefully others chime in.

Hi @helpdeskBMTC and welcome to the forums!

Regarding extracting/referencing page numbers: I actually created this thread a while ago. After numerous experiments, the closest I’ve gotten is by instructing the GPT to treat each page as an image and use OCR. If the page number is visible, it will extract it, but it may not align with the page’s actual position in your document - for example, your document may have a table of contents, a cover page, or other pages that are not numbered, so there may be an offset that is difficult to control.
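To make the offset problem concrete, here is a minimal Python sketch of how it could be handled outside the GPT. It assumes you already have, for each physical page, the printed number that OCR found (or None when no number was visible). The function names and the input format are hypothetical, not part of any ChatGPT feature.

```python
from collections import Counter
from typing import Optional, Sequence


def infer_page_offset(printed_labels: Sequence[Optional[int]]) -> Optional[int]:
    """Guess how many unnumbered pages precede printed page 1.

    printed_labels[i] is the number OCR read on physical page i + 1,
    or None if no number was visible (hypothetical input format).
    Returns the most common (physical - printed) difference, so a few
    OCR misreads do not change the result. Returns None if no page
    had a readable number.
    """
    diffs = Counter(
        physical - printed
        for physical, printed in enumerate(printed_labels, start=1)
        if printed is not None
    )
    if not diffs:
        return None
    return diffs.most_common(1)[0][0]


def physical_to_printed(physical_page: int, offset: int) -> Optional[int]:
    """Map a physical page index to its printed number, or None for front matter."""
    printed = physical_page - offset
    return printed if printed >= 1 else None


# Cover page and table of contents are unnumbered; one label was misread as 8.
labels = [None, None, 1, 2, 8, 4]
offset = infer_page_offset(labels)
```

With the offset known, a page reference the model reports can be checked against both numbering schemes instead of being trusted as-is.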

Regarding extracting article references, footnotes, etc. - this should in principle be OK if you provide it with some examples of what they may look like.

But as @PaulBellow stated, it is difficult to guarantee high accuracy due to how text is parsed and chunked “under the hood”.

Hi,

Welcome.

Using a Custom GPT via the ChatGPT UI for this task is going to be fraught with the difficulties you’ve mentioned…always. @PaulBellow is absolutely correct. The problem is not only the multi-step nature of the process you described, but also the opportunity for hallucinations at each step.

The model they use as the base for the cGPTs is wayyy too creative for any project that has the word “exactly” in its list of needs. It’s not appropriate for this type of task.

Given the exacting, though basic, nature of this process (i.e. “extracting comments exactly,” “annotated/referenced properly,” “don’t make stuff up,” “export to a table,” “convert table to spreadsheet or rich text based on user preference”), you have a flow that could use multiple Assistants powered by GPT-4o mini, together with Structured Outputs, to reduce hallucinations throughout the process and accurately summarize the comments with footnotes.
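As a rough illustration of the “don’t make stuff up” and “export to a table” steps, here is a minimal Python sketch using only the standard library. It assumes the assistant has been asked to return a JSON array of objects with page, article and comment fields, and that you have the extracted text of each page. The field names, function names and sample data are all hypothetical. The output is CSV, which Excel can open. Producing a native .xlsx or .docx file would need an additional library.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields
from typing import Dict, List


@dataclass
class CommentRow:
    # Hypothetical schema: one row of the compiled comments table.
    page: int
    article: str
    comment: str


def parse_rows(model_json: str) -> List[CommentRow]:
    """Parse the model's JSON output; raises KeyError if a field is missing."""
    return [
        CommentRow(int(item["page"]), str(item["article"]), str(item["comment"]))
        for item in json.loads(model_json)
    ]


def unverified(rows: List[CommentRow], page_texts: Dict[int, str]) -> List[CommentRow]:
    """Return rows whose comment text is not found verbatim on the cited page."""
    return [r for r in rows if r.comment not in page_texts.get(r.page, "")]


def to_csv(rows: List[CommentRow]) -> str:
    """Serialize rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(CommentRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


# Made-up sample data for illustration only.
page_texts = {3: "Art. 4.2 Payment terms. Please confirm the 30-day period."}
model_json = json.dumps([
    {"page": 3, "article": "4.2", "comment": "Please confirm the 30-day period."},
    {"page": 3, "article": "9.1", "comment": "Liability cap should be doubled."},
])
rows = parse_rows(model_json)
suspect = unverified(rows, page_texts)  # rows to review by hand
table = to_csv([r for r in rows if r not in suspect])
```

The verbatim check is deliberately strict: anything the model paraphrased or invented gets flagged for a human to look at rather than silently landing in the table.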

Along the way, @platypus has the right idea with identifying objects on the page by their visual characteristics, and “where you can usually find them.”
