Custom GPT Struggles to Retrieve Information from Merged PDF Documents

Hello,

I’ve been using OpenAI’s Custom GPT feature to retrieve information from structured regulatory documents. Due to the 20-document slot limit, I previously merged multiple related PDFs into a single document, ensuring that each section remained clearly distinguishable with proper formatting:

  • Table of contents with links
  • Bookmarks for easy navigation
  • Tags, headers, and footers to enhance organization
  • Clear section breaks for each individual document

Previously, the retrieval worked well. For example, asking “What does Document X cover?” would return relevant excerpts. However, after OpenAI’s recent updates, the system now fails to retrieve even simple information from the merged document.

Observations:

  • The same questions work when documents are separate but fail when merged.
  • The issue persists despite improvements like adding clearer section headings and metadata.
  • It is possible that OpenAI’s document processing module now ignores headers or metadata, leading to sections being missed.

Request:

Could someone from OpenAI clarify if there were any changes to how PDFs are processed?

  • Are merged PDFs handled differently than before?
  • Are headers, footers, or metadata interfering with document parsing?
  • Is there a recommended structure to ensure Custom GPT properly retrieves information from a merged document?

This change has significantly impacted our ability to retrieve structured information efficiently. Any insights or workarounds would be appreciated.

Thanks!

I have EXACTLY the same issue here. I tried adding tags, bookmarks, a table of contents, front pages… nothing helps. It used to work fine in the past, so I am afraid the retrieval algorithm was modified.

I have noticed that it retrieves information from the start and the end of the document but struggles with information in the middle if the text is too long. Not sure if you have noticed anything to this effect.

I confirm your observation. Chunks in the middle of the documents are less likely to be retrieved, somehow.
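
One way to test this observation is to upload a synthetic document with known facts planted at known positions, then ask the GPT about each one and note which positions fail. Below is a minimal Python sketch of such a probe document. The function name `build_probe_document`, the "access code" wording, and the paragraph counts are all made up for illustration; nothing here reflects how OpenAI's retrieval actually works.

```python
import random


def build_probe_document(num_paragraphs=200, num_probes=10, seed=0):
    """Build filler text with unique probe codes planted at evenly spaced positions.

    Returns (text, probes), where probes maps a 0-based paragraph index
    to the code word planted in that paragraph.
    """
    if not 2 <= num_probes <= num_paragraphs:
        raise ValueError("need 2 <= num_probes <= num_paragraphs")

    rng = random.Random(seed)
    # Evenly spaced positions, always including the first and last paragraph.
    positions = {
        i * (num_paragraphs - 1) // (num_probes - 1) for i in range(num_probes)
    }

    probes = {}
    paragraphs = []
    for index in range(num_paragraphs):
        paragraph = (
            f"Section {index + 1}. This paragraph is filler text for a retrieval test."
        )
        if index in positions:
            # The section number in the code keeps every probe unique.
            code = f"PROBE-{index + 1:04d}-{rng.randrange(10**4):04d}"
            probes[index] = code
            paragraph += f" The access code for section {index + 1} is {code}."
        paragraphs.append(paragraph)

    return "\n\n".join(paragraphs), probes


if __name__ == "__main__":
    text, probes = build_probe_document()
    for index, code in sorted(probes.items()):
        print(f"section {index + 1}: {code}")
```

Save `text` to a file, add it to the GPT's knowledge, and ask questions such as "What is the access code for section 111?" for each planted section. If only the first and last sections come back correctly, that supports the start-and-end pattern described above.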

I am also having the same issue, which started on 11-Mar-2025. My GPT has a large JSON file (3 MB), and it struggles to retrieve data located after a certain number of lines (e.g., after 10k lines). It was working perfectly well before 11-Mar-2025. I have raised a ticket with the help desk, and they have forwarded my issue to the technical team. I have even tried adding instructions telling it to load the full file into memory and do a full-file search, but it’s all hit and miss at the moment.
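
A possible workaround to try while waiting for a fix is to split the large JSON file into several smaller files, so that no record sits deep inside a single long file. The Python sketch below assumes the top-level JSON value is an array of records, which may not match the actual file. The names `split_records` and `write_parts` and the default of 500 records per part are hypothetical. Whether this actually improves retrieval is untested, and each part uses one of the 20 knowledge file slots.

```python
import json
from pathlib import Path


def split_records(records, max_records_per_part):
    """Split a list of records into consecutive parts of at most the given size."""
    if max_records_per_part < 1:
        raise ValueError("max_records_per_part must be at least 1")
    return [
        records[start:start + max_records_per_part]
        for start in range(0, len(records), max_records_per_part)
    ]


def write_parts(source_path, output_dir, max_records_per_part=500):
    """Read a JSON array from source_path and write each part to its own file."""
    records = json.loads(Path(source_path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        # Assumption of this sketch: the file is one top-level JSON array.
        raise TypeError("expected the top-level JSON value to be an array")

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    parts = split_records(records, max_records_per_part)
    for number, part in enumerate(parts, start=1):
        path = output_dir / f"part_{number:03d}.json"
        path.write_text(json.dumps(part, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```

For example, `write_parts("data.json", "parts", max_records_per_part=500)` would write `parts/part_001.json`, `parts/part_002.json`, and so on, where `data.json` stands in for the real file name.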

What type of account do you have? I’ve heard about this issue from a few Plus users but I’m not experiencing it on Teams currently.

Looks like the same thing has happened before:

t/gpt-suddenly-cant-read-files-in-its-knowledge-base/578576

(The forum won’t allow me to include links, even to other threads on this domain.)
I’m having the same problem with my GPTs.

I see. I have ChatGPT Plus, not Teams.

Yes, I confirm this too.

The upgrades/downgrades at the backend remain a mystery, making custom GPTs potentially inaccurate over time.

It would be useful if OpenAI informed users of backend changes so they could understand the potential impact on their custom GPT and make necessary tweaks.

I have a private account and a team account. There is no difference in how the GPT acts.

The problem seems to be resolved. It is working again!
