I’ve been using OpenAI’s Custom GPT feature to retrieve information from structured regulatory documents. Due to the 20-document slot limit, I previously merged multiple related PDFs into a single document, ensuring that each section remained clearly distinguishable with proper formatting:
Table of contents with links
Bookmarks for easy navigation
Tags, headers, and footers to enhance organization
Clear section breaks for each individual document
Previously, the retrieval worked well. For example, asking “What does Document X cover?” would return relevant excerpts. However, after OpenAI’s recent updates, the system now fails to retrieve even simple information from the merged document.
Observations:
The same questions work when documents are separate but fail when merged.
The issue persists despite improvements like adding clearer section headings and metadata.
It is possible that OpenAI’s document processing module now ignores headers or metadata, leading to sections being missed.
Request:
Could someone from OpenAI clarify if there were any changes to how PDFs are processed?
Are merged PDFs handled differently than before?
Are headers, footers, or metadata interfering with document parsing?
Is there a recommended structure to ensure Custom GPT properly retrieves information from a merged document?
This change has significantly impacted our ability to retrieve structured information efficiently. Any insights or workarounds would be appreciated.
I have EXACTLY the same issue here. I tried adding tags, bookmarks, a table of contents, front pages… nothing helps. It used to work fine in the past, so I am afraid the retrieval algorithm was modified.
I have noticed that it retrieves information from the start and the end of the document but struggles with information in the middle if the text is too long. Not sure if you have noticed anything to this effect.
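One way to check whether position in the file really matters is to upload a test document with unique markers at known line numbers and then ask the GPT for each marker in turn. Below is a minimal sketch of a generator for such a file. The function name, the PROBE-… marker format, and the filler wording are all made up for illustration; nothing here reflects how OpenAI's retrieval actually works.

```python
def build_probe_lines(total_lines, marker_every):
    """Build filler lines with a unique marker on every `marker_every`-th line.

    Returns (lines, markers), where markers maps each marker code to the
    1-based line number it was placed on. The marker format is hypothetical.
    """
    lines = []
    markers = {}
    for n in range(1, total_lines + 1):
        if n % marker_every == 0:
            code = f"PROBE-{n:06d}"
            markers[code] = n
            lines.append(f"Line {n}: the reference code for this section is {code}.")
        else:
            lines.append(f"Line {n}: filler text with no reference code.")
    return lines, markers


if __name__ == "__main__":
    # Write a 20,000-line probe file with a marker every 1,000 lines.
    lines, markers = build_probe_lines(20000, 1000)
    with open("probe.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
```

Asking "On which line does PROBE-010000 appear?" for each marker and noting which ones the GPT finds should show whether the failures cluster in the middle of the file or after a certain line count.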
I am also having the same issue; it started on 11-Mar-2025. In my GPT I have a large JSON file (3MB), and it struggles to get the data if the data comes after a certain number of lines (e.g. after 10k lines). It was working perfectly well prior to 11-Mar-2025. I have raised a ticket with the help desk, and they have forwarded my issue to the technical team. I have even tried adding instructions telling it to load the full file into memory and do a full file search, but it’s all hit and miss at the moment.
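A possible workaround, given that the same questions reportedly work when documents are kept separate, is to split the large JSON file into several smaller ones. The sketch below assumes the file is a top-level JSON array of records; the function name and the byte budget are hypothetical, and whether smaller files actually avoid the problem is untested.

```python
import json


def split_records(records, max_bytes):
    """Group records into chunks whose compact JSON size stays within max_bytes.

    A single record larger than max_bytes still gets a chunk of its own.
    """
    chunks, current = [], []
    size = 2  # the enclosing "[" and "]"
    for record in records:
        # Compact encoding of one record, plus 1 byte for a separating comma.
        encoded = len(json.dumps(record, separators=(",", ":")).encode("utf-8")) + 1
        if current and size + encoded > max_bytes:
            chunks.append(current)
            current, size = [], 2
        current.append(record)
        size += encoded
    if current:
        chunks.append(current)
    return chunks


def write_chunks(chunks, prefix):
    """Write each chunk to prefix_001.json, prefix_002.json, and so on."""
    for i, chunk in enumerate(chunks, start=1):
        with open(f"{prefix}_{i:03d}.json", "w", encoding="utf-8") as f:
            json.dump(chunk, f, separators=(",", ":"))
```

For example, loading the original file with json.load, calling split_records(data, 500_000), and passing the result to write_chunks would produce files of roughly 0.5MB each, which would need to fit within the 20-document slot limit.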
The upgrades/downgrades at the backend remain a mystery, making custom GPTs potentially inaccurate over time.
It would be useful if OpenAI informed users of backend changes so they could understand the potential impact on their custom GPT and make necessary tweaks.