I have uploaded a 10-K filing as a PDF document, but it is unable to extract any text at all… Is this a bug, or am I missing something?
Your PDF is not a valid link.
It appears you might be using ChatGPT’s advanced data analysis rather than a specific PDF-reading plugin by a third-party developer.
Advanced data analysis works by having the AI write Python code that it runs in a sandbox, so it is limited to the libraries provided within that environment.
Many PDFs are locked against text extraction, and others have no text at all because every page is just an image. Code cannot interact with these unless you first upload to a password-breaking site to unlock the PDF, or first run OCR on the document, such as “enhance scans” in full Adobe Acrobat.
Here, for example, there is no text to select or programmatically extract; all that I can copy is part of an image.
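A quick way to tell which case applies is to extract the text page by page and see which pages come back empty. The following is only a rough sketch, assuming the third-party `pypdf` library is installed; the function names `pages_without_text` and `extract_page_texts` and the `min_chars` threshold are made up for illustration and are not what ChatGPT runs internally.

```python
from typing import List


def pages_without_text(page_texts: List[str], min_chars: int = 20) -> List[int]:
    """Return 1-based page numbers whose extracted text is empty or nearly empty.

    min_chars is an arbitrary illustrative threshold, not a standard value.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(path: str) -> List[str]:
    """Extract text from each page with pypdf (assumed to be installed)."""
    from pypdf import PdfReader  # third-party dependency, imported lazily

    reader = PdfReader(path)
    if reader.is_encrypted:
        # Locked PDF: extraction may be blocked until it is decrypted.
        raise ValueError("PDF is encrypted; unlock it before extracting text")
    # extract_text() can return an empty string for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    import sys

    texts = extract_page_texts(sys.argv[1])
    empty = pages_without_text(texts)
    print(f"{len(empty)} of {len(texts)} pages have little or no text: {empty}")
```

If nearly every page is reported as empty, the file is most likely a scan and needs OCR before any code can read it.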
It did work when I asked it to go page by page. It sometimes acts strange, but I understand where you are coming from…