Unknown Error Occurred when uploading PDF

If you ever get stuck on a PDF, there’s always the brute-force OCR route to extract the data from it. The zip file trick and the direct upload to ChatGPT didn’t work for me either.

Here is brute-force OCR code to get the contents:

import os

import pytesseract
from pdf2image import convert_from_path

# System dependencies (on Mac): brew install poppler tesseract
# pdf2image needs poppler; pytesseract needs the tesseract binary.

# Function to extract text from PDFs using OCR
def extract_text_with_ocr(pdf_folder):
    ocr_texts = {}
    for file_name in os.listdir(pdf_folder):
        # Match .pdf and .PDF alike
        if file_name.lower().endswith(".pdf"):
            file_path = os.path.join(pdf_folder, file_name)
            # Convert PDF to images, one per page
            images = convert_from_path(file_path)
            # Perform OCR on each page and join the pages with a blank line
            page_texts = [pytesseract.image_to_string(image) for image in images]
            ocr_texts[file_name] = "\n\n".join(page_texts)
    return ocr_texts

# Extract text from the PDFs using OCR

attachments_path = "/path/to/your/pdfs"

ocr_texts = extract_text_with_ocr(attachments_path)

# Display a summary of the extracted texts for review
ocr_texts_summary = {file_name: text[:500] for file_name, text in ocr_texts.items()}
print(ocr_texts_summary)
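
If the goal is to get the contents into ChatGPT, one option is to write each extracted text to its own .txt file and upload those instead of the PDFs. This is only a minimal sketch: `save_texts` and the output folder name are made up for illustration, and it assumes a dict shaped like the `ocr_texts` returned above (PDF file name mapped to its text).

```python
from pathlib import Path


def save_texts(ocr_texts, out_dir):
    """Write each extracted text to <out_dir>/<pdf name without extension>.txt.

    ocr_texts: dict mapping a PDF file name to its extracted text.
    Returns the list of written paths, in the dict's order.
    """
    out_path = Path(out_dir)
    # Create the output folder if it does not exist yet
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for file_name, text in ocr_texts.items():
        # "report.pdf" -> "report.txt"
        target = out_path / (Path(file_name).stem + ".txt")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Calling `save_texts(ocr_texts, "ocr_output")` after the OCR step would leave one text file per PDF in a folder named `ocr_output` (a placeholder name). Two PDFs whose names differ only in the extension's case would overwrite each other here.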

On a Mac, I managed to solve the issue by opening the PDF in Preview and then printing it to PDF. In my case, this generated a PDF file that ChatGPT accepted without complaint.

I don’t know why this fixes the issue, but it works!
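
One guess, not confirmed anywhere in this thread, is that printing to PDF rewrites the file and so cleans up something in its structure. If you want to compare the original with the re-printed copy, a rough check like the one below may give a hint. `inspect_pdf` is a hypothetical helper; it only looks at a few raw byte markers (the `%PDF-` header, the `%%EOF` trailer marker, and any `/Encrypt` entry) and does not actually parse the PDF.

```python
from pathlib import Path


def inspect_pdf(path):
    """Return a few cheap structural facts about a PDF file (no real parsing)."""
    data = Path(path).read_bytes()
    # A PDF should start with a "%PDF-x.y" header near the top of the file
    header_at = data.find(b"%PDF-", 0, 1024)
    has_header = header_at != -1
    version = None
    if has_header:
        # The three bytes after "%PDF-" hold the version, e.g. "1.7"
        version = data[header_at + 5:header_at + 8].decode("ascii", "replace")
    return {
        "size_bytes": len(data),
        "has_header": has_header,
        "version": version,
        # A complete PDF normally ends with an "%%EOF" marker
        "has_eof_marker": b"%%EOF" in data[-1024:],
        # Encrypted PDFs carry an /Encrypt entry in the trailer
        "mentions_encrypt": b"/Encrypt" in data,
    }
```

Running it on both files and comparing the two dicts shows whether the version, the encryption flag, or a missing end marker differs between them. A difference would only be a clue, not proof of what the uploader rejects.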


Same issue, plus PDFs that appear to upload successfully to a Custom GPT are gone at the next session. All the PDFs had been through OCR with Adobe before I tried to load them into the GPT. I tried uploading them through all 3 windows, with the same effect. I told it to OCR each new PDF on upload. That way I can get the GPT to process all the files correctly within the session, and at the end I can see them all. But once I shut down and come back in, only a random selection of them remains in my knowledge base. All are similar in size and were produced the same way, with no clear pattern as to which ones remain and which disappear.

Thank you, this worked on my Mac M1.

I’ve tried taking the content of the PDF, pasting it into Google Docs, and saving it as a .docx. That also fails.

I’m pretty dumbfounded tbh.
I’ve not been able to do some of the ideas suggested here for different reasons.

In case it helps anybody, I had the same issue on my laptop: I couldn’t upload a 120 KB PDF. About 70% of the file would load, and then it would fail with “unknown error occurred”.

I saved the pdf on my phone, opened the ChatGPT app and uploaded the file from my phone into a chat with my question. Then, I was able to follow up on that conversation from my laptop.

I have had the same issue in the last few days. Worked perfectly with Claude.