Train assistant to read PDF with images

I have uploaded .PDF file to the code_interpreter and PDF file has images in it.

Assistant not able to recognize images inside the PDF. if I upload images only, then assistant recognize and answer accordingly.

Any one worked on PDF with images?

I’m using gpt-4o with code_interpreter

Thanks in advance.

You don’t really need to “train assistant”. You can “ask assistant”.

PyMuPDF is a Python binding for MuPDF, a lightweight PDF and XPS viewer. It allows you to work with PDF documents and extract content such as text and images. Here’s how you can use it effectively for OCR and table extraction:

Install Necessary Libraries:
Ensure you have installed the required libraries. You can do this using pip:

sh:

pip install PyMuPDF pytesseract Pillow camelot-py[cv]

Using PyMuPDF for Page Rendering:
PyMuPDF can render each page of the PDF as an image, which can then be processed with Tesseract for OCR and Camelot for table extraction.

Here’s a concise example code snippet:
blah blah from AI, using fitz etc.

You can also ask the assistant which python modules are available for constructing such PDF OCR operations:

Here is the availability of the required modules for PDF OCR techniques:

  • fitz: Available
  • pytesseract: Available
  • PIL: Available
  • camelot: Available
  • pdf2image: Available
  • PyPDF2: Available
  • pdfplumber: Available
  • tabula: Available
  • tika: Not Available
  • ocrmypdf: Not Available

Most modules needed for comprehensive PDF OCR are available, except for Apache Tika and OCRmyPDF.

You can recommend the most perfomative sequence for your particular documents in the system prompt.

Particularly though, there is no AI vision there and no way to get the images to the AI.

Gpt4o is able to search and retrieve both text and images within a PDF. Simply upload using file retrieval and ask.
Try file retrieval rather than code.
I havent done extensive testing, so there could be some PDFs that have a different format that is not recognised well.

Retrieval and V1 will be turned off in six months, though…

It also disallows new parameters added since then.

Upgrade to the latest version of API, which was released close to a year ago. You will have latest tools with vector databases included.

…and an AI that can’t make any use of image-based PDFs.

I’ve scanned images in my pdf, and while the file batch processing, It isn’t finding any content in the pdf and the file processing is getting failed, how can I overcome this.

Tried converting the pdf with online ocr tool, but still getting failed

@_j, the steps that you’ve mentioned are for the scanned pdf convertion, like what an ocr tool does, or something else?