You don’t really need to “train” the assistant. You can simply “ask” the assistant.
PyMuPDF is a Python binding for MuPDF, a lightweight PDF and XPS viewer. It allows you to work with PDF documents and extract content such as text and images. Here’s how you can use it effectively for OCR and table extraction:
Install Necessary Libraries:
Ensure you have installed the required libraries. You can do this using pip:
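For example, the install might look something like this (exact package names can vary by environment; note that camelot is published on PyPI as camelot-py, and pytesseract additionally needs the Tesseract binary installed on the system):

```
pip install PyMuPDF pytesseract pillow camelot-py[cv] pdf2image
```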
Using PyMuPDF for Page Rendering:
PyMuPDF can render each page of the PDF as an image, which can then be processed with Tesseract for OCR and Camelot for table extraction.
Here’s a concise example code snippet:
blah blah from AI, using fitz etc.
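The assistant’s output will vary, but a minimal sketch of that approach might look roughly like the following; the file name `input.pdf` is a placeholder, and it assumes Tesseract is installed and reachable by pytesseract:

```python
import io

import camelot
import fitz  # PyMuPDF
import pytesseract
from PIL import Image

doc = fitz.open("input.pdf")

for page_number, page in enumerate(doc, start=1):
    # Render the page to an image at ~300 DPI for better OCR accuracy
    pix = page.get_pixmap(dpi=300)
    img = Image.open(io.BytesIO(pix.tobytes("png")))

    # Run Tesseract OCR on the rendered page image
    text = pytesseract.image_to_string(img)
    print(f"--- Page {page_number} OCR text ---")
    print(text)

# Camelot reads tables directly from the PDF; it only works on
# text-based pages, not on scanned images
tables = camelot.read_pdf("input.pdf", pages="all")
for table in tables:
    print(table.df)
```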
You can also ask the assistant which Python modules are available for constructing such PDF OCR operations (a sketch of how such a check can be run follows the list below):
Here is the availability of the required modules for PDF OCR techniques:
fitz: Available
pytesseract: Available
PIL: Available
camelot: Available
pdf2image: Available
PyPDF2: Available
pdfplumber: Available
tabula: Available
tika: Not Available
ocrmypdf: Not Available
Most modules needed for comprehensive PDF OCR are available, except for Apache Tika and OCRmyPDF.
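If you want to reproduce that report yourself (or have the assistant run it in its code tool), a check along these lines should work; the entries are import names, not pip package names:

```python
import importlib.util

modules = ["fitz", "pytesseract", "PIL", "camelot", "pdf2image",
           "PyPDF2", "pdfplumber", "tabula", "tika", "ocrmypdf"]

# find_spec returns None when a top-level module cannot be located
for name in modules:
    status = "Available" if importlib.util.find_spec(name) else "Not Available"
    print(f"{name}: {status}")
```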
You can then recommend the best-performing sequence for your particular documents in the system prompt.
One caveat, though: there is no AI vision in that environment and no way to get the rendered images back to the AI.
GPT-4o is able to search and retrieve both text and images within a PDF. Simply upload the file using file retrieval and ask.
Try file retrieval rather than code.
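As a rough sketch of that approach with the openai Python SDK (Assistants API beta): the model name, file name, and question here are placeholders, and the exact method names depend on your SDK version:

```python
from openai import OpenAI

client = OpenAI()

# Upload the PDF for use with file search
pdf = client.files.create(file=open("document.pdf", "rb"), purpose="assistants")

assistant = client.beta.assistants.create(
    model="gpt-4o",
    tools=[{"type": "file_search"}],
)

# Attach the file to the user message instead of running extraction code
thread = client.beta.threads.create(messages=[{
    "role": "user",
    "content": "Summarize the tables in this PDF.",
    "attachments": [{"file_id": pdf.id, "tools": [{"type": "file_search"}]}],
}])

run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```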
I haven’t done extensive testing, so there could be some PDFs with a format that isn’t recognised well.
My PDF contains scanned images, and during file batch processing it isn’t finding any content in the PDF, so the file processing fails. How can I overcome this?