Is it possible to analyze images contained in pdf files?

Hello,

I have pdf files that contain images as well as text. I would like to ask ChatGPT / my custom GPT / gpt4 via the assistants API questions about these documents, not only about the text but also about the images.

I suspect this is currently not possible, as GPT is saying it can analyze the image content in the uploaded pdf, but the answers (e.g. when asking what is shown on a particular image ) seem like it guessed what is in it from the surrounding text.

So I would like to confirm, can GPT “see” / have access to images in pdf files or is only an OCR performed on the files?

Thanks!

4 Likes

Would like to know the same.
Any update on this from the team?

It’s not currently possible, I did this, and it said it’s unable to see any images in a PDF file.

1 Like

Hi All. New user here and still learning the basics. I built a custom GPT and uploaded various PDFs with text, images, drawings, etc. But seems like the custom GPT 4o still cannot “see” the images in a PDF. Is there a simple way for it to ‘see’ images in pdf’s without having to extract the images and upload separately - kinda defeats the purpose. I would have thought this was possible with the ‘Vision’ capability showcased ? Thanks

I just convert pdfs into images using pdf2image, and use that with vision. It works fine for my use case. maybe not for everyone.