Using images as context in a prompt

Hi all,

I’m working on a classification task: I want the model to identify whether the issue mentioned in a table is related to a specific topic. So far I have provided text-only context. I would now like to add images (.jpg files) containing the entire knowledge base on the topic (the images are PDF pages converted to JPG, and they contain figures, tables, and text).
Will the model be able to use this additional context?
Considering it is a pretty large base (~600 pages of PDF, hence ~600 image files), what is the optimal way of working?

  • Merge all the image files into one?
  • Input all the images individually?
  • Vectorize the images and follow a RAG architecture?
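For the RAG option, the retrieval step could look like this minimal sketch. It uses a toy pure-Python bag-of-words similarity over per-page text (assuming each page has already been OCR'd; a real pipeline would use a proper embedding model, and the `pages` data below is made up for illustration):

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding': a term-frequency vector.
    A real pipeline would call an embedding model here instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_pages(query, pages, k=3):
    """Return the k page ids whose text is most similar to the query."""
    q = embed(query)
    scored = sorted(pages.items(),
                    key=lambda kv: cosine(q, embed(kv[1])),
                    reverse=True)
    return [page_id for page_id, _ in scored[:k]]

# Hypothetical per-page OCR output, keyed by page number
pages = {
    1: "valve maintenance schedule and torque table",
    2: "electrical wiring diagram of the control cabinet",
    3: "safety valve replacement procedure and spare part numbers",
}
print(top_pages("safety valve replacement", pages, k=2))
```

Only the top-k pages are then attached to the classification prompt, which keeps the context well under the ~600-page total.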

Thanks for taking the time

Welcome to the community!

You wanna upload images full of text?

Extracting the text first will probably yield better results :thinking:
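If you go the text-extraction route, a common follow-up step is to split each page's text into overlapping chunks so a sentence near a boundary is never cut off from its context. A rough sketch (the chunk sizes here are arbitrary):

```python
def chunk_text(text, size=500, overlap=100):
    """Split text into overlapping character chunks.
    The overlap keeps boundary sentences visible in two chunks."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

sample = "a" * 1200
print(len(chunk_text(sample)))  # 3 chunks of up to 500 chars each
```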


Thanks for the welcome!

The images contain text, tables, and embedded figures, which is why extracting only the text seems limiting: the information from the tables and figures would be lost.

I personally wouldn’t rely on vision to reliably extract table data :confused:

You can try, but I wouldn’t expect amazing results, unfortunately. Maybe start with a page or two and see how ChatGPT responds.
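If you test those pages through the API rather than the ChatGPT UI, images are typically sent as base64 data URLs inside the message content. This sketch only builds the request payload, it doesn't send anything; check the exact content format and model names against OpenAI's current docs:

```python
import base64
import json

def image_message(jpg_bytes, question):
    """Pair a text question with one JPEG page in a single user message,
    using the data-URL image format accepted by vision chat APIs."""
    b64 = base64.b64encode(jpg_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }

# Fake JPEG bytes stand in for a real page scan
msg = image_message(b"\xff\xd8\xff\xe0fake-jpeg-bytes",
                    "Is the issue on this page related to the safety valve?")
print(json.dumps(msg)[:80])
```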

How would you do the table extraction without vision?

Probably ABBYY or some other OCR tool

But even then, I’d probably normalize the table (into rows) before feeding it to the model
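For that normalization step, one approach is to flatten each extracted table row into a self-describing line of text, so the model never has to align cells across columns itself. A minimal sketch, assuming the OCR tool already gives you headers and cells (the example data is invented):

```python
def table_to_rows(headers, rows):
    """Turn a 2-D table into one 'Header: cell' text line per row."""
    lines = []
    for row in rows:
        pairs = [f"{header}: {cell}" for header, cell in zip(headers, row)]
        lines.append("; ".join(pairs))
    return lines

headers = ["Part", "Issue", "Topic"]
rows = [
    ["P-101", "leaking seal", "maintenance"],
    ["P-202", "wiring fault", "electrical"],
]
for line in table_to_rows(headers, rows):
    print(line)
```

Each line then reads like "Part: P-101; Issue: leaking seal; Topic: maintenance", which tends to survive chunking and retrieval better than raw grid layout.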