I’m working on a classification task. I want the model to identify if the issue mentioned in a table is related to a specific topic. To do so I initially provided text only for context. I would now like to add images (.jpg files) containing all the knowledge base on the topic (images are pdf transformed into jpg that contain: images, tables and text).
Will the model be able to use this additional context ?
Considering it is a pretty large base ~600 pages of pdf, hence ~600 image files what is the optimal way of working?
Merge all the image files into one?
Input all the image individually?
Vectorize the images and follow a RAG architecture?