Vision within file_search: possible? good?

Hi. I’m trying to build an agent/assistant that recognizes paintings (and images in general) when given access to them via RAG/file_search. Using the Assistants API, I created a vector store with 15 files; each file contains an image of a painting plus some text about it.

When I upload the same image to the assistant and ask about it, the results aren’t great, even though I’ve instructed the assistant that it has access to a specific collection of paintings the user is going to ask about. Sometimes it identifies the painting correctly; sometimes it confuses it with a different one. These are paintings the model never gets right without the uploaded files, so something is partially working. What I can’t tell is whether the file_search tool is actually using vision to compare the images, or whether it’s just reading the titles and artists in each file’s text and guessing well because the pool of possible answers is small. The behavior is consistent across different paintings and LLM models.

Before I experiment further with the Responses API, different file formats, or a different vector-store organization, it would help to know: does file_search/RAG work for images as well? Does it perform vision? And do you know of any studies on its efficacy for this kind of task?
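For reference, this is roughly how I set things up (a minimal sketch: the store name, file path, model choice, and painting text are placeholders, and the upload part only runs when an API key is configured):

```python
import os

def painting_doc(title: str, artist: str, notes: str) -> str:
    # file_search chunks and embeds this text, so the title and artist
    # must appear verbatim for retrieval to match on them.
    return f"Title: {title}\nArtist: {artist}\n\n{notes}\n"

doc = painting_doc(
    "The Starry Night",
    "Vincent van Gogh",
    "Swirling night sky over Saint-Remy, painted in 1889.",
)

# Only talk to the API when credentials are available.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    store = client.vector_stores.create(name="paintings")  # hypothetical name
    with open("starry_night.pdf", "rb") as f:  # placeholder file (image + text)
        client.vector_stores.files.upload_and_poll(
            vector_store_id=store.id, file=f
        )
    assistant = client.beta.assistants.create(
        model="gpt-4o",
        instructions="You identify paintings from a fixed collection.",
        tools=[{"type": "file_search"}],
        tool_resources={"file_search": {"vector_store_ids": [store.id]}},
    )
```

The open question is whether, at query time, file_search matches anything other than this text.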

Google Lens seems to work better at identifying full paintings, but it fails at identifying crops of them. Thanks!