Using vision in Assistants and vector databases

Hello, I am working with OpenAI Assistants, and they don't seem to disclose how their RAG works. I was wondering whether images are passed as context to the LLM, or only text. I am asking because I am doing retrieval from a bank of PDFs that contain schemas, and passing these schemas as images to ChatGPT vision helps get a better answer. So I thought to myself: would OpenAI Assistants do that?

Thanks in advance.

You can pass images via the Assistants API. The documentation describes this here: https://platform.openai.com/docs/assistants/deep-dive/creating-image-input-content

Note, though, that you would have to upload the schemas as separate image files with the purpose vision. What is currently not possible is to upload a PDF file and have the Assistant process both the text and the images at the same time.
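To illustrate, here is a minimal sketch using the Python SDK of what that separate upload looks like: the image is uploaded with purpose="vision" and then attached to a thread message next to a text prompt. The file name and prompt are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# Upload the schema image separately, with the "vision" purpose
schema_image = client.files.create(
    file=open("schema_page_3.png", "rb"),  # placeholder file name
    purpose="vision",
)

# Attach the image to a thread message alongside a text question
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=[
        {"type": "text", "text": "Explain the relationships shown in this schema."},
        {"type": "image_file", "image_file": {"file_id": schema_image.id}},
    ],
)
```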


I think (I am not sure) that what you are trying to do is input an image and then get the same image back out of GPT-4o. It will not do that right now, because it renders every image; it never copies an image like a scan. You can scan an image in, but if you ask for that image back with different words or something, it does not use the same image. It has to render a new one, and it can't seem to do the same exact thing twice. If I am way off, then just ignore me lol.

Not exactly what I am trying to do. I have a bank of images that I need to extract information from. When I try with ChatGPT vision it works, but if I pass an image through an OCR and then give the resulting text to ChatGPT, it doesn't work. So I am wondering whether Assistants would retrieve images along with some text.
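For reference, this is roughly what the working vision approach looks like as a Chat Completions call with a base64-encoded image; the file path, model name, and prompt below are placeholders, not anything specific from this thread.

```python
import base64
from openai import OpenAI

client = OpenAI()

# Read and base64-encode one image from the bank (placeholder path)
with open("schema_page_3.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Ask a vision-capable model to extract information directly from the image
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the entities and relationships shown in this schema."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```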