RAG for visual content with GPT-4 Vision

Hi, my first post here. I wanted to ask whether it is possible to give GPT-4 Vision additional visual data to draw on, kind of like RAG but for visual content. For example, I have a machine and I want to supply GPT-4 Vision with data on how it can be assembled or repaired.
Thanks in advance.

You could build a vector DB in one of two ways: generate image embeddings directly with an open-source model (e.g. CLIP), or generate text summaries of the images with GPT-4V and embed those using OpenAI's embeddings endpoint. At query time you retrieve the most relevant entries and pass them into the prompt. Speaking as a layman here, though, happy to learn about other strategies!
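To make the second approach concrete, here is a minimal sketch of the retrieval half: an in-memory store that holds one embedding per image summary and returns the closest summaries to a question by cosine similarity. The `toy_embed` function is a stand-in I made up for illustration; in a real setup you would replace it with calls to OpenAI's embeddings endpoint (or an open-source embedding model) and swap the list for a proper vector DB.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class ImageSummaryStore:
    """Tiny in-memory vector store: one entry per image summary."""
    def __init__(self, embed):
        self.embed = embed     # callable: text -> list[float]
        self.entries = []      # (summary_text, vector) pairs

    def add(self, summary):
        self.entries.append((summary, self.embed(summary)))

    def query(self, question, k=2):
        # Rank stored summaries by similarity to the question.
        qv = self.embed(question)
        ranked = sorted(self.entries,
                        key=lambda e: cosine(qv, e[1]),
                        reverse=True)
        return [s for s, _ in ranked[:k]]

# Placeholder embedding: bag-of-words over a tiny vocabulary.
# Replace with a real embedding model in practice.
VOCAB = ["bolt", "panel", "motor", "belt", "cover", "loosen", "remove"]
def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

store = ImageSummaryStore(toy_embed)
# Summaries you would get from GPT-4V describing each photo of the machine.
store.add("Photo 1: remove the side panel by loosening four bolts")
store.add("Photo 2: the motor sits behind the belt cover")
print(store.query("how do I remove the panel", k=1)[0])
```

The retrieved summaries (and, if you kept them, the original images) would then be included in the GPT-4 Vision prompt alongside the user's question.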