How to Populate a Vector Store with PDFs and Images for Searchability Using OpenAI's GPT-5 Model

Hi everyone,

I am working with OpenAI’s Responses API, and I would like to use GPT-5. My goal is to populate a vector store with both PDF files and image files so that both are searchable.

Could anyone guide me on how to achieve this? Specifically, I’m looking for a way to make both types of files (PDFs and images) searchable within a vector store using the GPT-5 model.

Thanks in advance for your help!

You can absolutely do this! The key is to use multimodal embeddings so that both PDFs (text) and images become searchable in the same vector space. GPT-5 / Responses models don’t generate embeddings directly, so the recommended approach is to use the text-embedding-3-large (or -small) model, which works well for both extracted text and generated image descriptions.
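For the text half of that idea, here is a minimal sketch using the official `openai` Python SDK, assuming you keep your own index rather than OpenAI's hosted vector store. Note that text-embedding-3-large only accepts text, so an image has to be turned into a caption first; the example strings below are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Text extracted from a PDF page and a caption written for an image;
# both are plain text, which is all the embeddings endpoint accepts.
pdf_chunk = "Quarterly revenue grew 12% year over year..."
image_caption = "Bar chart of quarterly revenue, 2022-2024, highlighting Q4 growth."

resp = client.embeddings.create(
    model="text-embedding-3-large",
    input=[pdf_chunk, image_caption],
)

vectors = [item.embedding for item in resp.data]  # store these in your own index
```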

You cannot populate an OpenAI vector store with images, nor with the images inside PDFs; images are not extracted.

You cannot send images to any OpenAI embeddings model to build your own image similarity search or image-based querying.

An OpenAI vector store can only return token chunks of text, and thus cannot “RAG” over an image that the model then sees with vision. It is a completely internal, text-only document retrieval tool; you cannot substitute your own embeddings or metadata for the ones it produces.

A “description of images”, as the above response (likely AI output) suggests, might work if you develop your own semantic search database of images, drawing on experience building ordinary vector database products. However, it cannot reconstruct a coherent, rich PDF document that is returned together with its images.
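If you do go the describe-then-embed route for your own database, a hedged sketch of producing the description with a vision-capable model through the Responses API follows; the model name, prompt, and helper name are placeholders, not a prescribed method.

```python
import base64
from openai import OpenAI

client = OpenAI()

# Hypothetical helper: caption one image file so the caption can be embedded
# and indexed in your own database; "gpt-5" here is a placeholder model name.
def describe_image(path: str, model: str = "gpt-5") -> str:
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    response = client.responses.create(
        model=model,
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text",
                 "text": "Describe this image in detail for semantic search."},
                {"type": "input_image",
                 "image_url": f"data:image/png;base64,{b64}"},
            ],
        }],
    )
    return response.output_text
```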

For PDFs specifically, you will need your own ground-up vector store and code, where each chunk placed in the store contains (see the sketch after this list):

  • Per-page text extraction
  • Per-page full-page render of PDF to image (or multiple slices of a page for better vision)
  • Optional: metadata from (expensive) AI vision on the page, also placed in the embedding input
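A rough sketch of the per-page chunking above, assuming PyMuPDF (the `fitz` module) for both text extraction and page rendering; the library choice and chunk layout are my own assumptions.

```python
import fitz  # PyMuPDF: pip install pymupdf

def pdf_to_chunks(path: str, dpi: int = 150) -> list[dict]:
    """Build one chunk per page: extracted text plus a full-page PNG render."""
    chunks = []
    doc = fitz.open(path)
    for page_number, page in enumerate(doc, start=1):
        text = page.get_text()             # per-page text extraction
        pix = page.get_pixmap(dpi=dpi)     # full-page render of the PDF page
        chunks.append({
            "source": path,
            "page": page_number,
            "text": text,
            "image_png": pix.tobytes("png"),  # bytes, ready for disk or a blob store
        })
    doc.close()
    return chunks
```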

Then you can run semantic search over the text with an embedded text query, and return both page products (text and rendered image) as RAG. This is also limited: only the Responses API allows a developer function to return images, and any context-based automatic RAG would have to place the images in a user message due to OpenAI’s gatekeeping.
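One way to wire the retrieval step, sketched as a plain in-memory cosine search over embeddings of the chunks built above (numpy only); the top-k hits carry both the page text and the page image bytes, which you could then hand back from, say, a Responses function tool. Function names and the example query are illustrative.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
    vecs = np.array([d.embedding for d in resp.data], dtype=np.float32)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # normalize for cosine

def search(query: str, chunks: list[dict], chunk_vecs: np.ndarray, k: int = 3) -> list[dict]:
    """Return the top-k page chunks (text + rendered image) for a text query."""
    q = embed([query])[0]
    scores = chunk_vecs @ q                 # cosine similarity on normalized vectors
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]

# Usage sketch:
# chunks = pdf_to_chunks("report.pdf")
# chunk_vecs = embed([c["text"] for c in chunks])
# hits = search("What was Q4 revenue growth?", chunks, chunk_vecs)
```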

You will have to build a product the AI can make sense of, explaining why it is being fed seemingly random pages, or use document reconstruction to supply a run of overlapping pages (and even more) as context.
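A small sketch of that document-reconstruction idea: expand each hit into a short run of neighboring pages so the model sees continuous context rather than isolated pages. The window size and single-document assumption are arbitrary choices for illustration.

```python
def expand_to_page_runs(hits: list[dict], chunks: list[dict], radius: int = 1) -> list[dict]:
    """For each hit, also include its neighboring pages so the runs overlap."""
    by_page = {c["page"]: c for c in chunks}
    wanted_pages: set[int] = set()
    for hit in hits:
        for p in range(hit["page"] - radius, hit["page"] + radius + 1):
            if p in by_page:
                wanted_pages.add(p)
    return [by_page[p] for p in sorted(wanted_pages)]
```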

So it can be done. And it is a million-dollar vertical.