How to implement the similar feature like chat with uploaded file feature in ChatGPT

In ChatGPT, we can uploade pdf/images to the chat window, and then ask following questions. I’m just wondering how to implemnet a similar feature in my own chat bot.
I know with the assistant + thread api, we could achieve the feature, but the cost might be high.
So not sure how ChatGPT implement the feautre? It’s also using assistant api?

1 Like

Does ChatGPT also convert the pdf to text in the background? And when sends the message history to api, it’s actually converted into text already.

You’re on the right direction, but it works a bit differently, let me explain:

  1. The pdf gets split into small text chunks (like a line from a paragraph)
  2. Each chunk is converted to an embedding and saved in a vector database along with the text chunk

To better understand the purpose of embeddings and how to use them, the documentation explains it very well:
https://platform.openai.com/docs/guides/embeddings

  1. When a user sends a message, this message is also converted to an embedding
  2. The vector database is queried to find embeddings which are the closest in distance to the user message embedding (meaning the vector is semantically similar to the user message)
  3. The text chunks from the most similar embeddings are passed as context to the GPT as a system prompt

TLDR: No, the whole pdf is not being sent as context on every message. Instead, by making use of embeddings, we identify the parts of the pdf that are related to the message and pass only those parts as context.

This technique is also known as RAG (Retrieval Augmented Generation)