You’re on the right track, but it works a bit differently. Let me explain:
- The PDF gets split into small text chunks (like a line from a paragraph)
- Each chunk is converted to an embedding and saved in a vector database alongside the original text
To better understand what embeddings are and how to use them, the documentation explains it well:
https://platform.openai.com/docs/guides/embeddings
- When a user sends a message, this message is also converted to an embedding
- The vector database is queried for the embeddings closest in distance to the user-message embedding (i.e. the chunks that are semantically most similar to the message)
- The text chunks belonging to the most similar embeddings are passed to the model as context, typically in the system prompt (see the sketch below)
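
Here's a minimal sketch of the indexing and retrieval steps in Python. It's not a production implementation: it assumes the official `openai` client with the `text-embedding-3-small` model, uses a plain in-memory NumPy array instead of a real vector database (in practice you'd use something like Pinecone, Chroma, or pgvector), and assumes the PDF text has already been extracted to a string (e.g. with pypdf):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
EMBED_MODEL = "text-embedding-3-small"

def split_into_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Very naive chunker: split on blank lines, then cap the chunk size."""
    chunks = []
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        while paragraph:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
    return chunks

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts; returns an (n, d) array of vectors."""
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return np.array([item.embedding for item in resp.data])

# --- Indexing: done once per PDF ---
pdf_text = open("document.txt").read()   # placeholder: text already extracted from the PDF
chunks = split_into_chunks(pdf_text)
chunk_vectors = embed(chunks)            # stored alongside the chunk text

# --- Retrieval: done on every user message ---
def top_k_chunks(question: str, k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the question."""
    q = embed([question])[0]
    # cosine similarity between the question vector and every stored chunk vector
    sims = chunk_vectors @ q / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q)
    )
    best = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in best]
```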
TL;DR: No, the whole PDF is not sent as context on every message. Instead, embeddings are used to identify the parts of the PDF related to the message, and only those parts are passed as context.
This technique is also known as RAG (Retrieval-Augmented Generation).
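
Continuing the sketch above (it reuses `client` and `top_k_chunks`), the final step of passing the retrieved chunks to the model as a system prompt could look roughly like this; the model name and prompt wording are just placeholders:

```python
def answer(question: str) -> str:
    """RAG: retrieve the relevant chunks, then ask the model with them as context."""
    context = "\n\n".join(top_k_chunks(question))
    system_prompt = (
        "Answer using only the following excerpts from the document:\n\n" + context
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("What does the document say about refunds?"))
```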