Embedding Memories and Context Sizes
I currently have a web-based application, essentially a chat, that uses embeddings to record memories of the user's inquiries.
Each time the user makes an inquiry or states a fact or preference, I send that text to the Embeddings API to encode the interaction.
My question is this:
How much context should be included when encoding an embedding, and how much context should be included in the embedding query?
Should I create more embeddings (and correspondingly more queries) with smaller context chunks, or should I encode larger context chunks?
My current flow is this:
- User states a preference or inquiry
- I encode that statement into an embedding via the Embeddings API
- I store the resulting vector, with the statement as metadata, in a Pinecone database
- I use that vector to query previously stored preferences or inquiries from Pinecone
- I then use the metadata from that query, prior dialogue for context, and the current inquiry to request a GPT Chat Completion
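For concreteness, here is a rough sketch of that flow, assuming the OpenAI Python SDK and the Pinecone client (the index name, model names, and metadata layout are just placeholders):

```python
import uuid

from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
index = Pinecone(api_key="YOUR_PINECONE_API_KEY").Index("memories")  # placeholder index name


def embed(text: str) -> list[float]:
    """Encode a single piece of text with the Embeddings API."""
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding


def remember_and_recall(user_id: str, statement: str, top_k: int = 5) -> list[str]:
    """Store the new statement, then pull back the most similar prior memories."""
    vector = embed(statement)

    # Store the statement as metadata alongside its vector
    index.upsert(vectors=[{
        "id": str(uuid.uuid4()),
        "values": vector,
        "metadata": {"user_id": user_id, "text": statement},
    }])

    # Query with the same vector to find previously stored preferences/inquiries
    results = index.query(vector=vector, top_k=top_k, include_metadata=True,
                          filter={"user_id": user_id})
    return [m.metadata["text"] for m in results.matches]


def answer(statement: str, memories: list[str], dialogue: list[dict]) -> str:
    """Combine recalled memories, recent dialogue, and the current inquiry."""
    system = ("You are a helpful assistant. Known user facts and preferences:\n"
              + "\n".join(memories))
    messages = [{"role": "system", "content": system}, *dialogue,
                {"role": "user", "content": statement}]
    chat = openai_client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return chat.choices[0].message.content
```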
This works pretty well. However, it still raises some questions.
Should I include dialogue context when I encode the user’s initial inquiry or preference? If so, how much?
Should I maybe encode multiple vectors for each inquiry or preference, each representing a different amount of context?
Should I run multiple queries with varying amounts of context to retrieve previously stored preferences?
Are there other "things" I should be passing to GPT to get a well-rounded chat completion?
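To make the two questions about varying context more concrete, this is the kind of thing I have in mind (a sketch building on the one above; the window sizes and helper names are invented purely for illustration):

```python
def embed_with_context(statement: str, dialogue: list[str],
                       windows: tuple[int, ...] = (0, 2, 6)) -> dict[int, list[float]]:
    """Embed the same statement with 0, 2, and 6 prior turns of context.

    The window sizes are arbitrary placeholders; the point is to produce one
    vector per granularity so each can be stored and queried separately.
    """
    vectors = {}
    for n in windows:
        context = dialogue[-n:] if n else []
        text = "\n".join([*context, statement])
        vectors[n] = embed(text)  # embed() as defined in the earlier sketch
    return vectors


def recall_multi(vectors: dict[int, list[float]], top_k: int = 3) -> list[str]:
    """Query once per context granularity and merge the results, de-duplicating by text."""
    seen: set[str] = set()
    merged: list[str] = []
    for vec in vectors.values():
        results = index.query(vector=vec, top_k=top_k, include_metadata=True)
        for m in results.matches:
            text = m.metadata["text"]
            if text not in seen:
                seen.add(text)
                merged.append(text)
    return merged
```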
Any perspective will be greatly appreciated.