Embedding Memories / Context Sizes

I currently have a web-based application that uses embeddings to record memories of the user’s web inquiries, essentially a chat.

Each time the user makes an inquiry, states a fact, or expresses a preference, I send that statement to the Embeddings API to encode the interaction.

My question is this:

How much context should be included in the Embedding encoding, and how much in the Embedding query?

Should I make more Embeddings with smaller context chunks, and correspondingly more queries with smaller context? Or should I encode larger context chunks?

My current flow is this (a rough code sketch follows the list):

  • User states a preference or makes an inquiry
  • I encode that statement into an Embedding via the Embeddings API
  • I store that statement as metadata, along with its vector, in a Pinecone database
  • I use that vector to query previously stored preferences or inquiries from Pinecone
  • I then send the metadata results of that query, prior dialogue for context, and the current inquiry to GPT for a Chat Completion
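For concreteness, here is roughly what that flow looks like in code. This is a simplified sketch, not my production code: it assumes the OpenAI and Pinecone Python clients, the index name "memories" and the metadata key "text" are placeholders, and the sketch queries before upserting so the current statement does not come back as its own match (my list above stores first).

```python
import uuid
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()             # reads OPENAI_API_KEY from the environment
pc = Pinecone()               # reads PINECONE_API_KEY from the environment
index = pc.Index("memories")  # placeholder index name

def handle_turn(user_statement: str, prior_dialogue: list[dict]) -> str:
    # Encode the statement into an embedding via the Embeddings API.
    vec = client.embeddings.create(
        model="text-embedding-3-small",
        input=user_statement,
    ).data[0].embedding

    # Query previously stored preferences/inquiries. Done before the
    # upsert so the current statement doesn't match itself.
    matches = index.query(vector=vec, top_k=5, include_metadata=True).matches
    memories = [m.metadata["text"] for m in matches]

    # Store the statement: text as metadata, vector as values.
    index.upsert(vectors=[{
        "id": str(uuid.uuid4()),
        "values": vec,
        "metadata": {"text": user_statement},
    }])

    # Combine retrieved memories, prior dialogue, and the current
    # inquiry into a Chat Completion request.
    system = "Known user facts and preferences:\n" + "\n".join(memories)
    messages = [{"role": "system", "content": system},
                *prior_dialogue,
                {"role": "user", "content": user_statement}]
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content
```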

This works pretty well. However, it still raises some questions.

Should I include dialogue context when I encode the user’s initial inquiry or preference? If so, how much?

Should I perhaps encode multiple vectors for the same inquiry or preference, each representing a different amount of context?

Should I run multiple queries with varying amounts of context against the previously stored preferences?
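Concretely, I picture that multi-query idea looking something like this (a sketch building on the snippet above; the merge-by-best-score step is the part I am least sure about):

```python
def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-3-small", input=text,
    ).data[0].embedding

def multi_context_query(statement: str, recent_turns: list[str], top_k: int = 5):
    # Query once with the bare statement, and once with the last few
    # dialogue turns prepended, then keep each memory's best score.
    variants = [
        statement,
        "\n".join(recent_turns[-3:] + [statement]),
    ]
    best_score: dict[str, float] = {}
    text_by_id: dict[str, str] = {}
    for variant in variants:
        result = index.query(vector=embed(variant), top_k=top_k,
                             include_metadata=True)
        for m in result.matches:
            if m.score > best_score.get(m.id, float("-inf")):
                best_score[m.id] = m.score
                text_by_id[m.id] = m.metadata["text"]
    # Rank memories by their best score across the query variants.
    ranked_ids = sorted(best_score, key=best_score.get, reverse=True)
    return [text_by_id[i] for i in ranked_ids]
```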

Are there other “things” I should be passing to GPT to get a well-rounded chat completion?

Any perspective will be greatly appreciated.

It depends on what you’re trying to achieve. Do you just want the AI to remember relevant “facts” about the user? Or do you want it to know everything about all prior conversations (which is much more difficult)?

An idea I’ve had, but not yet tried, is to use LangChain’s “tool use” to set up a function like “Save user Fact”, so that the LLM will automatically call it, perhaps with a key/value pair or a single sentence/statement. Then, when any new conversation is initialized, those “facts” can be embedded into the System Prompt for that conversation.
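Roughly what I have in mind, untested, shown with the OpenAI tools API directly since LangChain’s tool use wraps the same mechanism; the tool name, its schema, and the save_fact/load_facts store are all hypothetical:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical tool the model can call when the user states a fact.
tools = [{
    "type": "function",
    "function": {
        "name": "save_user_fact",
        "description": "Persist a standalone fact or preference about the user.",
        "parameters": {
            "type": "object",
            "properties": {
                "key": {"type": "string",
                        "description": "Short label, e.g. 'laptop_preference'"},
                "value": {"type": "string",
                          "description": "The fact as one self-contained sentence"},
            },
            "required": ["key", "value"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "I prefer a MacBook."}],
    tools=tools,
)
for call in resp.choices[0].message.tool_calls or []:
    if call.function.name == "save_user_fact":
        args = json.loads(call.function.arguments)
        save_fact(args["key"], args["value"])  # hypothetical key/value store

# When a new conversation is initialized, pull the saved facts back
# into the System Prompt:
system_prompt = "Facts about this user:\n" + "\n".join(
    f"- {k}: {v}" for k, v in load_facts().items()  # hypothetical loader
)
```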

Maybe you can do that research for me and report back. 🙂

I will have to look into it. I have not yet used LangChain; while I have looked at it, I get tangled up in how to integrate it with existing platforms. Much of what I do is web-based or web-managed. I integrate the web with SMS and phone to provide AI-based intelligence to these user interfaces. When connecting to ticketing systems such as ServiceNow, I generally create supporting APIs hosted elsewhere, because the internal tool scripting has limitations.

My purpose is primarily to give the automated agent a sense of familiarity, so that if someone calls back in, the agent has context. If the user says “I prefer a MacBook.”, that is great. But if the user says “Four is more than enough.”, there is no context for what they meant when it is retrieved in a future conversation.
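One possible fix for exactly that case, sketched below but untested: have the model rewrite a context-dependent statement into a standalone sentence, using the recent dialogue, before embedding it. The prompt wording here is just illustrative:

```python
def make_self_contained(statement: str, recent_turns: list[str]) -> str:
    """Rewrite a context-dependent statement into a standalone fact
    before embedding it, e.g. 'Four is more than enough.' becomes
    'The user needs at most four monitors.'"""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Rewrite the user's last statement as a single "
                        "self-contained sentence, resolving any references "
                        "using the dialogue. Output only the sentence."},
            {"role": "user",
             "content": "Dialogue:\n" + "\n".join(recent_turns[-4:])
                        + "\n\nStatement: " + statement},
        ],
    )
    return resp.choices[0].message.content.strip()
```

That adds a model call per stored statement, but it would make each memory intelligible on its own when it comes back in a future conversation.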

Still tinkering. Thank you for your insight.