Embedding Memories and Context Sizes
I currently have a web-based application, essentially a chat, that uses embeddings to record memories of the user's inquiries.
Each time the user makes an inquiry or states a fact or preference, I send that text to the Embeddings API to encode the interaction.
My question is this:
How much context should be included when encoding an embedding, and how much context should be included in the embedding query?
Should I create more embeddings (and correspondingly more queries) with smaller context chunks, or should I encode larger context chunks?
My current flow is this:
- User states a preference or inquiry
- I encode that statement into an embedding via the Embeddings API
- I store the resulting vector, with the statement as metadata, in a Pinecone database
- I use that vector to query previously stored preferences or inquiries from Pinecone
- I then use the metadata from that query, prior dialogue for context, and the current inquiry to request a GPT Chat Completion
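For concreteness, here is a rough sketch of that flow, assuming the OpenAI Python SDK and the Pinecone client (the index name, model names, and metadata layout are just placeholders):

```python
import uuid

from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
index = Pinecone(api_key="YOUR_PINECONE_API_KEY").Index("memories")  # placeholder index name


def embed(text: str) -> list[float]:
    """Encode a single piece of text with the Embeddings API."""
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding


def remember_and_recall(user_id: str, statement: str, top_k: int = 5) -> list[str]:
    """Store the new statement, then pull back the most similar prior memories."""
    vector = embed(statement)

    # Store the statement as metadata alongside its vector
    index.upsert(vectors=[{
        "id": str(uuid.uuid4()),
        "values": vector,
        "metadata": {"user_id": user_id, "text": statement},
    }])

    # Query with the same vector to find previously stored preferences/inquiries
    results = index.query(vector=vector, top_k=top_k, include_metadata=True,
                          filter={"user_id": user_id})
    return [m.metadata["text"] for m in results.matches]


def answer(statement: str, memories: list[str], dialogue: list[dict]) -> str:
    """Combine recalled memories, recent dialogue, and the current inquiry."""
    system = ("You are a helpful assistant. Known user facts and preferences:\n"
              + "\n".join(memories))
    messages = [{"role": "system", "content": system}, *dialogue,
                {"role": "user", "content": statement}]
    chat = openai_client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return chat.choices[0].message.content
```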
This works pretty well. However, it still raises some questions.
Should I include dialogue context when I encode the user’s initial inquiry or preference? If so, how much?
Should I maybe encode multiple vectors for each inquiry or preference, each representing a different amount of context?
Should I run multiple queries with varying amounts of context to retrieve previously stored preferences?
Are there other "things" I should be passing to GPT to get a well-rounded chat completion?
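To make the two questions about varying context more concrete, this is the kind of thing I have in mind (a sketch building on the one above; the window sizes and helper names are invented purely for illustration):

```python
def embed_with_context(statement: str, dialogue: list[str],
                       windows: tuple[int, ...] = (0, 2, 6)) -> dict[int, list[float]]:
    """Embed the same statement with 0, 2, and 6 prior turns of context.

    The window sizes are arbitrary placeholders; the point is to produce one
    vector per granularity so each can be stored and queried separately.
    """
    vectors = {}
    for n in windows:
        context = dialogue[-n:] if n else []
        text = "\n".join([*context, statement])
        vectors[n] = embed(text)  # embed() as defined in the earlier sketch
    return vectors


def recall_multi(vectors: dict[int, list[float]], top_k: int = 3) -> list[str]:
    """Query once per context granularity and merge the results, de-duplicating by text."""
    seen: set[str] = set()
    merged: list[str] = []
    for vec in vectors.values():
        results = index.query(vector=vec, top_k=top_k, include_metadata=True)
        for m in results.matches:
            text = m.metadata["text"]
            if text not in seen:
                seen.add(text)
                merged.append(text)
    return merged
```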
Any perspective will be greatly appreciated.