I’ve been working on a RAG model that uses the OpenAI API with GPT-4-Turbo. I’ve used Chroma to create a persistent directory of embeddings, which I’ve loaded into the model as a retriever. So far, this has worked well. However, I’m trying to add a feature where users can save their current conversation in order to access it later. The point of this is that, if a user has provided context in the current conversation that doesn’t exist in the persistent directory, they can save that conversation (with the chat history as context) and access it later without re-explaining the context.
Has anyone had success with a similar feature, or does anyone have links to other topics or tutorials with a similar goal?
I’m not sure I understand the question. Can’t you just save the conversation using any database you want? There’s nothing special about saving a conversation, unless you’re running out of context-window space and need to somehow compress/summarize the earliest parts of the conversation to get it to fit into the context window.
You have a good idea; however, the concept of “accessing a whole conversation later” doesn’t really mesh with the limited context length of AI models.
Even in a single conversation session that is continued later, one would already limit the number of past turns sent, for budget reasons if not simply to avoid hitting the model’s maximum context length (which could cost you over $1 per question).
Vector database search is a good way of extending the illusion of memory within a single conversation. You can archive past user/assistant turns of the conversation history to a vector database, and leave some room in the prompt for bringing back the oldest chat that is most similar to the current input, relying on the limited semantic ability of embeddings to match “what’s your name” to some past message like “your name is permanently Bob the AI”.
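A rough sketch of what that archiving could look like with Chroma; the path, collection name, and metadata fields here are my own placeholders:

```python
import chromadb

# Persistent client; the path and collection name are assumptions.
client = chromadb.PersistentClient(path="./chat_memory")
turns = client.get_or_create_collection("archived_turns")

def archive_turn(turn_id: str, role: str, content: str) -> None:
    """Store one user/assistant message; Chroma embeds the text with its default embedder."""
    turns.add(ids=[turn_id], documents=[content], metadatas=[{"role": role}])

def recall_similar(user_input: str, k: int = 3) -> list[str]:
    """Bring back the k archived messages most semantically similar to the current input."""
    result = turns.query(query_texts=[user_input], n_results=k)
    return result["documents"][0]
```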
Everything the person has ever said would definitely be a stretch, with limited usefulness. Randomly throwing in exchanges from weeks before?
A function that lets the AI save (or the user instruct it to save) particularly useful facts or memories for constant re-injection might be better.
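If you go that route, one pattern is to expose a tool the model can call via OpenAI function calling; everything here (tool name, schema, storage) is a hypothetical sketch, not an existing API:

```python
# Hypothetical "save_memory" tool the model can call when the user states a durable fact;
# the name, schema, and in-memory storage are all assumptions.
save_memory_tool = {
    "type": "function",
    "function": {
        "name": "save_memory",
        "description": "Persist a fact about the user worth re-injecting later.",
        "parameters": {
            "type": "object",
            "properties": {
                "fact": {"type": "string", "description": "The fact to remember."}
            },
            "required": ["fact"],
        },
    },
}

saved_facts: list[str] = []  # in practice: a database keyed by user ID

def memory_preamble() -> str:
    """Constant re-injection: prepend the saved facts to the system prompt."""
    return "Known facts about the user:\n" + "\n".join(f"- {f}" for f in saved_facts)
```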
What I’m thinking of, rather than saving everything the person has ever said, is saving a specific, useful conversation. A user wouldn’t save their entire history with the chatbot, but simply the contents of the current conversation. Let’s say I’m having a conversation with GPT-4-Turbo about my hypothetical business, where I’ve given it a lot of detail on business specifics. Later on, I may wish to resume this conversation without re-explaining all the necessary context about my business. I’m looking at a feature where I could save ONLY the current conversation, let’s say as “business_convo.json”. This would allow me to save multiple distinct conversations in different files. If I wanted to resume the business conversation, I’d be able to do that, but I could also resume a separate conversation about something totally different (with an entirely different chat history). In these scenarios, I’m assuming the conversation would be within the token limit.
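Roughly what I have in mind for the save/load part, assuming the conversation fits in the token limit (the file name just mirrors my example):

```python
import json

def save_conversation(messages: list[dict], path: str = "business_convo.json") -> None:
    """messages is the same list of {"role": ..., "content": ...} dicts sent to the API."""
    with open(path, "w") as f:
        json.dump(messages, f, indent=2)

def resume_conversation(path: str = "business_convo.json") -> list[dict]:
    """Load a saved conversation to use as the starting chat history."""
    with open(path) as f:
        return json.load(f)
```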
I’ve tried using vector search and embedding the contents of the current conversation. This works reasonably well, and it’s how I originally constructed my persistent directory, which supplies the knowledge base for the RAG model. However, it’s less adept at precisely referencing specific elements of previous conversations than a simple history-save feature would be.
In a conversation, each message from the user and from the assistant is temporarily stored until either the user abandons the conversation and you discard the stored messages, or the user explicitly wants to store the complete conversation.
Then you take all these messages and put them into persistent storage with a user ID and additional metadata, so the conversation can be found again on request.
When the user returns and requests the previous conversation, you retrieve it and pass it to the assistant as additional context, just like you are doing with RAG.
This is pretty straightforward, and you could most likely use your existing vector store’s metadata search to find the relevant information and build right on top of your existing solution.
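For example, if each stored message carries the user ID and a conversation ID in its metadata, Chroma’s `where` filter can pull back exactly one saved conversation; the field names here are assumptions:

```python
import chromadb

client = chromadb.PersistentClient(path="./chat_memory")
turns = client.get_or_create_collection("archived_turns")

def load_saved_conversation(user_id: str, conversation_id: str) -> list[str]:
    """Fetch every stored message tagged with this user and conversation.

    Chroma's get() does not guarantee order, so also store a turn index in the
    metadata if you need to restore the original message order.
    """
    result = turns.get(
        where={"$and": [{"user_id": user_id}, {"conversation_id": conversation_id}]}
    )
    return result["documents"]
```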
Mind the context length though. At some point summarization or pruning will be necessary.
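A minimal sketch of pruning by summarization, assuming the v1 OpenAI Python client; the prompt wording and number of turns to keep are placeholders:

```python
from openai import OpenAI

client = OpenAI()

def compress_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Summarize everything but the most recent turns into one system message."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{
            "role": "user",
            "content": "Summarize this conversation, keeping every concrete fact:\n\n"
                       + transcript,
        }],
    ).choices[0].message.content
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent
```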