I am trying to make my own GPT with the gpt-3.5-turbo model (obviously less capable), but what I have not been able to figure out is how to make it remember the chats we had. Not many, just 20-30 interactions, or 40 at most.
I am using the API to chat with it. One way I was thinking of is to include the previous turns in each API request (user, then assistant, then user again), but with each interaction the request would get very lengthy and a lot of tedious work would pile up.
Is there any other way to achieve the same functionality?
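For context, the pattern being described here, resending the growing list of user/assistant messages on every call, looks roughly like this. A minimal sketch, assuming the pre-1.0 openai Python package; the helper function and variable names are just for illustration:

```python
import openai  # assumes the pre-1.0 openai Python package; set openai.api_key first

# Keep the running conversation as a list of messages and resend it each turn.
messages = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_input):
    messages.append({"role": "user", "content": user_input})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    reply = response["choices"][0]["message"]["content"]
    # Store the assistant reply so the next request carries the full history.
    messages.append({"role": "assistant", "content": reply})
    return reply
```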
For long-term memory, embeddings are what you need; it is possible to create unlimited long-term memory with them. If you need more information, please leave a response on this topic and I may be able to create a tutorial. (It will take a few days, but I will!) Just let me know which vector database you are going to use.
Hey @kevin6, thanks for your kindness. But I am just building a way to use gpt-3.5 like gpt-3, as I am on the free trial right now.
As far as a DB is concerned, I haven't thought about that yet. Right now I am using localStorage to hold a summary of the whole chat, covering previous prompts and responses. Whenever I send a new prompt, I take the already stored summary (produced with the text-davinci model), attach my latest prompt to it, and send it to gpt-3.5-turbo.
This is what I am doing to emulate memory for my chatbot.
But if you can make a tutorial, that would be amazing to see.
I took your advice and am now making a summary by sending the prompts and the replies' summary to text-davinci (kind of like Inception, a summary of summaries) and then attaching my new prompt to that summary before passing it to gpt-3.5-turbo.
But if you can suggest any improvement to my approach, that would be very welcome.
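A rough sketch of that summary-of-summaries loop in Python, assuming the pre-1.0 openai Python package; the prompt wording, the storage variable, and the token limit are illustrative only:

```python
import openai  # assumes the pre-1.0 openai Python package; set openai.api_key first

running_summary = ""  # persist this between turns (localStorage, a file, a DB row, ...)

def summarize(summary, prompt, reply):
    # Fold the latest exchange into the running summary using text-davinci-003.
    completion = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Summarize this conversation so far:\n{summary}\n"
               f"User: {prompt}\nAssistant: {reply}\nSummary:",
        max_tokens=256,
    )
    return completion["choices"][0]["text"].strip()

def chat(prompt):
    global running_summary
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"Conversation summary so far: {running_summary}"},
            {"role": "user", "content": prompt},
        ],
    )
    reply = response["choices"][0]["message"]["content"]
    running_summary = summarize(running_summary, prompt, reply)
    return reply
```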
Basically you are trying to maintain state for your conversations. In general, LLMs are stateless, i.e. they do not keep memory of your previous conversations. Behind the scenes, the way ChatGPT maintains "memory" or state is by concatenating your current prompt with your previous prompts and responses. This provides "context" for the current LLM request. Of course this context window is limited, and that's why when a conversation goes on too long, the concatenated prompts fall outside the context window and the model loses the plot.
In python I do the following:
prompt1 = "Explain memory"
response1 = "It is something to do with maintaining state of conversation"
prompt2 = prompt1 + " " + response1 + " What is a conversation?"
To the LLM, prompt2 will look like "Explain memory It is something to do with maintaining state of conversation What is a conversation?", so it will do its sentence completion accordingly.
Anyway, that was the explanation of what is happening. You can implement this statefulness in Python by using the LangChain library, which makes it easier to create these prompt templates and keep chat memory.
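For example, a minimal sketch with LangChain's ConversationChain and ConversationBufferMemory (API as of the 0.0.x releases that were current at the time; the model name and temperature are placeholders):

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# ConversationBufferMemory keeps the prior turns and injects them into each prompt.
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

print(conversation.predict(input="Explain memory"))
print(conversation.predict(input="What is a conversation?"))  # sees the first turn
```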
Hi @Vayu, I did the same thing. I take the prompt and its reply, send them to text-davinci to make a summary, and store that in localStorage or something similar.
Then I concatenate the new prompt with the summary and pass it to gpt-3.5-turbo, repeating this for each new prompt. From what I have seen so far, it emulates memory for the AI quite well.
That LangChain you are talking about, is it available in JS too?
Thinking maybe you can copy-paste the code and ask ChatGPT to convert it to JS? Might be worth a try. I also read that Microsoft Research has come up with this: https://github.com/microsoft/semantic-kernel
I have been copying my completed chats and saving them to a text or LibreOffice Writer file, editing out the parts I don't need to keep, and then saving the file to my system. This keeps it simple, plus the best things in life are free. And if I wish to, I can store the chats in a DB of my own design and make them searchable. Life is too easy.
I am kind of stuck on the same memory issue at the moment. I created a chatbot with gpt-3.5-turbo, used LangChain with the OpenAI function-calling agent, and created some structured tools for the Amadeus API (flight search, hotel search, etc.), then deployed everything on Streamlit. My problem is that the API response is around 110k tokens, which is far more than GPT can handle. So I was wondering if it is possible to turn this response into embeddings and then pass it to GPT as a summary or something, so it can answer the user's query from it?
You can include a token count in the metadata alongside the text when doing the initial embeddings. Then, when you get back the top 10 results within your similarity threshold, you include only those top augmentations, stopping before the total token count exceeds the space you want to reserve for knowledge insertion.
Easier when you don’t have to sift through someone else’s code to do it.
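A sketch of that token-budget selection, assuming the search layer already returns chunks sorted by similarity and filtered by your threshold, and that a token count was stored as metadata when each chunk was embedded (the result format and budget here are hypothetical):

```python
import tiktoken

encoder = tiktoken.encoding_for_model("gpt-3.5-turbo")

def build_context(results, budget_tokens=2000):
    # results: list of dicts like {"text": ..., "tokens": ...}, sorted by similarity.
    picked, used = [], 0
    for item in results:
        # Fall back to counting on the fly if no token count was stored.
        tokens = item.get("tokens") or len(encoder.encode(item["text"]))
        if used + tokens > budget_tokens:
            break  # stop before exceeding the space reserved for knowledge insertion
        picked.append(item["text"])
        used += tokens
    return "\n\n".join(picked)
```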
Hmm, OK, I did something similar: I hard-coded the max offers returned by the tool to 5, and it's working fine, but that's not practical. For example, if someone asks the bot what the cheapest flight on a given date is, the tool will return 5 offers and the bot will pick the cheapest among them, which is wrong, because it's not actually the cheapest anymore.
That's similar to what you are suggesting, I think.
Anyway, I was thinking more of putting the full response into a vector database with embeddings, then using the vector-store-backed memory from LangChain to return the top 5 offers based on the user query.
At the moment this is all theoretical in my head, so I was wondering if anyone has done something similar before?
Kindly note I'm a bit of a newbie here; I only started this a month ago.
If you are using live data like this that can be retrieved online, and the service's API has search parameters to narrow the results, you can use the function-calling ability of the models.
Make some functions like get_cheapest_hotels, get_cheapest_flights, … that make the corresponding calls to the provider's API. Then the AI can call them whenever it decides that kind of information retrieval would be useful.
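Something like the following, a minimal sketch against the pre-1.0 openai package's function-calling interface; the get_cheapest_hotels schema, its parameters, and the wrapper around the provider's API are made up for illustration:

```python
import json
import openai  # assumes the pre-1.0 openai Python package; set openai.api_key first

functions = [
    {
        "name": "get_cheapest_hotels",
        "description": "Return the cheapest hotel offers for a city and date range.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "check_in": {"type": "string", "description": "YYYY-MM-DD"},
                "check_out": {"type": "string", "description": "YYYY-MM-DD"},
            },
            "required": ["city", "check_in", "check_out"],
        },
    }
]

messages = [{"role": "user", "content": "Find me a cheap hotel in Dubai, Sept 1 to 4."}]
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=functions,
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    args = json.loads(message["function_call"]["arguments"])
    result = {"offers": []}  # call your own wrapper around the provider's API with `args`
    messages.append(message)
    messages.append({"role": "function", "name": message["function_call"]["name"],
                     "content": json.dumps(result)})
    final = openai.ChatCompletion.create(model="gpt-3.5-turbo-0613", messages=messages)
```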
That is quite a topic deviation from “remember our chats”.
I don't think it works like that. I ran the tools manually, without the agent, to search for the cheapest, using predefined search parameters, and the response came back with 55 pages of flight offers even though I asked for the cheapest.
The API endpoint doesn't really return just the cheapest; instead, it returns all the available flights, and then you have to search for the cheapest.
That's why I was thinking of creating a function inside the tool that takes the response and stores it in a vector-store-backed retriever (https://python.langchain.com/docs/modules/data_connection/retrievers/vectorstore), then using VectorStoreRetrieverMemory from LangChain (https://python.langchain.com/docs/modules/memory/types/vectorstore_retriever_memory).
The idea is: the OpenAI function-calling agent uses the tool >>> the tool stores the response in the vector database >>> the agent's memory is the vector database, which returns the top-k results >>> the LLM passes them to the user as natural language.
Not sure if I'm missing something or whether the idea is doable.
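A minimal sketch of that wiring with LangChain's FAISS vector store and VectorStoreRetrieverMemory (assumes faiss-cpu is installed; the offer strings and the k value are placeholders):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.memory import VectorStoreRetrieverMemory

embeddings = OpenAIEmbeddings()

# Store the tool's raw offers as individual documents in the vector store.
offers = [
    "Offer 1: CAI-DXB 2023-09-01, 210 USD",
    "Offer 2: CAI-DXB 2023-09-01, 185 USD",
]
vectorstore = FAISS.from_texts(offers, embeddings)

# The memory returns only the top-k offers most relevant to the current query.
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
memory = VectorStoreRetrieverMemory(retriever=retriever)

print(memory.load_memory_variables({"prompt": "cheapest flight from Cairo to Dubai"}))
```

Keep in mind that similarity search returns the offers most textually similar to the query, not the numerically cheapest one, which is the limitation raised in the reply below.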
You can see why online flight-finder services, which have been putting travel agents out of business for twenty years without any AI, use complex web user interfaces to narrow the results and select subsets of them.
You’d likely want to write your own database for servicing the AI requests.
Then the cheapest-flights function you expose to the AI can have properties like [departing_city, arrival_city, search_start_date, search_end_date, …] that the AI must gather.
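For illustration, a hypothetical function definition using exactly those properties; marking them required nudges the model to gather the missing values from the user before it makes the call:

```python
get_cheapest_flights = {
    "name": "get_cheapest_flights",
    "description": "Query your own flight database and return the cheapest offers.",
    "parameters": {
        "type": "object",
        "properties": {
            "departing_city": {"type": "string"},
            "arrival_city": {"type": "string"},
            "search_start_date": {"type": "string", "description": "YYYY-MM-DD"},
            "search_end_date": {"type": "string", "description": "YYYY-MM-DD"},
        },
        "required": ["departing_city", "arrival_city",
                     "search_start_date", "search_end_date"],
    },
}
```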