I recently tried making a webpage to chat with the GPT-3 API, and it worked fine. In this chat, a human writes a question and the API responds. Now I want to include the context of previous exchanges in the conversation, and I would like to know a suitable way to structure the API calls to achieve this. One approach I found is to keep appending the newer exchanges to the prompt as responses come in, but the prompt becomes lengthy as the number of exchanges increases.
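Here is a minimal sketch of that rolling-prompt approach, assuming the legacy `openai` Python client and a hypothetical `history` list of prior turns:

```python
import openai  # legacy completions-era client

# Hypothetical store of prior turns: list of (speaker, text) tuples.
history = [("Human", "Hi!"), ("AI", "Hello, how can I help?")]

def build_prompt(history, new_question):
    # Replay every prior exchange, then append the new question.
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"Human: {new_question}")
    lines.append("AI:")
    return "\n".join(lines)

response = openai.Completion.create(
    engine="davinci",
    prompt=build_prompt(history, "What did we talk about earlier?"),
    max_tokens=150,
    stop=["Human:"],
)
```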
The token limit is a hard technical constraint. We only support 2049 tokens in the prompt and completion combined; if your prompt is 2000 tokens, your completion can be at most 49 tokens.
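To budget for this, you can count the prompt's tokens up front and cap `max_tokens` accordingly; a sketch using the GPT-2 tokenizer from `transformers` as an approximation of GPT-3's tokenizer:

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

LIMIT = 2049  # prompt + completion combined
prompt = "Human: Hi!\nAI:"
prompt_tokens = len(tokenizer.encode(prompt))
max_completion = LIMIT - prompt_tokens  # a 2000-token prompt leaves only 49
```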
One creative solution, besides prompt chaining, may be to use the Answers endpoint, which allows you to upload a JSONL file with documents that you want to query. You could upload past chats as documents in such a file and then query them.
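A rough sketch against that (since-deprecated) endpoint, assuming a `chats.jsonl` file where each line is an object like `{"text": "..."}`:

```python
import openai

# Upload past exchanges as searchable documents (one JSON object per line).
chat_file = openai.File.create(file=open("chats.jsonl"), purpose="answers")

answer = openai.Answer.create(
    search_model="ada",   # cheap model for the document-search step
    model="curie",        # stronger model for the final answer
    question="What did the user say about deadlines?",
    file=chat_file["id"],
    examples_context="Human: Hi!\nAI: Hello, how can I help?",
    examples=[["Human: What is your name?", "AI: I am a chat assistant."]],
    max_tokens=100,
)
```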
Edit: There’s another thread on this that you may be interested in. Another option is to create summaries of past conversations, which are naturally shorter and can let you fit more context into the prompt.
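A minimal sketch of that summarization idea (the model choice and prompt wording here are just assumptions):

```python
import openai

past_conversation = (
    "Human: Hi!\nAI: Hello!\n"
    "Human: I need help with my code.\nAI: Sure, what's the issue?"
)

# Compress earlier exchanges into a short summary that stands in for them.
summary = openai.Completion.create(
    engine="curie",
    prompt=past_conversation + "\n\nSummarize the conversation above in two sentences:\n",
    max_tokens=80,
)["choices"][0]["text"].strip()
```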
As you have more and more history to go through, the amount of data you have will continue to grow. I have achieved some success here by having a separate “synopsis” function that takes, as input, the synopsis of the current chat logs (to know what the current conversation is about) and from that it generates keywords to search for old chat logs that are relevant. It then uses the same “synopsis” method to summarize those old chat logs, effectively compressing an arbitrary amount of past data to a few lines of text.
Put another way (sketched in code after the list):
1. Current chat log is taken as input and summarized
2. Keywords are extracted from the current chat summary
3. Old chat logs are pulled from the DB/SOLR via those keywords
4. Old chat logs are also summarized, based on relevance to the current chat log
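Here is that pipeline as a sketch, with a hypothetical SOLR core and field names, and with the GPT-3 "synopsis" calls reduced to simple completion prompts:

```python
import openai
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/chatlogs")  # hypothetical core

def synopsis(text):
    # One GPT-3 call that compresses a chat log to a few lines (sketch).
    result = openai.Completion.create(
        engine="curie",
        prompt=text + "\n\nSummarize the chat above in a few lines:\n",
        max_tokens=100,
    )
    return result["choices"][0]["text"].strip()

def extract_keywords(summary):
    result = openai.Completion.create(
        engine="curie",
        prompt=summary + "\n\nList comma-separated search keywords for the topics above:\n",
        max_tokens=30,
    )
    return [k.strip() for k in result["choices"][0]["text"].split(",") if k.strip()]

current_chat_log = "Human: How is the Mars project going?\nAI: On schedule so far."

current_summary = synopsis(current_chat_log)        # 1. summarize current chat
keywords = extract_keywords(current_summary)        # 2. extract keywords
hits = solr.search(" OR ".join(keywords), rows=5)   # 3. pull relevant old logs
compressed = "\n".join(                             # 4. summarize the hits
    synopsis(h["text"]) for h in hits               # 'text' is an assumed field
)
```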
You do lose some data of course, but we have relatively narrow constraints.
@daveshapautomator - I think that is a powerful approach with great potential.
However, how are you measuring and ensuring the quality of the summaries and search terms? And is there any reason to think this abstraction is better than just indexing everything?
To elaborate, though: I think you would probably rather use the Search endpoint to retrieve the most relevant previous exchanges and include those as additional context in the prompt. Think of it as a chat log with only some of the exchanges shown. You may want to order or timestamp the shown exchanges.
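A sketch of that against the (since-deprecated) Search endpoint, with hypothetical example exchanges:

```python
import openai

past_exchanges = [
    "Human: My name is Dave.\nAI: Nice to meet you, Dave!",
    "Human: I work on chatbots.\nAI: That sounds interesting!",
]

# Semantic search over prior exchanges (legacy Search endpoint).
results = openai.Engine("ada").search(
    documents=past_exchanges,
    query="Who is Dave?",
)

# Keep the highest-scoring exchanges and show only those in the prompt.
ranked = sorted(results["data"], key=lambda r: r["score"], reverse=True)
context = "\n".join(past_exchanges[r["document"]] for r in ranked[:3])
```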
For initial prototyping, you will probably want to try out that approach, passing the documents inline as a keyword argument rather than summarizing or uploading a JSONL file. I think those should be explored after you see success with the simpler cases.
Note also that you do not need to fit everything into a single prompt: you can answer questions related to the prompt (e.g. the last message mentions Dave - who is Dave?) and then include those answers in the final generation, or generate with different contexts and pick the best response. Plenty of opportunities for creative solutions!
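A sketch of that two-step idea, reusing a hypothetical `context` string retrieved as above:

```python
import openai

context = "Human: My name is Dave.\nAI: Nice to meet you, Dave!"  # hypothetical retrieved context

# Step 1: resolve the sub-question against the retrieved context.
who_is_dave = openai.Completion.create(
    engine="curie",
    prompt=context + "\n\nQ: Who is Dave?\nA:",
    max_tokens=40,
)["choices"][0]["text"].strip()

# Step 2: fold that answer into the final generation as background.
final = openai.Completion.create(
    engine="davinci",
    prompt=f"Background: {who_is_dave}\n\nHuman: Say hi to Dave for me.\nAI:",
    max_tokens=150,
)
```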
Recall and precision are huge topics in information science, so I have a long way to go to make sure this performs well. At this very moment I am implementing several improvements to my historical recall scheme! I am basing some of the improvements on what I know about human memory; for instance, human memory favors recency. I could easily pull millions of memories from SOLR in an instant, but that amount of text would be far too much for GPT-3, so I favor the most recent matches.
As I learn more about SOLR I expect I'll get better at pulling exact results - but this is essentially a search problem.
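One way to express that recency bias in a SOLR query via `pysolr`, assuming a `timestamp` date field in the schema:

```python
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/chatlogs")  # hypothetical core

# Favor recent matches: filter to the recent past and sort newest-first,
# capping the row count so the text still fits in a GPT-3 prompt.
results = solr.search(
    "mars project",                       # keywords from the current chat
    fq="timestamp:[NOW-90DAYS TO NOW]",   # assumes a 'timestamp' date field
    sort="timestamp desc",
    rows=10,
)
```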
@daveshapautomator - Thanks, that makes sense. In the context of generation, where the generated document/chat becomes too long, there would not be a need to use SOLR for that particular generation's context, though? SOLR might be more useful as a general fact database, for example?
Yes, that is correct. SOLR is a private search engine meant for holding many documents. Consider the possibility of an AGI agent that stays active for many years - it will end up with millions or billions of memories. That is my use case.
How to learn, associate, forget, abstract - exciting but challenging stuff. I wonder how to keep all of those ideas pragmatic.
I also wonder whether, at that size, 'memories' that are personal are actually beneficial. Admittedly, with a million exchanges, it should be financially viable even to use the OpenAI files. As you say, you probably want to start abstracting the memories - as summaries, or in other ways that build out knowledge over time. The result of that should hopefully not be a gigantic set? So maybe the billions-scale memory would only be relevant for public information. Is there really a benefit in storing the memories in that database?
I am already extracting insights from interactions, such as emotional inference and identifying cases of lack of understanding. Part of the benefit of low-cost, high-speed storage in SOLR is that you can store every insight rather than generating it again and again. This is the same for humans: you might not remember exactly what was said word for word, but you remember the summary of a conversation as well as the context (emotions, what else was going on, who participated, etc.).
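Storing such an insight is a one-liner with `pysolr` (the core name and schema fields here are just assumptions):

```python
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/insights")  # hypothetical core

# Store each derived insight once so it never has to be regenerated.
solr.add([{
    "id": "chat-42-insight-1",
    "type": "emotional_inference",  # assumed schema field
    "text": "User sounded frustrated about the project deadline.",
    "timestamp": "2021-11-01T12:00:00Z",
}], commit=True)
```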
Hi Dave, are you mentioning SOLR here loosely, as a placeholder for any large-scale, open-source, text-optimized searchable datastore, or did you research picking SOLR specifically over alternatives like Elasticsearch? Both are built on top of Lucene, which provides the search capabilities.
I personally use SOLR because it was recommended to me by another researcher. However, plenty of people use other tools such as those you mention. I've also done experiments using just SQLite. Pretty much any indexed, searchable data source is a viable option!
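Even the SQLite option is only a few lines if your build includes the FTS5 full-text extension; a minimal sketch:

```python
import sqlite3

# SQLite's FTS5 extension gives basic full-text search with no server at all.
conn = sqlite3.connect("memories.db")
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS chats USING fts5(text)")
conn.execute(
    "INSERT INTO chats (text) VALUES (?)",
    ("Human: My name is Dave.\nAI: Nice to meet you!",),
)
conn.commit()

rows = conn.execute(
    "SELECT text FROM chats WHERE chats MATCH ?", ("Dave",)
).fetchall()
```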