I have written a script that gives the ChatGPT 3.5 API memory: it saves the conversation and sends it back to the API with every call. This works very well at improving the quality of responses. However, the API only allows 4,097 tokens per call, and it needs to be infinite.
Can anyone help me with ideas to work around this? Or is there some way to increase the number of tokens I can send to the API?
A chatbot with memory is not novel; it is required to “chat” at all rather than simply answer independent questions.
You need to truncate the conversation history at a reasonable point. There is no “infinite”.
Increased “memory” being dragged along also degrades the quality and attention of answers to the question at hand. Plus you pay for it.
If gpt-3.5-turbo is too small with its 4k context length, the answer is gpt-3.5-turbo-16k. You can upgrade to that model in code only when the token count requires it, as sketched below.
gpt-3.5-turbo-1106 is also 16k context at 4k price or less, but still “preview” quality.
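A minimal sketch of that upgrade logic, assuming the openai v1 Python client and tiktoken; the `pick_model` helper and the 1,000-token reply reserve are illustrative choices, not an official API:

```python
import tiktoken
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def pick_model(messages: list[dict], reserve_for_reply: int = 1000) -> str:
    """Use the cheap 4k model when the prompt fits; otherwise fall back to 16k."""
    # Rough count: content tokens plus ~4 tokens of per-message overhead.
    prompt_tokens = sum(len(enc.encode(m["content"])) + 4 for m in messages)
    if prompt_tokens + reserve_for_reply <= 4096:
        return "gpt-3.5-turbo"
    return "gpt-3.5-turbo-16k"

messages = [{"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"}]
response = client.chat.completions.create(model=pick_model(messages), messages=messages)
print(response.choices[0].message.content)
```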
Conversation history is somewhat overrated. ChatGPT uses far less than the maximum, and it mostly remembers what you were just talking about. People still think it learns from their mental health journals.
If you use a token-counting library such as tiktoken, you can always push the model to its maximum input context length, or set a more reasonable token budget.
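For example, a minimal sketch of a token budget that keeps the most recent turns, assuming tiktoken (the `trim_to_budget` helper and the ~4-token per-message overhead are approximations):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def trim_to_budget(system: dict, history: list[dict], budget: int = 3000) -> list[dict]:
    """Keep the system message plus as many of the newest turns as fit the budget."""
    kept, used = [], len(enc.encode(system["content"])) + 4
    for msg in reversed(history):                   # newest turns first
        cost = len(enc.encode(msg["content"])) + 4  # ~4 tokens of overhead per message
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))          # restore chronological order
```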
There are other techniques to get the most out of the context at minimal cost, such as having a separate AI call occasionally make a summary of the oldest part of the conversation.
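A minimal sketch of such a summarizer, again assuming the openai v1 client; `summarize_oldest`, the 150-word limit, and `keep_recent` are all illustrative choices:

```python
from openai import OpenAI

client = OpenAI()

def summarize_oldest(history: list[dict], keep_recent: int = 10) -> list[dict]:
    """Replace all but the most recent turns with a short AI-written summary."""
    old, recent = history[:-keep_recent], history[-keep_recent:]
    if not old:
        return history
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in old)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Summarize this conversation in under 150 words:\n" + transcript}],
    )
    summary = response.choices[0].message.content
    return [{"role": "system", "content": "Summary of earlier conversation: " + summary}] + recent
```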
On the more advanced end is using a vector database with semantic search and retrieval to recall things that have expired from the chat history you provide, so that a user can still ask about older topics and keep the illusion of memory. For that one question, the AI will remember that you named it Bob when you ask.
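A minimal sketch of that idea, using an in-memory list in place of a real vector database (the embedding model name is OpenAI’s; `recall` and the sample turns are illustrative):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(response.data[0].embedding)

# Stand-in "database": embeddings of expired chat turns, kept with their text.
expired_turns = ["user: let's call you Bob", "assistant: Bob it is!"]
index = [(turn, embed(turn)) for turn in expired_turns]

def recall(query: str, top_k: int = 2) -> list[str]:
    """Return the expired turns most semantically similar to the new question."""
    q = embed(query)
    def score(item):  # cosine similarity between query and stored turn
        text, vec = item
        return float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))
    return [text for text, _ in sorted(index, key=score, reverse=True)[:top_k]]

# Inject recall("what did I name you?") into the prompt before calling the chat model.
```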
A user interface that lets you see your old conversation and what has expired automatically due to size, and lets you clear or disable past a point, or even re-enable certain responses, can be useful if you are the expert paying the bills.
Here’s a chatbot script, below the reply you’re reading.
While the chat data itself is lossless as long as the program is running, the `chat[-10:]` only passes the five previous user questions and five AI responses back to the model with each new input. Talk to that AI and see how much you really need to extend beyond the basics of a fixed history to maintain a fluent conversation.
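A minimal sketch of the script, assuming the openai v1 Python client (type “exit” to quit; the prompt text and model choice are illustrative):

```python
from openai import OpenAI

client = OpenAI()
system = [{"role": "system", "content": "You are a helpful AI assistant."}]
chat = []  # full lossless history for this session

while (user := input("Prompt: ")) != "exit":
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        # system message + only the last 10 turns (5 user/AI exchanges) + new input
        messages=system + chat[-10:] + [{"role": "user", "content": user}],
    )
    reply = response.choices[0].message.content
    print(reply)
    chat += [{"role": "user", "content": user},
             {"role": "assistant", "content": reply}]
```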
It may be obvious, but note that you pay for all tokens of every message, because the chat history has to be resubmitted with each call. For example, if there is a long chat history about the meaning of life, and at the end the user says goodbye and gets “goodbye” back, that last goodbye may cost $1 because we unnecessarily sent the entire long conversation along with it.
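As a rough illustration (the per-token prices below are assumptions based on late-2023 rates; check current pricing):

```python
# Hypothetical per-turn cost of resending a 15k-token history with one short message.
history_tokens = 15_000
print(history_tokens / 1000 * 0.003)  # gpt-3.5-turbo-16k input: ~$0.045 per message
print(history_tokens / 1000 * 0.06)   # gpt-4-32k input: ~$0.90 per message
```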
In the case of the Assistants API beta, OpenAI takes care of the chat history, but as I understand it we still pay for the whole chat history with every message.
Using context and chat history is also not foolproof. Some experiments suggest that with very long context, data is often missed or hallucinated even when it exists in the prompt. As I remember, someone found that gpt-4-1106-preview is good until about 70k tokens, but beyond that it starts failing to adequately observe things in the middle of the context.
So you may want to consider redesigning your workflow to work within about 70,000 tokens. Anthropic has a 200,000-token model, but per some small experiments it has an even worse problem with missing things in the middle at long context lengths.
Depending on what you do with the tokens: if they are examples of style, you may try fine-tuning. OpenAI fine-tuning is for style and structure, not knowledge, because there is no option to adjust hyperparameters such as the rank.
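A minimal sketch of kicking off such a job with the openai v1 client; `style_examples.jsonl` is a hypothetical file of chat-formatted training examples:

```python
from openai import OpenAI

client = OpenAI()

# Upload a JSONL file of example chats that demonstrate the target style.
training_file = client.files.create(file=open("style_examples.jsonl", "rb"),
                                    purpose="fine-tune")

# Start a fine-tuning job on the base chat model.
job = client.fine_tuning.jobs.create(training_file=training_file.id,
                                     model="gpt-3.5-turbo")
print(job.id, job.status)
```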