How I would do it is to create a dataset and stack each prompt and its output on top of the previous ones:
Session1:
Prompt1: text
GPT generates output
Session2:
Prompt1: text
Output1: text
Prompt2: text
GPT generates output
and so on… However, you will need to keep the token count in mind. You can’t keep adding prompts and outputs forever, so you could write some code to forget the oldest lines once you’re almost out of tokens.
That’s how I would do it, but maybe there are better methods.
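Roughly like this, as an untested sketch (it assumes the legacy openai Python SDK and the tiktoken tokenizer; the 3,000-token budget is an arbitrary placeholder):

import openai
import tiktoken

enc = tiktoken.get_encoding("p50k_base")  # tokenizer family used by text-davinci-003
MAX_HISTORY_TOKENS = 3000  # arbitrary budget, leaving headroom for the completion

history = []  # alternating "Prompt: ..." / "Output: ..." lines

def token_count(lines):
    return sum(len(enc.encode(line)) for line in lines)

def ask(prompt):
    history.append("Prompt: " + prompt)
    # Forget the oldest lines once we're almost out of tokens
    while token_count(history) > MAX_HISTORY_TOKENS:
        history.pop(0)
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt="\n".join(history) + "\nOutput:",
        max_tokens=500,
    )
    text = response["choices"][0]["text"].strip()
    history.append("Output: " + text)
    return text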
This is a little constrained, though, since the model has a token limit; wouldn’t growing the prompt with every response hit that limit quickly?
I’d love to know whether we can replicate the experience of the chat.openai.com interface via the API.
If a dev passes by here, I’d suggest adding a setting to the API call that lets us set a maximum number of tokens to store on the backend for historical context. This could be associated with a sessionId and a max-age variable, after which the context is automatically deleted.
We’d also need a list/delete sessions API call, if it’s possible to use a max-age of 0.
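In the meantime, here’s a hypothetical sketch of the bookkeeping such a setting would imply (every name in it is made up):

import time
import uuid

class SessionStore:
    # Hypothetical backend: history per sessionId, deleted after max_age seconds
    def __init__(self):
        self.sessions = {}  # session_id -> [created_at, max_age, history]

    def create(self, max_age=3600):
        session_id = str(uuid.uuid4())
        self.sessions[session_id] = [time.time(), max_age, []]
        return session_id

    def append(self, session_id, text):
        self._evict()
        self.sessions[session_id][2].append(text)

    def list_sessions(self):
        self._evict()
        return list(self.sessions)

    def delete(self, session_id):
        self.sessions.pop(session_id, None)

    def _evict(self):
        now = time.time()
        for sid, (created, max_age, _) in list(self.sessions.items()):
            if now - created >= max_age:  # a max-age of 0 deletes immediately
                del self.sessions[sid]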
That’s weird, because I asked ChatGPT how to make a request with the context of the previous response (I’m making a chatbot front end for text completions), and it mentioned that you can add a “context” value to the request. I haven’t gotten to try it out yet, but I wouldn’t be surprised if it got that wrong, considering it only has information from 2021 and earlier.
I had the same issue and implemented a similar strategy. However, I’m only saving and including the most recent response from ChatGPT in my next request, rather than the entire conversation. This isn’t perfect, but it does a decent job of knowing what you are talking about, and it saves tokens. For example, if you say “Tell me the best time to visit Florida” and it answers, and then you say “How about Maine?”, it will know you are talking about the best time to visit, because that is in the previous response.
I’m hoping for a better solution soon because the token cost will add up quickly.
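In code, this strategy is just a one-variable version of the full-history approach (a sketch against the legacy Completion API):

import openai

last_response = ""

def ask(question):
    global last_response
    # Prepend only the most recent answer, not the whole conversation
    prompt = (last_response + "\n" if last_response else "") + question
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=500,
    )
    last_response = response["choices"][0]["text"].strip()
    return last_response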
In my use case, I have to provide quite long texts upfront so the bot can learn from context and reply to the user properly. After the first interaction I can’t resend all those tokens, because the conversation has already exceeded the model’s 4,000-token limit. I’m stuck on this issue.
Right now, with the ChatGPT API not supporting any form of “sessions”, I was forced to send some “context” on every query. However, LangChain has nice support for summarizing prior prompts: ConversationSummaryBufferMemory.
So you don’t have to struggle to prune your prior questions down to 2k or 4k or … tokens; your summary can stay within this limit. LangChain ends up summarizing conversations and sending these summaries as “context”.
Just another mechanism which may help you.
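For reference, minimal usage looks something like this (based on LangChain’s documented examples at the time; the 1,000-token limit is arbitrary):

from langchain.chains import ConversationChain
from langchain.llms import OpenAI
from langchain.memory import ConversationSummaryBufferMemory

llm = OpenAI(temperature=0)
# Keeps recent turns verbatim and summarizes older ones once the buffer
# exceeds max_token_limit, so the "context" you resend stays bounded.
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=1000)
chain = ConversationChain(llm=llm, memory=memory)

chain.predict(input="Tell me the best time to visit Florida")
chain.predict(input="How about Maine?")  # the summary carries the topic over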
Thanks. It’s inevitable, I guess. Many are brought to this service because of ChatGPT, only to realize that the API doesn’t seem to be as intelligent as the chat itself, probably in part because of the absence of sessions.
It depends on how much historic “context” you require.
I asked ChatGPT, and it responded with what we have suspected: we must concatenate previous responses into subsequent requests.
“As you mentioned, OpenAI’s GPT-3 API does not currently support sessions, so it cannot maintain state or context between API calls. To maintain historical context across repeated API calls, you can include a summary of previous interactions as context in your subsequent API calls. This can be done by concatenating all the previous outputs and using them as the “prompt” in your next API call.
For example, in your code, you could create a variable to store the conversation history, and concatenate the output of each API call to that variable before making the next API call:
import openai

conversation_history = ""

response = openai.Completion.create(
    engine="text-davinci-003",
    prompt="Tell me a joke?",
    temperature=0.7,
    max_tokens=1000,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0
)
# Append this completion to the running history
conversation_history += response["choices"][0]["text"]

response = openai.Completion.create(
    engine="text-davinci-003",
    prompt="What was the last question? " + conversation_history,
    temperature=0.7,
    # 1000 rather than 4000: prompt + completion must fit in the model's
    # ~4k-token context window, so the history needs headroom
    max_tokens=1000,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0
)
print(response["choices"][0]["text"])
# prediction_table.add_data(gpt_prompt, response["choices"][0]["text"])
# (commented out: gpt_prompt and prediction_table are defined elsewhere,
# e.g. a logging table from the surrounding notebook)
In this example, the variable “conversation_history” stores the previous output and is concatenated onto the prompt in the next API call to maintain the historical context of the conversation.”
Alternatively, embeddings are an option, but they require a backend server/service such as Redis or Pinecone.
On the original question of maintaining historical context/sessions: I’m not sure whether ChatGPT is using some form of server-side (or API-level) session management yet.
Looking at the network payload, the ChatGPT client seems to be sending the previous interactions as part of the new question’s payload.
So the solution, as previously pointed out, would for now be to prepend the entire conversation when making the new request, at the cost of tokens.
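If you do go the embedding route mentioned above, the core of it is small (a sketch using text-embedding-ada-002; the in-memory list stands in for Redis/Pinecone or a SQL table):

import openai
import numpy as np

stored = []  # (embedding, text) pairs; a real vector store would live here

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def remember(text):
    stored.append((embed(text), text))

def recall(query, k=3):
    q = embed(query)
    # ada-002 embeddings are unit length, so a dot product is cosine similarity
    ranked = sorted(stored, key=lambda pair: -float(q @ pair[0]))
    return [text for _, text in ranked[:k]]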
Just to correct your statement so we are technically accurate: Redis and Pinecone are not required for the tasks described, which can easily be accomplished with almost any SQL database.
Redis is useful, and very helpful, but not an absolute requirement, as you mentioned.
It’s probably best to filter out words and characters that carry little information content (value), to save a few tokens here and there.
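A crude version of that filtering, for illustration (the stopword list is just an example, and the actual savings depend on the tokenizer):

STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}

def squeeze(text):
    # Drop low-information words before stuffing history into the prompt
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)

print(squeeze("The best time to visit Florida is in the spring"))
# -> "best time visit Florida spring"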
LOL. You should be careful referencing ChatGPT for technical guidance. ChatGPT is a text-prediction, auto-completion AI, not an expert-system AI. More often than not, ChatGPT will “cobble up” something that is not fully accurate in order to generate a completion.
Oh, when will they ever learn? Oh, when will they ever learn? - Pete Seeger (1955)
Note:
The OpenAI API is just that: an API providing access to OpenAI’s model endpoints. It is not a full-blown chatbot application, so to maintain historical context you should use a database to store the prompts and completions. How you implement your database, filter and summarize prompts and completions, and feed historical information back into a completion call will depend on your use case and requirements.
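For example, a minimal sketch with SQLite from the Python standard library (the table and column names are made up for illustration):

import sqlite3

db = sqlite3.connect("chat_history.db")
db.execute(
    """CREATE TABLE IF NOT EXISTS turns (
           session_id TEXT,
           prompt     TEXT,
           completion TEXT,
           created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
       )"""
)

def save_turn(session_id, prompt, completion):
    db.execute(
        "INSERT INTO turns (session_id, prompt, completion) VALUES (?, ?, ?)",
        (session_id, prompt, completion),
    )
    db.commit()

def recent_history(session_id, limit=5):
    rows = db.execute(
        "SELECT prompt, completion FROM turns"
        " WHERE session_id = ? ORDER BY rowid DESC LIMIT ?",
        (session_id, limit),
    ).fetchall()
    return list(reversed(rows))  # oldest first, ready to prepend to the next prompt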
You can literally ask ChatGPT to create a proxy API in PHP that writes the JSON node to a file and appends it to the prompt. It requires a proxy API page that takes the data and forwards it along. Took me 4 hours to make, and my bot even seems to think it has a brain. Doing the same with the new model took 30 minutes. I’ll post my code to Git.
Feeding an NLP analysis of the last response back into the follow-up prompt helps sustain conversation flow. GPT is asked to extract keywords, named entities, context and sentiment from the response, and these are prepended to the follow-up interaction. In this way the conversation flow appears to be sustained.
Topic, DoNLP, LastResponse and FollowUp are range names.

DoNLP:
analyse the Prompt using NLP and return topic, context, named entities, keywords and sentiment and then respond to the Follow Up question :

FollowUp:
Who were the main characters

In A10 is the formula:
="On the topic of: "&Topic&" "&DoNLP&CHAR(10)&CHAR(10)&LastResponse&CHAR(10)&"Follow up: "&FollowUp
Where the last response was about Bulgakov’s novel The White Guard, the next prompt becomes:
analyse the Prompt using NLP and return topic, context, named entities, keywords and sentiment and then respond to the Follow Up question : The White Guard was written by the Ukrainian writer Mikhail Bulgakov. It is a novel that depicts the events of the Ukrainian Revolution of 1918 and the subsequent civil war in Ukraine. Bulgakov is also known for his famous novel, The Master and Margarita.
(Source: gpt-3.5-turbo, temperature 0.7)
and respond to the follow up question : Who were the main characters
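The same trick works outside a spreadsheet; here is a Python sketch of it (the prompt wording mirrors the DoNLP text above; everything else is illustrative):

import openai

def nlp_followup(last_response, follow_up):
    # Ask GPT to extract topic/entities/keywords/sentiment from the last
    # response, then answer the follow-up with that analysis as context
    prompt = (
        "analyse the Prompt using NLP and return topic, context, named "
        "entities, keywords and sentiment and then respond to the Follow "
        "Up question : " + last_response + "\nFollow up: " + follow_up
    )
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        temperature=0.7,
        max_tokens=500,
    )
    return response["choices"][0]["text"]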