How to maintain context with gpt-3.5-turbo API?

I thought the user parameter was doing this job, but it doesn’t work.


If you read the docs you just posted, they clearly state that the user param is only for abuse monitoring by OpenAI staff.

Also, if you want to maintain state, you need to write code to store messages and resend them (feed them back) to the API. You can search this site for how to do this, as this topic has been discussed many times here.

The OpenAI API currently does not manage user sessions. This code must be written by the application developer.
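A minimal sketch of that store-and-resend pattern in Python. Here `call_model` is a stand-in for your actual client call (e.g. `openai.ChatCompletion.create`), so only the bookkeeping is shown; model name and prompts are examples:

```python
# Running conversation history; the system message stays at the front.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_message, call_model):
    """Append the new user turn, send the FULL history, store the reply.

    `call_model` stands in for your real API call and must return the
    same shape of dict the Chat Completions endpoint returns.
    """
    history.append({"role": "user", "content": user_message})
    response = call_model(model="gpt-3.5-turbo", messages=history)
    reply = response["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply
```

Every call resends the whole list, which is why token usage grows with each exchange; the truncation and summarization ideas discussed later in this thread are ways to cap that growth.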




You can also look at the documentation; it has a little link that takes you to a page explaining how to do that. What is not explained is how to manage the previous prompts so you can minimize the number of tokens you use, and how to optimize the length of the messages to achieve that.

ChatGPT Completions

import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

Note: you need to be using OpenAI Python v0.27.0 for the code above to work

The system message helps set the behavior of the assistant. In the example above, the assistant was instructed with “You are a helpful assistant.”

gpt-3.5-turbo-0301 does not always pay strong attention to system messages. Future models will be trained to pay stronger attention to system messages.


The best method I found to maintain a semblance of context for my GPT calls was to chain a few of them together.

def gpt_call_2():
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"},
            {"role": "assistant", "content": f"{gpt_call_1()}"},
            {"role": "user", "content": "Where was it played?"}
        ]
    )
    return response["choices"][0]["message"]["content"]

By using instructions in the prompt, you can tailor the output of the first call so that it is direct and concise, and in this way chain the calls together.
This splices the first call’s answer in with a Python f-string, by the way.



From my side, I compress historical data with zlib to save tokens. It can be very efficient, depending on the kind of text.
At each new prompt, I send the last n prompts/answers, compressed with zlib, in the assistant field; they are interpreted as if they were uncompressed, and my bot behaves like the ChatGPT application. The history can cover 10 to 15 prompts, depending on the requests.

EDIT following discussion with @ruby_coder: I was never able to finish this process. Zlib-compressed data contains unauthorized characters, and the JSON request is malformed. My last idea is to UrlEncode() the compressed string, but I am not sure it would work (and my capacity to develop code is very limited), or whether it would be efficient at saving tokens. If I find a new solution, I will add it to this topic. Sorry, my bad.
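For anyone curious, the failure described above is easy to reproduce in a few lines of Python: zlib output is raw binary, not valid UTF-8 text, so standard JSON serialization rejects it:

```python
import json
import zlib

text = "Are you an AI?"
compressed = zlib.compress(text.encode("utf-8"))

# zlib emits raw binary (the default stream starts with bytes 0x78 0x9c);
# 0x9c is not valid UTF-8, so the payload cannot go into a JSON string as-is.
try:
    json.dumps({"prompt": compressed.decode("utf-8")})
except UnicodeDecodeError as exc:
    print("JSON serialization fails:", exc)
```

These are the “unauthorized characters” mentioned above: the problem is in the transport encoding, before the API ever sees the prompt.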


Interesting, but I have tried this before and cannot get compression to work in a simple test case:

This works fine of course:

require 'openai'
client = OpenAI::Client.new(access_token: ENV['OPENAI_API_KEY'])
text = "Are you an AI?"
response = client.completions(
    parameters: {
        model: "text-davinci-001",
        prompt: text,
        max_tokens: 100
    }
)

This fails with error when converting to JSON:

require 'zlib'
require 'openai'
client = OpenAI::Client.new(access_token: ENV['OPENAI_API_KEY'])
text = "Are you an AI?"
compressed_text = Zlib.deflate(text).force_encoding(text.encoding)
response = client.completions(
    parameters: {
        model: "text-davinci-001",
        prompt: compressed_text,
        max_tokens: 100
    }
)


.../lib/active_support/core_ext/object/json.rb:39:in `to_json': source sequence is illegal/malformed utf-8 (JSON::GeneratorError)

I have tried many encodings and all fail to validate as JSON data.

Do you have working Python code in a small “hello world” test case for compression @mattg ?



Sorry, but at the moment all my workflow runs on the platform, and I can’t find this part anymore, so perhaps I had the same problem as you and left it unfinished. I am migrating this workflow to an on-premise platform (n8n) to save costs, so I will retry and get back to you if it works.


In the meantime, perhaps it would be possible to base64-encode the text to obtain a JSON-compatible string?

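Base64 would indeed make the payload JSON-safe, and it round-trips losslessly. The catch, sketched below, is that base64 inflates the compressed bytes by about a third, and the model is not trained to read a zlib/base64 blob as if it were plain text:

```python
import base64
import zlib

text = "Are you an AI? " * 20
compressed = zlib.compress(text.encode("utf-8"))
encoded = base64.b64encode(compressed).decode("ascii")  # pure ASCII, JSON-safe

# The encoding round-trips losslessly on our side...
restored = zlib.decompress(base64.b64decode(encoded)).decode("utf-8")
assert restored == text

# ...but base64 emits 4 output characters per 3 input bytes (~33% overhead),
# and the API would still just tokenize the opaque blob.
print(len(text), len(compressed), len(encoded))
```

So the transport problem is solvable, but it doesn’t follow that any tokens are saved, which matches the completions-full-of-nonsense result reported below.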

Hi Matt,

Your logic is hard to follow and your replies are not helpful, sorry.

You made a strong, definitive statement in this topic for developers:

And I asked you to provide the code, since I am a software developer, have tried on numerous occasions, and cannot get a compression encoding to return a correct completion when sending it to the OpenAI API.

If I send compressed data in base64 format as a prompt to the API, the API has always returned completions full of blah-blah nonsense.

You replied to me that you do not actually have working code, and referred me to a website selling furniture?

Either you have working code or you do not. Please back up your claim or please retract it.

Furthermore, I have searched the net, and I cannot find a single reference to any developer compressing OpenAI prompt data, sending that compressed, encoded data to an OpenAI API completion endpoint, and receiving a valid completion.

Then, you reply:

Sorry, this makes little sense. So you have “no idea” if it works, and you just “made up” that you were sending compressed data to the API?

@mattg, sorry again, but you either have a working algorithm to compress an OpenAI API prompt and send that compressed data to the API to get a valid completion, or you do not. You told us all, in very definitive terms, that you can do this; but you provide no code, and you keep deflecting from providing a valid, testable reply.

Now, it seems you want me to come up with a solution for you which you claim you had?

So, based on your posts in this topic @mattg, I can only conclude that you just “made up” the idea that you could compress data, send it to the API, and get back a valid completion; because you cannot back it up with working code, and you are now asking me to come up with a solution.



FYI: Google Searches Turn up Nothing.

Have tried many searches, and none bear any fruit.

OBTW: Asking ChatGPT just returns chatbot hallucination nonsense.


Hi @ruby_coder,

You are perfectly right, and I am so sorry.

I was honest when answering you, but I had to manage many things, and I simply “forgot” that my many attempts to make this work were unsuccessful.

I retried again last night without more success, but I am not a developer. Make (and not Made, sorry for this too) is a workflow manager that uses APIs for many services, or allows JSON calls when a service is not directly managed by the platform. I only add little pieces of code to transform some data.
My last try was to add a urlEncode() after the zlib compression, but I wasn’t able to, due to my poor knowledge of coding. Perhaps you could try, but I am not sure we would save many tokens using this method.
So it is possibly just not possible to do this, and again, I am very sorry I made mistakes and let you think I was able to do this.



To answer the first topic question, here is my working solution for adding context (without compression):

"model": "gpt-3.5-turbo",
"messages": [
    {"role": "system", "content": "You must behave like a friendly pal called MattGPT. You must not write your name at the start of your replies. You must address your friends informally. You must always reply in HTML."},
    {"role": "assistant", "content": "Previous exchanges: {{ $json.concatenated_text }}"},
    {"role": "user", "content": "{{ $node[\"Telegram\"].parameter[\"text\"] }}"}
]

Where $json.concatenated_text is a concatenation of the last n Google Sheets rows (one created for each prompt and each response),

and $node["Telegram"].parameter["text"] is the message coming from Telegram. You can URL-encode it as well; ChatGPT (at least gpt-3.5-turbo) will understand.

Link to the project topic: MattGPT: multifunction telegram bot


Dang it, I spent two days trying to get ChatGPT to produce this solution after reading that post. Oh well! I’m currently able to get my chatbot to remember its place in a conversation by appending prior responses to the messages using an array, but the problem is it keeps running out of tokens after about 10 back-and-forths. Compression seemed like a potentially good stopgap, but if that’s not a solution OpenAI presently supports, I’ll have to try something else.

Thing is, if they lifted the token limit, I and other developers would likely spend a lot more money, because then we could deploy our apps into the wild for general public use! Is there a place to post that kind of feedback? More tokens = more money!


I would love to see whether there is a way to use a lower model to summarize the previous messages in a consistent manner, such that using the lower model still reduces the cost compared with sending the full-length previous messages. The summary could also be attributed to the assistant. I am also curious what difference it makes, when the API analyzes the array, whether something was said by the assistant or by the user. I haven’t experimented with any of this so far, but I am curious what you all think.
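One way to sketch that rolling-summary idea in Python. The `summarize` argument here is a hypothetical hook standing in for a call to a cheaper model (e.g. a “summarize this conversation” prompt); this is a sketch of the bookkeeping, not something tested against the API:

```python
def compress_history(history, summarize, keep_last=2):
    """Replace all but the last `keep_last` messages with a short summary.

    `summarize` is a stand-in for a cheaper-model call: it takes a
    transcript string and returns a short summary. The summary goes
    back in as an assistant message, as suggested above.
    """
    if len(history) <= keep_last:
        return history
    older, recent = history[:-keep_last], history[-keep_last:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
    summary = summarize(transcript)
    return [{"role": "assistant", "content": "Summary so far: " + summary}] + recent
```

Whether this actually saves money depends on the summarization call costing less than the tokens it removes from every subsequent request.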


I think we can all agree @ruby_coder is de facto an official moderator here… Maybe he doesn’t know, or maybe OpenAI doesn’t know, but I think it’s obvious to every one of us.

I love how he promotes constructive discussion and how rigorous he can be in his replies; it is amazing to see how he brings something positive to our community!

It is refreshing to see someone so conscientious and uncompromising always ready to help others in a respectful manner…

In many other online communities a user would have been schooled for making something up, but @ruby_coder kept it methodical and diligent… He was polite and made his point clear in a way that engages @mattg rather than repelling him…

Well thanks fellow for making this forum such a great place to share our thoughts and knowledge…

Also thanks to @mattg for the clarification. I hope someone will find a solution to avoid having 4,000+ tokens per interaction, and hypothetically even more with the possibility of 8,000+ or 32,000+ tokens per interaction with the GPT-4 trend…


I’m using JavaScript both to make the call and to append the Assistant’s prior response to the messages so it can remember its place:

var previousResponse = [];

var chatbotprocessinput = function (textin) {
  var userInput = document.getElementById("chatuserinput").value;
  document.getElementById("chatuserinput").value = "";
  document.getElementById("chatlog").innerHTML = "Loading...";

  // Construct the input messages, including the previous responses if they exist
  var messages = [
    { role: "assistant", content: "" }, // initial instructions for the Assistant
    { role: "system", content: "" },    // system instructions
    { role: "user", content: previousResponse.concat(userInput).join("\n") },
  ];

  // Make the API request
  const url = "https://api.openai.com/v1/chat/completions";
  const options = {
    method: "POST",
    headers: {
      "content-type": "application/json",
      Authorization: "Bearer YOUR_API_KEY", // replace with your key
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo-0301",
      messages: messages,
      temperature: 0.7,
      max_tokens: 500,
    }),
  };

  fetch(url, options)
    .then(async (response) => {
      console.log(response); // If you want to check the full response
      if (response.ok) {
        const json = await response.json();
        const message = json.choices[0].message.content;
        console.log(json); // If you want to check the response as JSON
        document.getElementById("chatlog").innerHTML = message; // HERE'S THE CHATBOT'S RESPONSE
        previousResponse.push(userInput, message); // store the exchange for the next round
      }
    });
};

I should mention that the above would not have been possible without me prompting ChatGPT to produce this script by giving it detailed instructions, plus much trial and error as errors came in across the console. It sets up initial instructions for the Assistant, then the system, and then the user’s message is whatever is typed into the input. Next, it stores the responses in an array and finally feeds those into the next round of messages. How long the instructions are determines how quickly you hit the 4,096-token limit, because every call re-sends all of the preset messages plus the prior responses in order to “remember” what was previously said.

Also, special thanks to LucianoSphere, whose approaches were essential in developing my current working model:

when you say:

you need to write code to store messages and resend them (feed them back) to the API.

are you including them in the next prompt, or only storing them in the assistant message?


Hi! How is everyone? I’m trying NOT to maintain context, but I think the model is maintaining a bit of context by default. Could that be possible?


There is no context maintained by default. You need to write a truncation algorithm yourself (like we do in ChatGPT) that decides what information from previous messages should be passed back into the context. There are a ton of options here and no single correct way to do it, so I encourage you all to try various solutions.
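As one example of such a truncation strategy: always keep the system message, then keep as many of the most recent messages as fit in a budget. The sketch below uses character counts as a crude stand-in for real token counting (in practice you’d measure tokens with a tokenizer such as tiktoken), and drops whole messages oldest-first:

```python
def truncate_context(messages, max_chars=4000):
    """Keep the system message plus as many of the most recent
    messages as fit in the budget. `max_chars` is a crude stand-in
    for a real token count.
    """
    system, rest = messages[0], messages[1:]
    budget = max_chars - len(system["content"])
    kept = []
    for msg in reversed(rest):          # walk newest-to-oldest
        if len(msg["content"]) > budget:
            break
        kept.append(msg)
        budget -= len(msg["content"])
    return [system] + kept[::-1]        # restore chronological order
```

Other strategies (summarizing the dropped messages, or keyword-retrieving only relevant ones, both discussed elsewhere in this thread) trade more complexity for a better-preserved context.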


That sounds very interesting. Could you give an example of what such a solution would look like?

Furthermore, my main concern is not to consume new tokens just to keep the context. Is that possible? If not, how could I keep it as low as possible?

Many thanks for your support in advance!

Idk :woman_shrugging: my success with 3.5 has evolved into sending only the last API response, concatenated with one row from an external text file, found by searching for keywords related to my bot’s subject.

What KENNEDY showed me is that the order of the array sent to the API is significant.

I currently send messages to the system and user inputs of the API.

The system gets its own previous response, concatenated with what was found by keyword-searching and identifying one row of data that is hopefully relevant to what the user is asking.

If they match, 90% of the API responses are now right on target, and the bot takes on the intended personality.

The order of the array in the API call matters, as I mentioned. The most recent response and the factual, relevant information must be at the top of the array, or it won’t work and the API will fall back to its normal behavior.

I tested KENNEDY’s ideas by flipping the roles around in the PHP call, and it broke my bot, undoing the success I was having.

I flipped them back, and my bot recovered and regained the personality I am training him for.

The idea of force-feeding a single document on every call doesn’t work; it breaks after three or four messages.

Feeding in too much diverse information breaks it, and sometimes it refuses to answer at all.

It has to be short and to the point of what the user is interested in
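The actual implementation described above is in PHP and isn’t shown; as a rough Python sketch of the idea, with word-overlap standing in for the keyword search and the helper names being made up for illustration:

```python
def pick_relevant_row(user_message, rows):
    """Pick the knowledge-base row sharing the most words with the
    user's message (a crude stand-in for the keyword search
    described above)."""
    words = set(user_message.lower().split())
    return max(rows, key=lambda row: len(words & set(row.lower().split())))

def build_messages(last_response, user_message, rows):
    # Per the posts above, the most recent response and the relevant
    # fact go at the TOP of the array, ahead of the new user message.
    fact = pick_relevant_row(user_message, rows)
    return [
        {"role": "system", "content": last_response + "\n" + fact},
        {"role": "user", "content": user_message},
    ]
```

Retrieving one short, relevant row per turn keeps each request small, which matches the observation that short, on-topic context works where force-feeding a whole document does not.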