How to keep session with gpt-3.5-turbo api?

I just want to use the gpt-3.5-turbo API to hold a conversation as I do in ChatGPT. But there seems to be no easy way to keep a session with the API.
I know this is an old question, but I don’t find a good answer for it.
I searched related topics in this forum, and it seems there is no way to continue a conversation in the completion API itself, such as by sending a session ID as a parameter. The alternative is to send the previous conversation, or a summary of the previous context, with each request.
And when I ask ChatGPT how it holds the session, it answers that it uses an ‘attention mechanism’ to focus on specific parts of the input sequence and also a “memory” to store important information from the conversation history.
So I want to know: is there any easy way to keep a session with the gpt-3.5-turbo API? Or will OpenAI at some point provide an additional parameter to support sessions?


Hi @ylc

Welcome to the OpenAI community.

You’ll have to implement session handling on your end by passing the existing conversation to the chat completions endpoint.

How you store conversation on your end depends on what’s convenient for you.

Here’s a very basic implementation.
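For concreteness, here is a minimal sketch of that approach in Python. The `ChatSession` class and the `api_call` callable are illustrative, not part of the openai library; in real use, `api_call` would wrap a chat completions request such as `openai.ChatCompletion.create(...)`:

```python
# Minimal session handling: keep the running message list on the client
# and send the whole accumulated history with every request.

class ChatSession:
    def __init__(self, api_call, system_prompt="You are a helpful assistant."):
        # api_call is a stand-in for the real network request, e.g.
        #   lambda msgs: openai.ChatCompletion.create(
        #       model="gpt-3.5-turbo", messages=msgs
        #   )["choices"][0]["message"]["content"]
        self.api_call = api_call
        self.messages = [{"role": "system", "content": system_prompt}]

    def send(self, user_text):
        # Append the user turn, call the model with the full history,
        # then append the assistant turn so the next call has context.
        self.messages.append({"role": "user", "content": user_text})
        reply = self.api_call(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

With a real `api_call` plugged in, every `send` resends the accumulated history; that client-side list is the whole "session" the API itself does not provide.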


As the OP, I also wish to see some form of conversation_id and parent_id parameters, corresponding to the state of the conversation and of the last prompt at any point. In the web UI we can recall a conversation session, adding only new prompts to the already existing context, and even branch off from a previous prompt into a new timeline. This is already achieved with gpt-3.5-turbo in the context of ChatGPT. If we could get that feature through the API for gpt-3.5-turbo, we could avoid resending the previous messages with every prompt.


Yes, because OpenAI developed a retail web-based ChatGPT application, and that app feeds the prior messages back to the model with each new prompt.

Developers using the OpenAI API are expected to write their own code to do this when they develop their own applications.




I cannot agree with that. Do you have a reference? In some cases I have 20+ long prior prompts setting specific context regarding the manual, configuration, and code of an application, and when I make a new prompt I get an instant reply. From my experience, the whole history is not being resent every time.


You can search these forums.

This has been discussed many times here.

@nikko, you joined our community 11 hours ago. It is a good idea to search the site and review the prior discussions before rejecting replies here.

Welcome to our community, but please search the site before asking questions.



I did not say “the whole history is resent every time”. Those are your words, not mine, @nikko

I said:

Of course there are pruning and summarization strategies, all of which are the developer’s responsibility to create, and which have also been discussed here in many posts.
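As one example of what such a pruning strategy can look like (a sketch only; the function name and the turn limit are illustrative, not an official recipe):

```python
def prune_history(messages, max_turns=6):
    """Keep the leading system message (if any) plus only the last
    max_turns non-system messages. A crude but common pruning strategy;
    summarizing the dropped turns is a refinement built on top of this."""
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]
```

You would call this on the message list before each request, so the prompt never grows without bound while the system instructions are always retained.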

That is all part of the developer’s responsibility to manage. OpenAI provides an API; developers use the API to create a full application.

I kindly recommend you search these forums first.



Example Prior Topic(s)

There are more; you can search using the magnifying glass in the upper right-hand corner.


Thank you for the details and the references you provided @ruby_coder. I didn’t think of this as pruning and summarization, but as some light form of transfer learning / fine-tuning that adds a small overhead to the original model.


Yeah, that’s not what happens, sorry. There is no “transfer learning” and there is no “fine-tuning on the fly” with the current series of these OpenAI pre-trained large language models.

Just think of these models as what they are: powerful text auto-completion engines. They take in your input and predict the next sequence of text. These OpenAI LLMs do not “learn on the fly” at all, though many people mistakenly imagine they do.

Also, depending on the temperature selected, these LLMs generate varied output, so some people get text that happens to match a prior chat session and mistakenly take that coincidental match as “proof of learning,” when it is just a random coincidence.

Hope this helps.



@sps this would be a good approach if ChatGPT had no limitation on the amount of text you can send at a time. Try sending three pages of text and ChatGPT will refuse to interact. There must be another way to keep the session alive.


Please read:


As @mustafa.salahuldin says, there MUST be another way to keep the session alive. I would have expected the API for ChatGPT to work exactly like the ChatGPT web app. For instance, the Python API could work like this:

import openai

session = openai.ChatCompletion.Session(model="gpt-3.5-turbo", title='example session')
chat_id = session.chat_id

messages = [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who won the world series in 2020?"},
      {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
      {"role": "user", "content": "Where was it played?"}
]

response = session.appendMessages(messages)

# do something

userMessage = [
        {"role": "user", "content": "Say it again, but in spanish"}
]

response = session.appendMessages(userMessage)

If we want to continue the conversation later, we could just:

session = openai.ChatCompletion.Session(model="gpt-3.5-turbo", chat_id=chat_id)
response = session.appendMessages(userMessage)

Hi @gone

You are welcome to the community.

Please don’t write hypothetical code that will not run.

Instead, read the docs on the chat completions API to learn how it works; the Managing Tokens section says:

If a conversation has too many tokens to fit within a model’s maximum limit (e.g., more than 4096 tokens for gpt-3.5-turbo ), you will have to truncate, omit, or otherwise shrink your text until it fits. Beware that if a message is removed from the messages input, the model will lose all knowledge of it.

Note how it says YOU will have to truncate, omit, or otherwise shrink your text until it fits

Hope this helps.


Hi @gone

If there was any part of my message that was unwelcome, please let me know so that I can address it. I want to ensure that our communication is always respectful and productive.

I would like to take this opportunity to welcome you to our community.


I can understand the misconception that the API should work like the public chat app, but that app is built on GPT; it’s an example.

What people fail to understand is that storing session data would incur a cost and require additional infrastructure. I could only imagine this functionality as a paid service. And while I could see OpenAI providing something like this, it falls outside its current domain.

If you are here to make an application, you need to understand you are building upon GPT, not ChatGPT. ChatGPT is the example application.

As a developer, you will be required to handle the API and navigate your interactions within the current restrictions. And though it definitely requires work and knowledge, I prefer the abstraction.

Good luck.


I am paying to use the chat API, so yes, I and many other clients expect a chat API to have some way of keeping a session that retains a certain number of the most recent messages in memory, because to create a chat product, context is important. None of us wants to hear that the solution is to resend the last X messages with EVERY new query and pay a lot more money; it is inefficient in many ways. OpenAI could keep that cache themselves, associated with a new chat_id parameter, instead of offering a clunky API without even explaining what the fields in the response mean…


I’m pretty sure that if there were a session that saved your context, the context would still have to be fed into the model and processed. Your only savings would be the 1s and 0s you saved transferring them from your app to the API. It still has to process the context in relation to the prompt, and that processing is where the cost comes in.

While it’s true that the API reference doesn’t explicitly state exactly what each returned value means, the keys for the values are quite descriptive.

To be fair, I’m only about 20 hours into this journey myself. I built a very expensive (cost-per-token) version of the chatbot using the regular completions API before I found chat completions. I’d built a concatenation history for delivering context (only learning it was a thing, and had that name, after completing it), then a summary routine when I realized the history would quickly grow out of hand. The chat completions API makes this even easier, as I just have to slice dictionaries out of my messages array, and it is way cheaper to boot.
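A summary routine of that kind could be sketched as follows, with the model call factored out as a callable. All names here are hypothetical; a real `summarize` would itself be a chat completions request asking the model to condense the old turns:

```python
def compress_history(messages, summarize, keep_recent=4):
    """Replace everything except the system message and the last
    keep_recent messages with a single summary message produced by
    the summarize callable (a stand-in for a real model call)."""
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    if not old:
        return system + recent  # nothing old enough to summarize yet
    summary = summarize("\n".join(f"{m['role']}: {m['content']}" for m in old))
    return system + [
        {"role": "system", "content": "Summary of earlier conversation: " + summary}
    ] + recent
```

The trade-off versus plain slicing is one extra (cheap, short-output) model call in exchange for retaining the gist of dropped turns.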


I will certainly agree that the documentation and even the labeling of functionality is a bit off for the API. I too was a bit confused on why it wasn’t working like ChatGPT at first.

But, once you understand how it works, it makes much more sense. It’s a completions engine, and it requires context to do so. Placing this under the heading of “chat” in the API docs is a bit misleading. And that’s on OpenAI.

I also found (as a paying API developer) that even very large contextual exchanges are not very cost-prohibitive.

Yes, cache-like functionality could be provided by OpenAI. But, as I mentioned previously, it would incur a cost, which of course the developer would pay for. Perhaps they could offer this as another endpoint, but I’m doubtful. It’s not their business model. And booting up a redis instance as a history cache isn’t particularly complicated. Ask ChatGPT how to set it up, and it will probably tell you.

Being a full stack developer, I can understand the frustration of it not working the way one would hope, but like most development, it’s probably more complicated than one might think. Yes, you will need some minimal amount of coding skills to use the API. It’s not IaaS or SaaS.

And to be honest, as a developer I prefer having control over that data. To echo what @JollyCynic said, it allows for much more flexibility.

It’s an API. It’s puzzle pieces you put together. Abstraction is something that has many benefits, and I am glad this is an option, though it should be more clearly explained in the documentation of what one should expect when building with the endpoint.


The Codex API can reach 4K tokens, but the results in the Playground are somewhat poor.
Once the GPT-4 API arrives it will support 8K and 32K, but the price will probably be quite a bit higher too.

When considering session and conversation history, I’ve taken to prefixing each new Prompt like this:

‘On the topic of: ‘&<Natural Language Processing> &’ analyse the Prompt using NLP and return topic, context, named entities, keywords and sentiment and then respond to the Follow Up question :’& <Last_Response> &<FollowUp>

In this case the previous Prompt was: what does named entities and sentiment mean?

Note: Emboldened terms above are variables inserted under script control.
The italicised blocks below are the Last Response.

In this case the returned Response was:
Topic: Natural Language Processing
Context: Analyzing the prompt using NLP
Named entities: None mentioned in the prompt
Keywords: Natural Language Processing, NLP, analyze, prompt, named entities, sentiment
Sentiment: None mentioned in the prompt

----- Named entities refer to specific entities that are mentioned in a piece of text, such as names of people, organizations, locations, or products. In NLP, named entity recognition is the process of identifying and classifying these named entities in text.

----- Sentiment analysis is a technique used in NLP to determine the emotional tone or attitude of a piece of text. It involves analyzing the words and phrases used in the text to determine whether the overall sentiment is positive, negative, or neutral.

Follow up: What is the purpose of sentiment analysis in NLP?

The purpose of sentiment analysis in NLP is to determine the emotional tone or attitude of a piece of text. This information can be used for a variety of purposes, such as understanding customer feedback, monitoring brand reputation, or analyzing public opinion on a particular topic. Sentiment analysis can also be used to improve the accuracy of other NLP tasks, such as text classification or information retrieval.

(Source: gpt-3.5-turbo Temperature: 0.7)

Each Prompt and Response is absorbed into a Pivot Table. With the idea of building a history of NLP terms, each interaction (both Prompt and Response) is subjected to analysis by Google Cloud Natural Language via its API. The idea, clearly, is to be able to pick up where the chat left off when returning to a topic, by prefixing each new prompt with an NLP synopsis of the chat so far. The Pivot Table presently holds 516 interactions across 24 topics.
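The prefix template described above can be sketched as a small function (the original uses spreadsheet-style `&` concatenation; the function name here is illustrative):

```python
def build_prompt(topic, last_response, followup):
    """Build the NLP-analysis prefix described above: ask the model to
    analyse the prompt, then answer the follow-up question in the
    context of the last response. Pure string templating."""
    return (
        "On the topic of: " + topic +
        " analyse the Prompt using NLP and return topic, context, "
        "named entities, keywords and sentiment and then respond to "
        "the Follow Up question: " + last_response + " " + followup
    )
```

Each of the three arguments would be filled in under script control, with `last_response` drawn from the stored history for the topic.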

