How Does the ChatGPT Web Application Retain Context Information to Achieve Infinite Conversations?

I have recently been exploring the capabilities of the GPT-3.5 Turbo API, and I ran into a question about its maximum token limit of 4096. Since the model itself is stateless, it seems that in order to implement multi-turn conversations, one needs to include the previous dialogue in every prompt.
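To make the stateless-model point concrete, here is a minimal sketch of how multi-turn chat can be implemented by resending the accumulated message history on every request. The `call_model` function is a hypothetical stand-in for the real Chat Completions API call, so no network access is involved:

```python
def call_model(messages):
    # Placeholder: a real implementation would send `messages`
    # to the Chat Completions API and return the model's reply.
    return f"(reply to: {messages[-1]['content']})"

def chat(history, user_input):
    """Append the user turn, call the model with the FULL history,
    then append the assistant reply so the next turn can see it."""
    history.append({"role": "user", "content": user_input})
    reply = call_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": "You are a helpful assistant."}]
chat(history, "Hello!")
chat(history, "What did I just say?")
print(len(history))  # 5 messages: system prompt + two user/assistant pairs
```

Because the entire `history` list is sent each time, the prompt grows with every turn, which is exactly why the 4096-token limit becomes a problem in long conversations.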

However, I noticed that the ChatGPT web application allows for seemingly infinite conversational turns while retaining context information. How is this achieved?


My guess is that they use some kind of summarization technique to compress the large amount of prior context into something small enough to stay well within the model's input token limit. We can never be sure, though, since OpenAI has not released any details about how they handle this.
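A rough sketch of that summarization guess might look like the following. Everything here is hypothetical: `count_tokens` is a crude word-count proxy for a real tokenizer (such as tiktoken), and `summarize` is a stub where a real system would likely ask the model itself to condense the older turns:

```python
def count_tokens(messages):
    # Crude proxy for a real tokenizer; counts whitespace-split words.
    return sum(len(m["content"].split()) for m in messages)

def summarize(messages):
    # Stub: a real implementation might call the model to summarize.
    return f"Summary of the earlier conversation ({len(messages)} messages)."

def compress(history, budget=3000, keep_recent=4):
    """If the history exceeds the token budget, replace everything
    except the system prompt and the most recent turns with a single
    summary message."""
    if count_tokens(history) <= budget:
        return history
    system = history[:1]
    middle = history[1:-keep_recent]
    recent = history[-keep_recent:]
    summary_msg = {"role": "system", "content": summarize(middle)}
    return system + [summary_msg] + recent
```

Running `compress` before each API call would keep the prompt bounded while preserving the system prompt, a compressed memory of older turns, and the most recent exchanges verbatim. Whether ChatGPT actually does this (versus, say, simply truncating the oldest messages) is unknown.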