I have around 20 messages, and when I tried to generate a new one I got the error: “This model’s maximum context length is 4096 tokens. However, your messages resulted in 4098 tokens.”
It’s kind of frustrating, because I thought the ChatGPT API accepted unlimited messages, but apparently it doesn’t. How can my application accept unlimited messages, or something like that, without losing the context of the conversation or ChatGPT forgetting the first message sent?
Hi @suportetynitemail
Hope you are well. Sorry to answer so directly; I hope you are not offended.
The OpenAI API docs clearly state that the models used in the chat completion endpoint are limited to 4096 tokens.
I know this may sound “too strong”, sorry, but when the chat completion endpoint was released it took me less than five minutes to read the new API reference docs after I woke up; my morning coffee was still warm when I finished them.
It saves a lot of time and energy to review the docs first, so we never end up feeling frustrated like you described. It’s easier, faster, and less frustrating to go straight to the horse’s mouth from the beginning.

Sorry, I’m Brazilian and not fluent in English, so some of what I write may come out distorted by the page’s automatic translation.
But getting back to my question: is there any way to work around this limit? Because if not, it’s basically a replica of GPT-3. I used to build a chatbot with GPT-3, and to work around the limit I just kept the last 10 messages sent by the user. Will I also need to do this with ChatGPT?
There is no way to “get around” the limit, but you can manage your token count just as you describe doing with GPT-3. Yes, you need to manage tokens with the chat completion endpoint as well.
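
If it helps, here is a minimal sketch of that sliding-window approach, assuming Python with the `tiktoken` package. The `count_tokens` and `trim_history` helpers and the per-message overhead constant are my own illustration (a rough approximation of OpenAI’s cookbook estimate for gpt-3.5-turbo), not an official API:

```python
import tiktoken

MAX_CONTEXT_TOKENS = 4096
RESPONSE_BUDGET = 500  # tokens reserved for the model's reply

def count_tokens(messages, model="gpt-3.5-turbo"):
    """Approximate the prompt's token count for a list of chat messages."""
    encoding = tiktoken.encoding_for_model(model)
    total = 0
    for message in messages:
        total += 4  # rough per-message overhead for role/formatting tokens
        for value in message.values():
            total += len(encoding.encode(value))
    return total + 2  # rough priming overhead for the assistant's reply

def trim_history(messages):
    """Drop the oldest turns until the prompt fits under the budget."""
    trimmed = list(messages)
    while (count_tokens(trimmed) > MAX_CONTEXT_TOKENS - RESPONSE_BUDGET
           and len(trimmed) > 1):
        # Assumes the first message is the system prompt; keep it so the
        # bot retains its instructions, and drop the oldest turn after it.
        del trimmed[1]
    return trimmed
```

You would call `trim_history(messages)` right before each API request. Note that once old turns are dropped the model genuinely forgets them, so if the first message matters you would need to fold it into the system prompt or a running summary.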
