Within the ChatGPT app, the maximum response length (the max_tokens setting) has been discovered by experimentation and bugs: it is 1536 tokens, 150% of 1024.
After hitting that limit, the app has a clever trick: the “continue” button. GPT-4 in particular has since been tuned to produce answers of limited size, though, so the button appears more rarely.
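On the API side, the same truncation surfaces as a finish reason of "length" when output is cut off by max_tokens, and a “continue” feature can be built by checking it. A minimal sketch under that assumption — the `fake_completion` stub stands in for a real chat-completion call, and all function names here are illustrative, not OpenAI’s:

```python
# Sketch: detect a truncated reply and decide whether to offer "continue".
# fake_completion is a stand-in for a real chat-completion API call;
# real responses carry a similar finish_reason field ("length" vs "stop").

def fake_completion(messages, max_tokens=1536):
    """Stub: pretend that demanding prompts get cut off at the limit."""
    text = "... model output ..."
    hit_limit = len(messages[-1]["content"]) > 20  # toy condition only
    return {"text": text, "finish_reason": "length" if hit_limit else "stop"}

def should_show_continue(response):
    # "length" means generation stopped at max_tokens, not naturally
    return response["finish_reason"] == "length"

short = fake_completion([{"role": "user", "content": "Hi"}])
long = fake_completion([{"role": "user", "content": "Write a very long detailed essay"}])
```

A real “continue” button would then resend the conversation with the truncated reply appended, letting the model pick up where it stopped.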
advanced topic warning
In ChatGPT, there is no active or live display of which past conversation will actually be used. ChatGPT’s backend currently uses a similarity-matching technique on the submitted input to retrieve just a handful of past conversation turns from the database. These can sometimes be more user messages than AI replies, and they can be a salt-and-pepper smattering of mere snippets of what was discussed at length, with some exchanges summarized. Learning how it works runs into an uncertainty principle: you can’t probe the conversation management without your jailbreak affecting it.
ChatGPT’s technique would require processing the submitted user input with an embeddings match against a vector database, which is part of why we now see a delay before token generation (along with covert monitoring of the generated output for content violations beyond what the AI is trained to deny).
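That retrieval step can be pictured as a cosine-similarity search over stored per-message embeddings. A toy sketch with hand-made 3-dimensional vectors — real systems use learned embeddings with ~1500 dimensions, and nothing here is OpenAI’s actual code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "database" of past turns with pretend embedding vectors
history = [
    {"text": "We discussed tokenizers.",  "vec": [0.9, 0.1, 0.0]},
    {"text": "You asked about recipes.",  "vec": [0.0, 0.9, 0.4]},
    {"text": "BPE merges were explained.", "vec": [0.8, 0.2, 0.1]},
]

def retrieve(query_vec, k=2):
    """Return the k past turns most similar to the new input's embedding."""
    ranked = sorted(history, key=lambda t: cosine(query_vec, t["vec"]),
                    reverse=True)
    return [t["text"] for t in ranked[:k]]

# A new input about tokenization should pull back the two tokenizer turns
top = retrieve([1.0, 0.0, 0.0])
```

Only the retrieved snippets — not the whole transcript — would then be packed into the prompt, which matches the “handful of turns” behavior described above.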
More unseen input tokens would mean more processing load, which is what you pay for on the API. Making the chatbot software more forgetful saves computation resources.
Your own app (or mine) can do live token counting and parameter accounting to adapt, showing which chat turns would be sent and graying out older messages. Token counting requires a roughly 2 MB dictionary download and a processor-intensive library. Also, an accurate live display as you type would preclude embeddings lookups that use the submitted input to retrieve relevant conversation even older than the budget allows (although you could also let the user click to manually disable or force a history message — a GUI only for an advanced user).
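The gray-out logic above can be sketched briefly. Real counting would use a BPE tokenizer such as tiktoken (the ~2 MB dictionary just mentioned); here a crude four-characters-per-token estimate stands in so the example stays self-contained:

```python
def estimate_tokens(text):
    # Crude stand-in for a real BPE tokenizer: ~4 characters per token
    return max(1, len(text) // 4)

def plan_context(messages, budget):
    """Walk newest-to-oldest, keeping turns until the token budget is spent.
    Returns a parallel list: True = would be sent, False = grayed out."""
    keep = [False] * len(messages)
    used = 0
    for i in range(len(messages) - 1, -1, -1):
        cost = estimate_tokens(messages[i])
        if used + cost > budget:
            break  # this turn and everything older gets grayed out
        keep[i] = True
        used += cost
    return keep

# Two long old turns, two short recent ones; budget fits all but the oldest
chat = ["a" * 400, "b" * 400, "c" * 40, "d" * 40]
plan = plan_context(chat, budget=120)
```

A UI would redraw this plan on every keystroke, dimming the `False` entries — exactly the kind of live display that a server-side embeddings lookup would make impossible to predict.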