But this only allows 4096 characters. Some answers cannot be shortened, and a full back-and-forth conversation will run well past 4096 characters.
Is there truly no other way?
Is the only way to do this simply to keep appending previous responses from ChatGPT, and once you hit 4096 it's game over?
Thank you.
Your understanding is correct: you must include the relevant chat history in every API call. However, you have much more flexibility in how much input you can include.
4096 is the maximum number of output tokens that can be returned in an API call. For input tokens, you have a much larger budget, depending on the model you choose. With gpt-4-turbo you can include over 120,000 tokens of input.
Here in the model overview, you can see the so-called context window for each model. The sum of input and output tokens must stay within this limit.
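To make this concrete, here is a minimal sketch of the pattern, assuming the v1-style Python openai client and an OPENAI_API_KEY environment variable: the full history is resent on every call and counts against the input budget, while only the output is capped at 4096 tokens.

```python
# Minimal sketch: keep the running chat history and send it with every call.
# Assumes the v1-style Python `openai` package and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-4-turbo",   # 128K context window (input + output)
        messages=history,      # the whole history counts as input tokens
        max_tokens=4096,       # output cap per call
    )
    reply = response.choices[0].message.content
    # Store the assistant's reply so the next call sees the full conversation.
    history.append({"role": "assistant", "content": reply})
    return reply
```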
The 4096 token limit is on output. The model gpt-4-turbo has a context length (input + output) of 128K tokens.
If the model ever reaches the max output limit while generating a response, the response object will contain a finish_reason with the value length. In that case, you can simply append the partial assistant message you received to the existing messages list and have the model continue from where it left off.
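Here is a sketch of that continuation loop, again assuming the v1-style Python client; it keeps calling the API until finish_reason is no longer length, concatenating the pieces as it goes.

```python
# Minimal sketch of the continuation loop: if the model stops because it hit
# max_tokens (finish_reason == "length"), append the partial assistant message
# and call again so the model picks up where it left off.
from openai import OpenAI

client = OpenAI()

messages = [{"role": "user", "content": "Write a very long essay about token limits."}]

full_text = ""
while True:
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=messages,
        max_tokens=4096,
    )
    choice = response.choices[0]
    full_text += choice.message.content
    if choice.finish_reason != "length":
        break  # "stop" means the model finished on its own
    # Output cap was hit: feed the partial answer back and continue.
    messages.append({"role": "assistant", "content": choice.message.content})

print(full_text)
```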