Does 'max_tokens' include the follow-up prompts and completions in a single chat session?

There are two things that might be conflated here:

A model’s context length is the total number of tokens it can handle at once: a combined count of the full input that you send and the response you get back.

An API call’s max_tokens parameter reserves a specific portion of the context length for forming an answer, setting the maximum size of the response you will receive back from the AI.
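The relationship between these two can be sketched as simple arithmetic. The numbers below are illustrative, not tied to any particular model:

```python
# A sketch of the token budget a single API call must fit inside.
context_length = 8192   # total tokens the model can handle at once (example value)
max_tokens = 1000       # space reserved for the model's answer

# The input you send may then use at most the remainder:
max_input_tokens = context_length - max_tokens
print(max_input_tokens)  # 7192
```

If the input exceeds that remainder, the call cannot fit in the context window.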


“Follow-up responses” to me means that you have a chatbot application (rather than making individual requests for single data-processing tasks).

In a chatbot scenario, the software you use or write should also include some of the past conversation as role messages before the most recent user query, so that the API calls (which are stateless, with no memory of prior requests) give the AI the context of what you were talking about.
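A minimal sketch of that replay, using the common role/content chat message convention (the `history` list and `build_messages` helper are illustrative names, not part of any API):

```python
# Each API call is stateless, so past turns must be re-sent as role messages.
history = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]

def build_messages(history, new_user_query):
    """Assemble the full message list for one API call."""
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    messages.extend(history)                                   # replay past turns
    messages.append({"role": "user", "content": new_user_query})
    return messages

msgs = build_messages(history, "And what river runs through it?")
print(len(msgs))  # 4: system + two past turns + the new question
```

After each response comes back, the new user/assistant pair is appended to `history`, so the list grows with every exchange.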

This growing conversation means you keep sending more input to the AI model with each question, until ultimately you must manage and truncate the past conversation. If you send too much input, besides paying a lot, you can hit a limit where the input plus the max_tokens space you reserved for an answer exceeds the context length, and you get an error instead.
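One common truncation strategy is to drop the oldest turns until the conversation plus the reserved answer space fits. The sketch below fakes token counting with a crude word count; real code would use the model's tokenizer (e.g. tiktoken):

```python
# Drop oldest messages until the history fits the input budget.
def count_tokens(message):
    return len(message["content"].split())  # crude stand-in for a real tokenizer

def truncate(history, context_length, max_tokens):
    budget = context_length - max_tokens    # tokens available for input
    trimmed = list(history)
    while trimmed and sum(count_tokens(m) for m in trimmed) > budget:
        trimmed.pop(0)                      # discard the oldest message first
    return trimmed

history = [
    {"role": "user", "content": "one two three four five"},
    {"role": "assistant", "content": "six seven eight"},
    {"role": "user", "content": "nine ten"},
]
print(len(truncate(history, context_length=8, max_tokens=2)))  # 2
```

Dropping whole oldest turns keeps the most recent context intact; fancier approaches summarize the discarded turns instead of deleting them.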
