Does 'max_tokens' include the follow-up prompts and completions in a single chat session?

There are two things that might be conflated here:

A model’s context length is the total number of tokens it can handle at once: a combined count of the full input that you send and the response you get back.

An API call’s max_tokens parameter reserves a specific portion of the context length for forming an answer, setting the maximum size of the response you will receive back from the AI.
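The relationship between these two can be sketched as simple arithmetic. The numbers below are illustrative, not tied to any particular model:

```python
# A sketch of the token budget a single API call must fit inside.
context_length = 8192   # total tokens the model can handle at once (example value)
max_tokens = 1000       # space reserved for the model's answer

# The input you send may then use at most the remainder:
max_input_tokens = context_length - max_tokens
print(max_input_tokens)  # 7192
```

If the input exceeds that remainder, the call cannot fit in the context window.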


“Follow-up responses” to me means that you have a chatbot application (rather than making individual requests for single data-processing tasks).

In a chatbot scenario, the software you use or write should also include some of the past conversation as role messages before the most recent user query, so that the API calls (which are stateless, with no memory of prior requests) give the AI the context of what you were talking about.
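A minimal sketch of that replay, using the common role/content chat message convention (the `history` list and `build_messages` helper are illustrative names, not part of any API):

```python
# Each API call is stateless, so past turns must be re-sent as role messages.
history = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]

def build_messages(history, new_user_query):
    """Assemble the full message list for one API call."""
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    messages.extend(history)                                   # replay past turns
    messages.append({"role": "user", "content": new_user_query})
    return messages

msgs = build_messages(history, "And what river runs through it?")
print(len(msgs))  # 4: system + two past turns + the new question
```

After each response comes back, the new user/assistant pair is appended to `history`, so the list grows with every exchange.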

This growing conversation means you keep sending more input to the AI model with each question, until ultimately you must manage and truncate the past conversation. If you send too much input, besides paying a lot, you can hit a limit where the input plus the max_tokens space you reserved for an answer exceeds the context length, and you get an error instead.
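One common truncation strategy is to drop the oldest turns until the conversation plus the reserved answer space fits. The sketch below fakes token counting with a crude word count; real code would use the model's tokenizer (e.g. tiktoken):

```python
# Drop oldest messages until the history fits the input budget.
def count_tokens(message):
    return len(message["content"].split())  # crude stand-in for a real tokenizer

def truncate(history, context_length, max_tokens):
    budget = context_length - max_tokens    # tokens available for input
    trimmed = list(history)
    while trimmed and sum(count_tokens(m) for m in trimmed) > budget:
        trimmed.pop(0)                      # discard the oldest message first
    return trimmed

history = [
    {"role": "user", "content": "one two three four five"},
    {"role": "assistant", "content": "six seven eight"},
    {"role": "user", "content": "nine ten"},
]
print(len(truncate(history, context_length=8, max_tokens=2)))  # 2
```

Dropping whole oldest turns keeps the most recent context intact; fancier approaches summarize the discarded turns instead of deleting them.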
