How does max_prompt_tokens work?

max_prompt_tokens was very confusing to me at first too, but keep digging in; you can have my custom GPT teach you more about it at the link below.

From my understanding, this is the maximum number of tokens that can be sent to the model as your prompt, and it covers your current message, the past messages in the thread, and overhead like the assistant's instructions and other behind-the-scenes tokens. So if you set it to, say, 50k, and your thread is already at 200k tokens (say you've been having a long conversation), you obviously can't send 200k tokens in a prompt, so the API has to truncate the history using the truncation strategy (the default is auto) until your current prompt, the helper tokens, and the trimmed-down past messages all fit within 50k tokens. There are different truncation strategies that control how those previous messages get trimmed before being sent along with your current prompt.

Hope that helps with the concept a little. And max_completion_tokens is the maximum number of tokens the model will respond with, fyi.
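For concreteness, here's a minimal sketch of setting these parameters when creating a run with the OpenAI Python SDK's Assistants API. The thread and assistant IDs are placeholders, and the token budgets are just example values:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Cap the prompt (current message + truncated history + instruction
# overhead) at 50k tokens, and cap the model's reply at 4k tokens.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",    # placeholder thread ID
    assistant_id="asst_abc123",   # placeholder assistant ID
    max_prompt_tokens=50_000,
    max_completion_tokens=4_000,
    # "auto" (the default) lets the API decide which past messages to drop;
    # {"type": "last_messages", "last_messages": 10} would instead keep
    # only the 10 most recent messages.
    truncation_strategy={"type": "auto"},
)
```

If the thread's history plus your new message won't fit under max_prompt_tokens, the API drops older messages according to the truncation strategy before calling the model.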