How does max_prompt_tokens work?

max_prompt_tokens was very confusing to me at first too, but keep digging in; you can have my custom GPT teach you more about it at the link below.

From my understanding, this is the maximum number of tokens that can be sent to the model as your prompt, and it covers your current message, the past messages in the thread, and overhead like the assistant's instructions and other behind-the-scenes tokens. So if you set it to, say, 50k, and your thread is already at 200k tokens (say you've been having a long conversation), you obviously can't send 200k tokens in a prompt, so the API has to truncate the history using the truncation strategy (the default is auto) until your current prompt, the helper tokens, and the trimmed-down past messages all fit within 50k tokens. There are different truncation strategies that control how those previous messages get trimmed before being sent along with your current prompt.

Hope that helps with the concept a little. And max_completion_tokens is the maximum number of tokens the model will respond with, fyi.
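For concreteness, here's a minimal sketch of setting these parameters when creating a run with the OpenAI Python SDK's Assistants API. The thread and assistant IDs are placeholders, and the token budgets are just example values:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Cap the prompt (current message + truncated history + instruction
# overhead) at 50k tokens, and cap the model's reply at 4k tokens.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",    # placeholder thread ID
    assistant_id="asst_abc123",   # placeholder assistant ID
    max_prompt_tokens=50_000,
    max_completion_tokens=4_000,
    # "auto" (the default) lets the API decide which past messages to drop;
    # {"type": "last_messages", "last_messages": 10} would instead keep
    # only the 10 most recent messages.
    truncation_strategy={"type": "auto"},
)
```

If the thread's history plus your new message won't fit under max_prompt_tokens, the API drops older messages according to the truncation strategy before calling the model.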