I need help using the OpenAI API

I am using the OpenAI API with Python, and the model is gpt-4o-mini.

What I need help with is the max_tokens parameter.

  1. When setting the max_tokens parameter, do I only need to account for completion tokens, or should I also add the prompt tokens?

  2. One request returned Completion Tokens: 1098, Prompt Tokens: 6852, Total Tokens: 7950. Because of question 1 I didn’t know the right value, so I tested with 8000, 10000, 13000, and 16384, and the AI’s answer changed each time. Is that affected by max_tokens?

These links may help.
OpenAI Help Center — What is the difference between prompt tokens and completion tokens?
OpenAI Developer Forum — Clarification for max_tokens
OpenAI Developer Forum — Do ‘MAX tokens’ include the follow up prompts and completion in a single chat session
OpenAI Developer Forum — Confused about max_tokens - parameter with GTP4-turbo (128k-tokenUsedForPrompt or 4K)
OpenAI Developer Forum — How the max tokens are considered


The existing parameter max_tokens can still be used with all models except o1-preview and o1-mini, but it also has a new, long-overdue name that describes its purpose better:

max_completion_tokens

It is the maximum number of generated language tokens you are willing to pay for before the output is cut off.

The distinction of “willing to pay” arises because, with the new o1 models, you are also charged for tokens processed internally, even if they never appear in the output. Previously, and with all other models, you are charged only for the output actually generated before generation stops.
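
As a minimal sketch, assuming the openai Python SDK v1.x with your key in the OPENAI_API_KEY environment variable (the prompt and the limit of 1500 are purely illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    # Caps only the generated (completion) tokens; prompt tokens are not counted here.
    max_completion_tokens=1500,
)

print(response.choices[0].message.content)
print(response.usage)  # prompt_tokens, completion_tokens, total_tokens
```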

The AI’s response varies with each new inference due to built-in statistical randomness. max_completion_tokens does not impact the language quality—except in cases where output may be truncated if the AI reaches the token limit before completing its response.
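
If you want to convince yourself that the variation comes from sampling rather than from the token limit, one thing to try (a sketch, not a guarantee of determinism) is lowering temperature and optionally setting a seed while keeping max_completion_tokens fixed:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    max_completion_tokens=1500,
    temperature=0,  # less random sampling
    seed=1234,      # best-effort reproducibility, not guaranteed
)
print(response.choices[0].message.content)
```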

There is a separate per-request calculation to keep in mind: this parameter also acts as a “reservation” of output space within the model’s context length. If your input plus the max_tokens you set add up to more than the model’s context window (the memory where all token operations happen), you will get an error.
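
As a rough illustration of that arithmetic (a sketch that assumes gpt-4o-mini’s roughly 128k-token context window and the tiktoken o200k_base encoding; check the current model documentation for the real limits):

```python
import tiktoken

CONTEXT_WINDOW = 128_000        # assumed total context length (input + reserved output)
MAX_COMPLETION_TOKENS = 16_384  # the cap you plan to send as max_completion_tokens

enc = tiktoken.get_encoding("o200k_base")
prompt_text = "..."  # the full text of your messages
prompt_tokens = len(enc.encode(prompt_text))

if prompt_tokens + MAX_COMPLETION_TOKENS > CONTEXT_WINDOW:
    print("Too big: shorten the prompt or lower max_completion_tokens.")
else:
    print(f"OK: {prompt_tokens} prompt tokens + {MAX_COMPLETION_TOKENS} reserved output tokens fit.")
```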

So:

  1. It simply sets the maximum output size.
  2. It doesn’t affect the answer.
  3. It should be set high enough to always get complete answers (3000 is good; see the truncation-check sketch after this list).
  4. It should be set MUCH higher on ‘o1’ models, as the potential cost - and the cost of not getting output - is higher (25000 is good, or just omit)
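
A hedged usage sketch for point 3, again assuming the openai Python SDK v1.x: you can tell whether an answer was cut off by the cap by checking finish_reason on the returned choice.

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain the attention mechanism in detail."}],
    max_completion_tokens=3000,
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # The model hit max_completion_tokens before it finished answering.
    print("Truncated: raise max_completion_tokens or shorten the request.")
else:
    print(choice.message.content)
```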