If we set max_completion_tokens to a value greater than the model's maximum context length minus the prompt length, we get:
message: "This model's maximum context length is 4097 tokens, however you requested 5360 tokens (1360 in your prompt; 4000 for the completion). Please reduce your prompt; or completion length.",
It would be really helpful if the API could just set the maximum completion tokens to min(max_completion_tokens, model’s max context length - prompt length) rather than return an error. The parameter max_completion_tokens is only supposed to be a maximum, not the actual number of tokens requested.
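In the meantime, that min() can be applied client-side before the request is sent. Here is a minimal sketch, assuming the tiktoken package for counting prompt tokens, the current Python SDK, and the 4097-token model from the error above; the per-message overhead in the count is a rough approximation, not an exact accounting.

```python
# Client-side sketch of the requested behavior: clamp the completion budget
# to whatever context space the prompt leaves, instead of letting the API error.
import tiktoken
from openai import OpenAI

MODEL = "gpt-3.5-turbo"   # assumed model name for illustration
CONTEXT_WINDOW = 4097     # from the error message above
DESIRED_MAX = 4000        # the completion budget we would like to request

client = OpenAI()
encoding = tiktoken.encoding_for_model(MODEL)

messages = [{"role": "user", "content": "Summarize the following document ..."}]

# Rough prompt length: tokens in the message contents plus a small per-message overhead.
prompt_tokens = sum(len(encoding.encode(m["content"])) + 4 for m in messages) + 3

# The clamp the API is being asked to perform on its own.
max_completion = min(DESIRED_MAX, CONTEXT_WINDOW - prompt_tokens)

response = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    max_completion_tokens=max_completion,
)
print(response.choices[0].message.content)
```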
On non-reasoning models, max_completion_tokens (and the earlier max_tokens parameter) also acts as a "reservation" of output space, because of the way the rate limiter works.
When it is specified, the requested maximum is counted against your API rate limit, and it is also checked against the space that remains in the model's context window after your input.
These two functions could certainly be separated for more utility (and more confusion) by setting a "minimum output reservation" along with a "maximum output cutoff", but that wasn't done even with the new max_completion_tokens. Your own API code can apply the same logic, though.
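For example, a small helper along these lines could split the two roles yourself: refuse (or trim) when the prompt leaves less than a minimum reservation of output space, and otherwise send a cutoff clamped to the space that actually remains. The MIN_RESERVATION and MAX_CUTOFF values are hypothetical, chosen only for illustration, and the 4097-token window matches the error above.

```python
MIN_RESERVATION = 500   # hypothetical: least output space we consider useful
MAX_CUTOFF = 2000       # hypothetical: hard cap on spend per response

def completion_budget(prompt_tokens: int, context_window: int = 4097) -> int:
    """Return a max_completion_tokens value, or raise if the prompt leaves too little room."""
    remaining = context_window - prompt_tokens
    if remaining < MIN_RESERVATION:
        raise ValueError(
            f"Prompt leaves only {remaining} tokens; "
            f"at least {MIN_RESERVATION} are needed for a useful response."
        )
    return min(MAX_CUTOFF, remaining)

# completion_budget(1360) -> 2000; completion_budget(3900) raises, since only 197 tokens remain.
```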
This does have utility, though: if you reserve 2000 tokens of response, you can't send 4000 tokens of input to a model that supports only a 4097-token context window. Without that reservation, you'd be left with just 97 tokens of unsatisfying response before the output is cut off.
Use gpt-4o-2024-11-20 as your AI model, and your context window is bumped to 125k, pretty much solving the concern.
For the current 4k model you show, setting a maximum near the model's limit has no cost-safety utility anyway: just omit the max_completion_tokens parameter, and you get the arbitrary-length output you desire.