Limiting maximum number of reasoning tokens

Hello,

Is there a way to limit the number of reasoning tokens besides setting reasoning_effort = 'low'?

Yes, you can limit it with the max_output_tokens parameter (reasoning tokens count as output tokens).

Controlling costs

If you’re managing context manually across model turns, you can discard older reasoning items unless you’re responding to a function call, in which case you must include all reasoning items between the function call and the last user message.
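For instance, here is a rough sketch of that pruning rule. The helper name is hypothetical, and it assumes conversation items are plain dicts with a "type" key in the style of the Responses API:

```python
def prune_reasoning(items):
    """Drop reasoning items that precede the last user message.

    Reasoning items after the last user message are kept, since they may
    sit between a function call and its output and must be sent back.
    Hypothetical helper: item shapes follow the Responses API convention
    of dicts with a "type" key ("message", "reasoning", "function_call", ...).
    """
    last_user = max(
        (i for i, item in enumerate(items)
         if item.get("type") == "message" and item.get("role") == "user"),
        default=-1,
    )
    return [
        item for i, item in enumerate(items)
        if item.get("type") != "reasoning" or i > last_user
    ]

# Usage: conversation = prune_reasoning(conversation) before the next request.
```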

To manage costs with reasoning models, you can limit the total number of tokens the model generates (including both reasoning and final output tokens) by using the max_output_tokens parameter.
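For example, a minimal sketch using the Python SDK's Responses API (the model name and prompt are just illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Cap the combined budget of reasoning + visible output at 2000 tokens.
response = client.responses.create(
    model="o4-mini",  # illustrative; any reasoning model works the same way
    reasoning={"effort": "low"},
    input="Prove that the square root of 2 is irrational.",
    max_output_tokens=2000,
)

print(response.output_text)
```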


Use of max_output_tokens (aka max_completion_tokens on Chat Completions) truncates and stops the output. It does not affect the AI's generation up to that point. That might mean you spend 4000 tokens on internal reasoning and the generation is terminated before you ever see any output.
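You can detect when that has happened by checking the response status before trusting the output. A sketch, assuming a response object from a Responses API call like the one above:

```python
# Detect truncation caused by the token cap before using the output.
if (response.status == "incomplete"
        and response.incomplete_details.reason == "max_output_tokens"):
    if response.output_text:
        print("Answer was cut off mid-generation:", response.output_text)
    else:
        print("The whole budget went to reasoning; no visible output.")
```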

Thus, it can prevent a runaway AI from looping, but it will not produce lower-budget answers. What actually controls the budget is your prompting and the difficulty of the question, in conjunction with the reasoning_effort parameter.
