Limiting maximum number of reasoning tokens

Yes, you can limit it with the `max_output_tokens` parameter (reasoning tokens count as part of the output).

Controlling costs

If you’re managing context manually across model turns, you can discard older reasoning items unless you’re responding to a function call, in which case you must include all reasoning items between the function call and the last user message.
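The pruning rule above can be sketched as a small helper. This is a minimal illustration, not SDK code: the item dicts and their `"type"`/`"role"` fields are a simplified stand-in for whatever item shape your SDK returns.

```python
def prune_reasoning(items, responding_to_function_call):
    """Drop older reasoning items from a manually managed context.

    `items` is a simplified list of dicts with a "type" field
    ("message", "reasoning", "function_call"); the real item shape
    depends on your SDK.
    """
    # Index of the last user message (-1 if there is none).
    last_user = max(
        (i for i, it in enumerate(items)
         if it["type"] == "message" and it.get("role") == "user"),
        default=-1,
    )
    kept = []
    for i, it in enumerate(items):
        if it["type"] == "reasoning":
            # Keep reasoning only when replying to a function call,
            # and only the items after the last user message.
            if responding_to_function_call and i > last_user:
                kept.append(it)
        else:
            kept.append(it)
    return kept
```

Run on a turn that ends in a function call, this keeps only the reasoning produced since the last user message; with `responding_to_function_call=False`, all reasoning items are discarded.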

To manage costs with reasoning models, you can limit the total number of tokens the model generates (including both reasoning and final output tokens) by using the max_output_tokens parameter.
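As a minimal sketch, the cap is just one field on the request. The model name and budget below are illustrative, not recommendations:

```python
# Cap total generated tokens (reasoning + visible output) with
# max_output_tokens; model and budget are illustrative values.
payload = {
    "model": "o4-mini",
    "input": "Summarize the attached report.",
    "max_output_tokens": 1024,  # reasoning tokens count toward this cap
}

# With the official Python SDK this maps onto:
#   client.responses.create(**payload)
```

If the budget is exhausted mid-reasoning, the response can come back incomplete, so leave enough headroom for the final answer.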
