How should I deal with this situation? The model returns a large number of spaces and line breaks, which causes the response length to exceed the limit.
- Use `max_completion_tokens` to limit your cost to the maximum ever expected from normal responses.
- Use Chat Completions' `frequency_penalty` with a small positive value, so the bad tokens become more discouraged the more of the same is written.
- Completely and exhaustively describe any JSON output format you ask for, especially when using a non-strict schema.
- Switch models to one where the symptom can't easily be reproduced by others (aka my detector says to get off gpt-4o-mini). A minimal sketch of the first two mitigations is shown after this list.
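Here is a minimal sketch of the first two mitigations together, using the openai Python SDK. The model name, prompt, and parameter values are illustrative assumptions, not taken from this thread:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; any Chat Completions model that accepts these params
    messages=[
        {"role": "system", "content": "Reply with a single JSON object and no extra whitespace."},
        {"role": "user", "content": 'Summarize the report as {"title": str, "points": [str]}.'},
    ],
    max_completion_tokens=2048,  # cap runaway output at the longest normal response
    frequency_penalty=0.3,       # small positive value: repeated tokens grow less likely
)
print(response.choices[0].message.content)
```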
We use o3-mini with `max_completion_tokens` set to 50,000, because some questions and answers require long responses. The `frequency_penalty` is currently 0, so I will set it to 0.7.
o3-mini does not support the request parameter `frequency_penalty`, so what should I do?
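That is expected: the o-series reasoning models reject sampling parameters such as `frequency_penalty`. One workaround is to strip those parameters conditionally and rely on `max_completion_tokens` plus prompt discipline. A sketch; the model set and the helper name are my own assumptions, so verify them against the current docs:

```python
from openai import OpenAI

client = OpenAI()

# Assumption: models that reject sampling params; keep this list current yourself.
REASONING_MODELS = {"o1", "o1-mini", "o3-mini"}

def ask(model: str, messages: list[dict], **sampling) -> str:
    if model in REASONING_MODELS:
        # Drop parameters the reasoning models refuse, instead of erroring out.
        for param in ("frequency_penalty", "presence_penalty", "temperature", "top_p"):
            sampling.pop(param, None)
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_completion_tokens=50_000,  # the cap mentioned above
        **sampling,
    )
    return response.choices[0].message.content
```

With this, `ask("o3-mini", messages, frequency_penalty=0.7)` silently drops the penalty instead of failing the request.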
Or are there any recommended prompt tips to reduce this problem?
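Prompt instructions (e.g., "respond with compact JSON, no blank lines") help but are not guaranteed. A defensive option, which is my own suggestion rather than anything confirmed in this thread, is to stream the response and abort once a long run of pure whitespace appears, so a loop never consumes the full 50,000-token budget:

```python
from openai import OpenAI

client = OpenAI()

def stream_with_guard(model: str, messages: list[dict], max_ws_run: int = 200) -> str:
    """Collect a streamed reply, aborting if whitespace repeats pathologically."""
    parts: list[str] = []
    ws_run = 0  # length of the current run of whitespace-only deltas
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        max_completion_tokens=50_000,
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content or ""
        if delta.strip():
            ws_run = 0          # real content resets the counter
        else:
            ws_run += len(delta)
        if ws_run > max_ws_run:
            stream.close()      # stop paying for a runaway whitespace loop
            break
        parts.append(delta)
    return "".join(parts).rstrip()
```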