Thank you for sharing the solution @jlvanhulst!
There is still the issue of being charged for all those redundant `\n` and `\t\t` output tokens being generated, correct?
Also, when using the Chat Completions API, would setting `frequency_penalty` to some positive number (the default is 0) help address the cost problem by truncating the response early? In theory, every additional `\n` or `\t\t` would be penalized more heavily the more often it repeats. What do you think?
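
For reference, here's roughly what I have in mind, as a minimal sketch with the official Python SDK (the model name and prompt are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Sketch: a positive frequency_penalty makes tokens that have already
# appeared in the output less likely to be sampled again, so long runs
# of "\n" / "\t\t" should become progressively less probable.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize this document..."}],
    frequency_penalty=0.5,  # default is 0; valid range is -2.0 to 2.0
)
print(response.choices[0].message.content)
```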
The alternative for the Responses API could perhaps be setting the `max_output_tokens` value to a conservative number that's well below the maximum limit for the model you're using, depending on the use case. Maybe that could work as well?
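
A minimal sketch of that idea, again with a placeholder model name and input:

```python
from openai import OpenAI

client = OpenAI()

# Sketch: cap the output well below the model's maximum so a runaway
# stream of whitespace tokens can't inflate the bill indefinitely.
response = client.responses.create(
    model="gpt-4o",  # placeholder model name
    input="Summarize this document...",
    max_output_tokens=500,  # conservative cap; tune per use case
)
print(response.output_text)
```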