I am using the gpt-5-nano model for a step in a chatbot that determines the user’s intention from their latest comment. There are about 15 categories I am trying to identify for now, and I’ll reduce the number once I see which have the most value.
So I have a reasonably large number of input tokens, which is fine, but what I don’t understand is the very high output token count I am getting, usually 800-1000. For example, this response cost 1057 output tokens. The count is nearly as high when I ask for only the code without the reason. If I set max_completion_tokens any lower, the request just fails with a length error. Any idea what’s up?
{"code": "USER-COMPLIMENTS-ASSISTANT", "reason": "User said the assistant's joke was very funny."}
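For reference, here is roughly how I build the request. This is a trimmed sketch: the category list is cut down to two placeholders and the prompt wording is illustrative, not my exact prompt. The payload is passed to client.chat.completions.create(**request) with the official OpenAI Python SDK.

```python
# Trimmed sketch of my classification request (Chat Completions endpoint).
# CATEGORIES and the system prompt below are placeholders, not my real ones.
CATEGORIES = ["USER-COMPLIMENTS-ASSISTANT", "USER-ASKS-QUESTION"]  # ~15 in reality

def build_request(comment: str) -> dict:
    """Build the payload passed to client.chat.completions.create(**request)."""
    system_prompt = (
        "Classify the user's latest comment into exactly one of these codes: "
        + ", ".join(CATEGORIES)
        + '. Reply as JSON: {"code": "...", "reason": "..."}'
    )
    return {
        "model": "gpt-5-nano",
        "max_completion_tokens": 256,  # anything much lower fails with a length error
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": comment},
        ],
    }
```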
Thanks!
Steve