Why are completion tokens so high?

I am using the gpt-5-nano model for a step in a chatbot that determines the user’s intention from their latest comment. There are about 15 categories I’m trying to identify for now, and I’ll trim the list as I see which have the most value.

So I have a reasonably large number of input tokens, which is fine, but what I don’t understand is the very high output token count I’m getting, usually 800-1000. For example, this response was 1057 tokens. It’s nearly as high when I ask for only the code without the reason. If I set max_completion_tokens any lower, the request just fails with a length error. Any idea what’s up?

{"code": "USER-COMPLIMENTS-ASSISTANT", "reason": "User said the assistant's joke was very funny."}
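To show what I mean, here’s the rough arithmetic (the visible-token estimate is mine; only the 1057 total is from the actual response):

```python
# On reasoning models the completion_tokens count includes hidden reasoning
# tokens, so a tiny visible JSON reply can still bill ~1000 output tokens.
# The 25-token figure for the visible JSON is an estimate.

def hidden_reasoning_tokens(completion_tokens: int, visible_tokens: int) -> int:
    """Tokens billed as output that never appear in message.content."""
    return completion_tokens - visible_tokens

hidden_reasoning_tokens(1057, 25)  # -> 1032 tokens spent on internal thinking
```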

Thanks!

Steve

  1. gpt-5 models are reasoning models: you pay for their internal thinking as output tokens.
  2. gpt-5-nano tends to think excessively long for poor results; you’re better off with gpt-5-mini.
  3. Use the API parameter “reasoning_effort” and set it to “low”. That tells the model how much to think (this is the Chat Completions parameter).
  4. Or simply use gpt-4.1, which goes straight to producing output without first deliberating over which “code” to generate. Add “top_p”: 0.01 if you want consistent answers instead of random ones.
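To make point 3 concrete, a minimal sketch of the request for your classifier with reasoning dialed down. “reasoning_effort” and “messages” are the real Chat Completions parameter names; the prompt wording is a placeholder:

```python
# Sketch: build the Chat Completions request for the intent classifier with
# reasoning effort turned down. The system prompt here is illustrative.

def build_classifier_request(comment: str) -> dict:
    return {
        "model": "gpt-5-nano",
        "reasoning_effort": "low",  # less internal thinking; reasoning bills as output tokens
        "messages": [
            {
                "role": "system",
                "content": 'Classify the user\'s intent into one category code. '
                           'Reply only with JSON: {"code": "...", "reason": "..."}',
            },
            {"role": "user", "content": comment},
        ],
    }

# Usage (requires the openai SDK and an API key):
# from openai import OpenAI
# resp = OpenAI().chat.completions.create(**build_classifier_request("That joke was hilarious!"))
# print(resp.choices[0].message.content)
# print(resp.usage.completion_tokens_details.reasoning_tokens)  # the hidden thinking
```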

Why use reasoning at all for this? It sounds like a classic categorisation task that a basic LLM could handle without any internal “loops”.

I second @_j’s suggestion of using 4.1.
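If it helps, a sketch of what the non-reasoning version might look like. gpt-4.1-mini, top_p=0.01, and the 60-token budget are illustrative picks, not tested recommendations:

```python
import json

# Sketch: the same classifier on a non-reasoning model, plus a helper to
# pull the label back out of the JSON reply.

def build_request(comment: str) -> dict:
    return {
        "model": "gpt-4.1-mini",
        "top_p": 0.01,  # near-greedy sampling, so repeated calls give the same label
        "max_completion_tokens": 60,  # safe to cap now: no hidden reasoning to budget for
        "messages": [
            {
                "role": "system",
                "content": 'Classify the user\'s intent into one category code. '
                           'Reply only with JSON: {"code": "...", "reason": "..."}',
            },
            {"role": "user", "content": comment},
        ],
    }

def parse_code(raw: str) -> str:
    """Pull the category code out of the model's JSON reply."""
    return json.loads(raw)["code"]

# With a reply shaped like the one in the question:
parse_code('{"code": "USER-COMPLIMENTS-ASSISTANT", "reason": "Joke was funny."}')
# -> "USER-COMPLIMENTS-ASSISTANT"
```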

Interesting answer. Which of the models do you think is best: 4.1, 4.1-mini, or 4.1-nano? I’ll try the top_p thing too.