How are reasoning_tokens calculated?

I’m using the o1-mini model and would like to know how the reasoning_tokens are calculated.

I couldn’t find any reference for the o1 or o1-mini models at https://platform.openai.com/tokenizer.

As per this:

> Depending on the problem’s complexity, the models may generate anywhere from a few hundred to tens of thousands of reasoning tokens. The exact number of reasoning tokens used is visible in the usage object of the chat completion response object, under completion_tokens_details.

The problem is that streaming with the stream_options include_usage parameter is currently not supported for these models. When I tried it, I got an error like:

    "error": {
        "message": "Unsupported value: 'stream' does not support true with this model. Only the default (false) value is supported.",
        "type": "invalid_request_error",
        "param": "stream",
        "code": "unsupported_value"
    }
}

So the only way is to calculate the reasoning_tokens with my own custom logic, rather than depend on the OpenAI usage block, which may not work for streaming APIs.
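
For reference, this is roughly the kind of custom logic I had in mind: count the visible output tokens with tiktoken and subtract them from the billed completion_tokens when a usage block is available. This assumes o1-mini uses the same o200k_base encoding as gpt-4o, which is not documented for the o1 family, so the result is only an estimate.

    # Rough sketch: estimate hidden reasoning tokens as the billed completion total
    # minus the visible output tokens.
    # ASSUMPTION: o1-mini tokenizes with o200k_base (the gpt-4o encoding); this is
    # not documented for o1 models, so treat the number as an estimate only.
    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")

    def estimate_reasoning_tokens(visible_output: str, billed_completion_tokens: int) -> int:
        visible_tokens = len(enc.encode(visible_output))
        return max(billed_completion_tokens - visible_tokens, 0)

    # Example: a short visible reply measured against a large billed completion total
    print(estimate_reasoning_tokens("Hello! How can I help you today?", 1200))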

2024-09: o1-mini and o1-preview do not support streaming at all. Nor is any “thinking” given to you.

2024-12: o1 will have streaming, but it is not released yet, and its feature set is unknown.

You have to wait for the complete final response, which will include the usage. The cost of reasoning is included in completion_tokens, and a separate completion_tokens_details section reports the portion that was internal reasoning generation.
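
As a minimal sketch (assuming the openai Python SDK v1.x, which exposes completion_tokens_details on its usage object), reading that breakdown from the final non-streaming response looks roughly like this:

    # Minimal sketch, assuming the openai Python SDK v1.x and a non-streaming call.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="o1-mini",
        messages=[{"role": "user", "content": "Write a python application that is a web spreadsheet"}],
    )

    usage = response.usage
    details = usage.completion_tokens_details            # may be missing on older SDK versions

    reasoning = (details.reasoning_tokens or 0) if details else 0
    visible = usage.completion_tokens - reasoning        # tokens you actually receive as output

    print(f"completion_tokens: {usage.completion_tokens} (hidden reasoning + visible output)")
    print(f"reasoning_tokens:  {reasoning}")
    print(f"visible output:    {visible}")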


Basically, imagine the AI has several prompts and several fine-tuned models it can switch between for task purposes, across many possible stages. The AI satisfies one of these prompts with a response; then a tool-like action gives the AI the next step to generate, growing an internal conversation with itself. All of this is proprietary.

If you are the user, you can imagine that the internal thinking and cost of “Hi!” versus “Write a python application that is a web spreadsheet” will diverge: for the first, the AI reflects to itself, “I don’t need to think hard, and saying hello is a permitted response, so I’ll go directly to user output”.

If you are offering the model as a service, you have no way of predicting the usage, whether users want a cat haiku or a mathematics proof solved. You can only bill after the fact. And OpenAI keeps changing the model and its consumption on you anyway.
