How are reasoning_tokens calculated?

I’m using the o1-mini model and would like to know how the reasoning_tokens are calculated.

I couldn’t find any reference for the o1 or o1-mini models at https://platform.openai.com/tokenizer.

As per this:

> Depending on the problem’s complexity, the models may generate anywhere from a few hundred to tens of thousands of reasoning tokens. The exact number of reasoning tokens used is visible in the usage object of the chat completion response object, under completion_tokens_details.

The problem is that streaming with the stream_options include_usage parameter is currently not supported for these models. When I tried it, I got an error like:

    "error": {
        "message": "Unsupported value: 'stream' does not support true with this model. Only the default (false) value is supported.",
        "type": "invalid_request_error",
        "param": "stream",
        "code": "unsupported_value"
    }
}

So the only way is to calculate the reasoning_tokens with my own custom logic, rather than depend on the OpenAI usage block, which may not work for streaming APIs.
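
For reference, this is roughly the kind of custom logic I had in mind: count the visible output tokens with tiktoken and subtract them from the billed completion_tokens when a usage block is available. This assumes o1-mini uses the same o200k_base encoding as gpt-4o, which is not documented for the o1 family, so the result is only an estimate.

    # Rough sketch: estimate hidden reasoning tokens as the billed completion total
    # minus the visible output tokens.
    # ASSUMPTION: o1-mini tokenizes with o200k_base (the gpt-4o encoding); this is
    # not documented for o1 models, so treat the number as an estimate only.
    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")

    def estimate_reasoning_tokens(visible_output: str, billed_completion_tokens: int) -> int:
        visible_tokens = len(enc.encode(visible_output))
        return max(billed_completion_tokens - visible_tokens, 0)

    # Example: a short visible reply measured against a large billed completion total
    print(estimate_reasoning_tokens("Hello! How can I help you today?", 1200))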

2024-09: o1-mini and o1-preview do not support streaming at all. Nor is any “thinking” given to you.

2024-12: o1 will have streaming, but it is not released yet, and its feature set is unknown.

You have to wait for the complete final response, which will include the usage. The cost of reasoning is included in completion_tokens, and a separate completion_tokens_details section reports the portion that was internal reasoning generation.
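
As a minimal sketch (assuming the openai Python SDK v1.x, which exposes completion_tokens_details on its usage object), reading that breakdown from the final non-streaming response looks roughly like this:

    # Minimal sketch, assuming the openai Python SDK v1.x and a non-streaming call.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="o1-mini",
        messages=[{"role": "user", "content": "Write a python application that is a web spreadsheet"}],
    )

    usage = response.usage
    details = usage.completion_tokens_details            # may be missing on older SDK versions

    reasoning = (details.reasoning_tokens or 0) if details else 0
    visible = usage.completion_tokens - reasoning        # tokens you actually receive as output

    print(f"completion_tokens: {usage.completion_tokens} (hidden reasoning + visible output)")
    print(f"reasoning_tokens:  {reasoning}")
    print(f"visible output:    {visible}")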


Basically, imagine the AI has several prompts and several fine-tuned models it can switch between for task purposes, across many possible stages. The AI satisfies one of these prompts with a response; then a tool-like action gives the AI the next step to generate, growing an internal conversation with itself. All of this is proprietary.

If you are the user, you can imagine that the internal thinking and cost of “Hi!” versus “Write a python application that is a web spreadsheet” will diverge: for the first, the AI reflects to itself, “I don’t need to think hard, and saying hello is a permitted response, so I’ll go directly to user output”.

If you are offering the model as a service, you have no way of predicting the usage, whether users want a cat haiku or a mathematics proof solved. You can only bill after the fact. And OpenAI keeps changing the model and its consumption on you anyway.
