Question about o3-mini token counts and thinking tokens in general

I’m trying to understand the differences between the various thinking modes (low, medium, high) in the o3-mini model. My understanding is that the primary distinction between o3-mini-low and o3-mini-high lies in the number of “thinking tokens” generated internally before the model produces its final response.

This leads me to a few questions:

  • Quantifying Thinking Tokens: Is there any information available about how many additional thinking tokens are generated in each mode (low, medium, high) relative to one another?
  • Value of Thinking Tokens Post-Response: Are these internally generated thinking tokens still valuable to the model after it has generated a response? For example, in multi-turn conversations, would the model benefit from having access to these previously generated thinking tokens?
  • Potential for Developer Access: Has OpenAI considered allowing developers to access these hidden thinking tokens? Could developers potentially feed these tokens back to the model as input in subsequent turns, perhaps using a unique ID to identify each set of thinking tokens? This could provide valuable context and potentially improve the model’s performance in extended conversations.

Essentially, I’m curious about the scale and potential utility of these thinking tokens, and whether there’s a possibility for developers to leverage them to enhance the model’s conversational capabilities.

Rewritten by AI for clarity.

One of the first internal branches is the model deciding how much planning the request actually requires: a simple request can come in at a lower cost, scaled to the task's complexity.

Although it is not explained or revealed, one observable characteristic of "o1" and "o3" under benchmarking is that they can produce a token cost far higher than the model's context window. Thus, one of the "high" reasoning strategies may be continued delegated and judged best-of trials, rather than simply building up a single context with an internal chat.
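As a toy illustration of that guess (not anything OpenAI has confirmed, and with my own placeholder task and sample count):

```python
from openai import OpenAI

client = OpenAI()
TASK = "Prove that the sum of two odd integers is even."  # placeholder task
N = 3  # number of independent attempts; placeholder value

# Run N independent attempts, each with its own hidden reasoning budget.
candidates = []
for _ in range(N):
    resp = client.chat.completions.create(
        model="o3-mini",
        messages=[{"role": "user", "content": TASK}],
    )
    candidates.append(resp.choices[0].message.content)

# A separate judge call picks the best attempt by index.
numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
judge = client.chat.completions.create(
    model="o3-mini",
    messages=[{
        "role": "user",
        "content": f"Reply with only the index of the best answer to "
                   f"'{TASK}':\n\n{numbered}",
    }],
)
print(candidates[int(judge.choices[0].message.content.strip())])
```

Three samples plus one judging pass already costs roughly four times the tokens of a single pass, which is one way a run's total could exceed the context window.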

This one I can answer quickly, to cut you off early: on the API, you simply do not receive the generated reasoning context, and you cannot send it back to the model in follow-up conversational calls. So you cannot provide the previous reasoning as something to be seen and evaluated again.
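You can see this directly in the usage accounting. A minimal sketch (the prompt is just a placeholder): the response body carries only the final message, while the reasoning you paid for shows up as nothing more than a token count.

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": "What is the 10th prime number?"}],
)

# The only text returned is the final answer...
print(resp.choices[0].message.content)

# ...while the hidden reasoning appears solely as a billed token count.
print("reasoning tokens:",
      resp.usage.completion_tokens_details.reasoning_tokens)
```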

And if the reasoning ran for 10 minutes across agentic internal branches and best-of evaluations, it may not be something that can be understood as a whole anyway.

The reasoning must also start anew for each API request, beginning again with steps that may amount to "does this input follow OpenAI policies…" run over the full context.

OpenAI is protective of the reasoning prompt and its generations as intellectual property. The only "consideration" seen so far is a new denial mechanism that refuses prompts merely asking about the reasoning, and that has also resulted in account bans.

“Conversational capabilities”? This is more about problem-solving.

It might be easiest to mentally rephrase "reasoning_effort" as "determination to find an answer" or "a cap on the cost of perseverance" to get a better picture. Then simply try the three choices on the challenging task at hand and graph quality against cost.
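A minimal sketch of that experiment, assuming the reasoning_effort parameter on Chat Completions and a placeholder task standing in for your real one:

```python
from openai import OpenAI

client = OpenAI()
TASK = "How many primes are there below 1000?"  # stand-in for your real task

for effort in ("low", "medium", "high"):
    resp = client.chat.completions.create(
        model="o3-mini",
        reasoning_effort=effort,
        messages=[{"role": "user", "content": TASK}],
    )
    d = resp.usage.completion_tokens_details
    print(f"{effort:>6}: reasoning={d.reasoning_tokens}, "
          f"total={resp.usage.total_tokens}")
    # Score the answer quality yourself, then plot quality vs. token cost.
```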

I think you are not understanding my question.

OpenAI has the thinking tokens. I am not asking for them to give those tokens to me. They could instead return some type of message ID for the thinking-message context, and then allow that thinking message ID to be sent back as context.

That way, I as the user could not see the thinking tokens, but they could still be reused.
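For example, something like this. Every `reasoning_id` name below is hypothetical; no such field or parameter exists in the API today. It only shows the handle-passing idea:

```python
from openai import OpenAI

client = OpenAI()

# Turn 1: the server would keep the hidden reasoning and return an opaque ID.
resp = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": "Plan a 3-stage data migration."}],
)
reasoning_id = resp.reasoning_id  # HYPOTHETICAL field

# Turn 2: send the ID back; the server reattaches the hidden reasoning as
# context without ever exposing the tokens to me.
followup = client.chat.completions.create(
    model="o3-mini",
    messages=[
        {"role": "user", "content": "Plan a 3-stage data migration."},
        {"role": "assistant", "content": resp.choices[0].message.content},
        {"role": "user", "content": "Now add a rollback step to stage 2."},
    ],
    previous_reasoning_id=reasoning_id,  # HYPOTHETICAL parameter
)
print(followup.choices[0].message.content)
```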