Discrepancy in Token Counts Between tiktoken and API Usage for o4-mini/gpt-4o-mini

I’m encountering an issue with token counting in the o4-mini and gpt-4o-mini models, where the token counts reported in the API’s usage field significantly exceed those calculated using the tiktoken library. Here’s a detailed description:

Issue Description

For a simple test case:

  • System Prompt: “Finish the given task”
  • User Prompt: “Say hello world”
  • Output: “Hello, world!”

Using the tiktoken library with the o200k_base encoding, I calculated:

  • System prompt: 4 tokens
  • User prompt: 3 tokens
  • Output: 4 tokens

However, the API’s usage field reports:

  • prompt_tokens: 17
  • completion_tokens: 214
  • reasoning_tokens: 192
  • total_tokens: 231

Does this mean that “Hello, world!” cost 22 tokens (214 completion_tokens − 192 reasoning_tokens)? tiktoken shows it is only 4 tokens.

This is a significant discrepancy: prompt_tokens is 17 versus the 7 tokens I counted for the two prompts, and even the non-reasoning part of the completion (22 tokens) is far more than the 4 tokens of visible output.
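Spelling the arithmetic out with the numbers above:

# Numbers copied from the API usage field
prompt_tokens = 17
completion_tokens = 214
reasoning_tokens = 192

# Visible output = total completion tokens minus hidden reasoning tokens
print(completion_tokens - reasoning_tokens)  # 22, yet tiktoken counts "Hello, world!" as 4

# My per-string tiktoken counts: 4 (system) + 3 (user) = 7, yet prompt_tokens is 17
print(prompt_tokens - (4 + 3))  # 10 tokens I cannot account for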

Code for Token Counting with tiktoken

import tiktoken

def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    # gpt-4o-mini and o4-mini both use the o200k_base encoding
    encoding = tiktoken.get_encoding("o200k_base")
    tokens = encoding.encode(text)
    return len(tokens)

system_prompt = "Finish the given task"
user_prompt = "Say hello world"
completion = "Hello, world!"

print(count_tokens(system_prompt))  # Output: 4
print(count_tokens(user_prompt))    # Output: 3
print(count_tokens(completion))     # Output: 4

Code for API Call

from openai import OpenAI

client_openai = OpenAI()
response = client_openai.chat.completions.create(
    model="o4-mini-2025-04-16",  # Also tested with gpt-4o-mini-2024-07-18
    messages=[
        {"role": "system", "content": "Finish the given task"},
        {"role": "user", "content": "Say hello world"},
    ],
    max_completion_tokens=10000,
    temperature=1.0,  # note: o-series reasoning models may reject explicit sampling parameters
)

content = response.choices[0].message.content
usage = response.usage
print(content)  # Output: Hello, world!
print(usage)

API Response usage Field

CompletionUsage(
    completion_tokens=214,
    prompt_tokens=17,
    total_tokens=231,
    completion_tokens_details=CompletionTokensDetails(
        accepted_prediction_tokens=0,
        audio_tokens=0,
        reasoning_tokens=192,
        rejected_prediction_tokens=0
    ),
    prompt_tokens_details=PromptTokensDetails(
        audio_tokens=0,
        cached_tokens=0
    )
)

I think you have conflated two models with different behaviors in your report.

gpt-4o-mini is not a reasoning model. We will discuss that first.

Each message carries an overhead: it is placed in a token container with a role name. That overhead can be determined experimentally. We start with what we can measure directly with tiktoken:

The developer message is 9 tokens.
The user message is 10 tokens.

Then we delete each message one at a time, and also add a duplicate of each one at a time, make the API calls, and watch how the billed input changes:

"messages": [

  # del 30->17 tokens = 13 tokens per developer message
-  {"role": "developer", "content": "You are ChatAPI, an AI assistant."},tokens
  # add 30->43 tokens = 13 tokens per developer message
+  #{"role": "developer", "content": "You are ChatAPI, an AI assistant."},

  # del 30->16 tokens == 14 tokens per user message
-  {"role": "user", "content": prompt},
  # add 30->44 tokens == 14 tokens per user message
+  #{"role": "user", "content": prompt},

9–>13 and 10–>14 means a four-token overhead per message
13 + 14 = 27 tokens of 30 billed,
MEANING: 3 overhead per call of hidden <|start|>assistant
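
Putting those measurements together, a rough estimator for the billed prompt tokens looks like the sketch below. The constants (4 tokens of container overhead per message, 3 tokens for the hidden assistant priming) come from the experiment above; this is an approximation for o200k_base chat models, not an official formula, and estimate_prompt_tokens is just a name used here.

import tiktoken

def estimate_prompt_tokens(messages: list[dict]) -> int:
    # Rough estimate of billed prompt tokens for o200k_base chat models:
    # ~4 tokens of container overhead per message (role name plus delimiters),
    # plus 3 tokens for the hidden <|start|>assistant reply priming.
    encoding = tiktoken.get_encoding("o200k_base")
    total = 3  # hidden assistant reply priming
    for message in messages:
        total += 4  # per-message container overhead
        total += len(encoding.encode(message["content"]))
    return total

messages = [
    {"role": "developer", "content": "You are ChatAPI, an AI assistant."},         # 9 content tokens
    {"role": "user", "content": "Write a haiki poem about AI token usage costs"},  # 10 content tokens
]
print(estimate_prompt_tokens(messages))  # 9 + 4 + 10 + 4 + 3 = 30, matching the billed usage below

Note that o4-mini bills one token fewer for the same messages (29 vs. 30 in the usage reports below), so treat the result as an estimate rather than an exact count.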


So our API call and the reported usage make sense:

API Call body:

 {
  "model": "gpt-4.1-mini",
  "messages": [
    {
      "role": "developer",
      "content": "You are ChatAPI, an AI assistant."
    },
    {
      "role": "user",
      "content": "Write a haiki poem about AI token usage costs"
    }
  ],
  "max_completion_tokens": 1100,
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

Output:

Silent counts unfold,
Tokens spent in streams of thought—
Value weighs in light.

Usage:

{
  "prompt_tokens": 30,
  "completion_tokens": 18,
  "reasoning_tokens": 0
}
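
On the output side of a non-reasoning model there are no hidden tokens, so re-encoding the returned haiku with tiktoken should land at, or very close to, the billed 18 completion_tokens:

import tiktoken

encoding = tiktoken.get_encoding("o200k_base")
haiku = "Silent counts unfold,\nTokens spent in streams of thought—\nValue weighs in light."

# For non-reasoning models, completion_tokens tracks the visible text closely.
print(len(encoding.encode(haiku)))  # expect roughly the billed 18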

Reasoning usage, o4-mini, "reasoning_effort": "low":

Output:

Tokens drip like rain
Each request bills the ledger
Mindful cuts save gold

Usage:
{
  "prompt_tokens": 29,
  "completion_tokens": 226,
  "reasoning_tokens": 192
}

Reasoning usage, o4-mini, "reasoning_effort": "high", same request body otherwise:

  "max_completion_tokens": 1100,
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

Usage:

{
  "prompt_tokens": 29,
  "completion_tokens": 1100,
  "reasoning_tokens": 1100
}

What’s going on here?

You are billed for the internal token generation of "reasoning": the model thinking about what the user wants and how to write it, as unseen output generation.

max_completion_tokens is also a maximum budget setting, and it covers that reasoning. I set it to 1100 tokens, set the reasoning effort parameter to high, and got nothing back because the AI was still in "thinking mode" when it hit that limit.
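
If you want to guard against that in code, here is a minimal sketch (reusing the messages from the overhead experiment above; the model name, budget, and effort setting are just the values discussed in this thread):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="o4-mini-2025-04-16",
    reasoning_effort="low",       # less hidden reasoning means fewer billed completion tokens
    max_completion_tokens=1100,   # this budget covers reasoning AND visible output
    messages=[
        {"role": "developer", "content": "You are ChatAPI, an AI assistant."},
        {"role": "user", "content": "Write a haiki poem about AI token usage costs"},
    ],
)

usage = response.usage
reasoning = usage.completion_tokens_details.reasoning_tokens
visible = usage.completion_tokens - reasoning
print(f"reasoning: {reasoning}, visible: {visible}")

# If the whole budget was spent thinking, the visible text is empty and the
# generation was cut off at the limit (finish_reason == "length").
choice = response.choices[0]
if choice.finish_reason == "length" and not choice.message.content:
    print("Budget exhausted during reasoning: raise max_completion_tokens or lower reasoning_effort")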

So everything falls into place with expectations, when your expectations are informed by understanding.