Discrepancy in Token Counts Between tiktoken and API Usage for o4-mini/gpt-4o-mini

I’m encountering an issue with token counting in the o4-mini and gpt-4o-mini models, where the token counts reported in the API’s usage field significantly exceed those calculated using the tiktoken library. Here’s a detailed description:

Issue Description

For a simple test case:

  • System Prompt: “Finish the given task”
  • User Prompt: “Say hello world”
  • Output: “Hello, world!”

Using the tiktoken library with the o200k_base encoding, I calculated:

  • System prompt: 4 tokens
  • User prompt: 3 tokens
  • Output: 4 tokens

However, the API’s usage field reports:

  • prompt_tokens: 17
  • completion_tokens: 214
  • reasoning_tokens: 192
  • total_tokens: 231

Does this mean that “Hello, world!” cost 22 tokens (214 completion_tokens − 192 reasoning_tokens)? tiktoken shows it is only 4 tokens.

This is a significant discrepancy: prompt_tokens is 17 versus the 7 tokens I counted for the two prompts, and even the non-reasoning part of the completion (22 tokens) is far more than the 4 tokens of visible output.
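Spelling the arithmetic out with the numbers above:

# Numbers copied from the API usage field
prompt_tokens = 17
completion_tokens = 214
reasoning_tokens = 192

# Visible output = total completion tokens minus hidden reasoning tokens
print(completion_tokens - reasoning_tokens)  # 22, yet tiktoken counts "Hello, world!" as 4

# My per-string tiktoken counts: 4 (system) + 3 (user) = 7, yet prompt_tokens is 17
print(prompt_tokens - (4 + 3))  # 10 tokens I cannot account for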

Code for Token Counting with tiktoken

import tiktoken

def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    # gpt-4o-mini and o4-mini both use the o200k_base encoding
    encoding = tiktoken.get_encoding("o200k_base")
    tokens = encoding.encode(text)
    return len(tokens)

system_prompt = "Finish the given task"
user_prompt = "Say hello world"
completion = "Hello, world!"

print(count_tokens(system_prompt))  # Output: 4
print(count_tokens(user_prompt))    # Output: 3
print(count_tokens(completion))     # Output: 4

Code for API Call

from openai import OpenAI

client_openai = OpenAI()
response = client_openai.chat.completions.create(
    model="o4-mini-2025-04-16",  # Also tested with gpt-4o-mini-2024-07-18
    messages=[
        {"role": "system", "content": "Finish the given task"},
        {"role": "user", "content": "Say hello world"},
    ],
    max_completion_tokens=10000,
    temperature=1.0,  # note: o-series reasoning models may reject explicit sampling parameters
)

content = response.choices[0].message.content
usage = response.usage
print(content)  # Output: Hello, world!
print(usage)

API Response usage Field

CompletionUsage(
    completion_tokens=214,
    prompt_tokens=17,
    total_tokens=231,
    completion_tokens_details=CompletionTokensDetails(
        accepted_prediction_tokens=0,
        audio_tokens=0,
        reasoning_tokens=192,
        rejected_prediction_tokens=0
    ),
    prompt_tokens_details=PromptTokensDetails(
        audio_tokens=0,
        cached_tokens=0
    )
)

I think you have conflated two models with different behaviors in your report.

gpt-4o-mini is not a reasoning model. We will discuss that first.

Each message carries an overhead: it is placed in a token container with a role name. That overhead can be determined experimentally. We start with what we can measure directly with tiktoken:

The developer message is 9 tokens.
The user message is 10 tokens.

Then we delete each message one at a time, and also add a duplicate of each one at a time, make the API calls, and watch how the billed input changes:

"messages": [

  # del 30->17 tokens = 13 tokens per developer message
-  {"role": "developer", "content": "You are ChatAPI, an AI assistant."},tokens
  # add 30->43 tokens = 13 tokens per developer message
+  #{"role": "developer", "content": "You are ChatAPI, an AI assistant."},

  # del 30->16 tokens == 14 tokens per user message
-  {"role": "user", "content": prompt},
  # add 30->44 tokens == 14 tokens per user message
+  #{"role": "user", "content": prompt},

9–>13 and 10–>14 means a four-token overhead per message
13 + 14 = 27 tokens of 30 billed,
MEANING: 3 overhead per call of hidden <|start|>assistant
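
Putting those measurements together, a rough estimator for the billed prompt tokens looks like the sketch below. The constants (4 tokens of container overhead per message, 3 tokens for the hidden assistant priming) come from the experiment above; this is an approximation for o200k_base chat models, not an official formula, and estimate_prompt_tokens is just a name used here.

import tiktoken

def estimate_prompt_tokens(messages: list[dict]) -> int:
    # Rough estimate of billed prompt tokens for o200k_base chat models:
    # ~4 tokens of container overhead per message (role name plus delimiters),
    # plus 3 tokens for the hidden <|start|>assistant reply priming.
    encoding = tiktoken.get_encoding("o200k_base")
    total = 3  # hidden assistant reply priming
    for message in messages:
        total += 4  # per-message container overhead
        total += len(encoding.encode(message["content"]))
    return total

messages = [
    {"role": "developer", "content": "You are ChatAPI, an AI assistant."},         # 9 content tokens
    {"role": "user", "content": "Write a haiki poem about AI token usage costs"},  # 10 content tokens
]
print(estimate_prompt_tokens(messages))  # 9 + 4 + 10 + 4 + 3 = 30, matching the billed usage below

Note that o4-mini bills one token fewer for the same messages (29 vs. 30 in the usage reports below), so treat the result as an estimate rather than an exact count.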


So our API call and the reported usage make sense:

API Call body:

 {
  "model": "gpt-4.1-mini",
  "messages": [
    {
      "role": "developer",
      "content": "You are ChatAPI, an AI assistant."
    },
    {
      "role": "user",
      "content": "Write a haiki poem about AI token usage costs"
    }
  ],
  "max_completion_tokens": 1100,
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

Output:

Silent counts unfold,
Tokens spent in streams of thought—
Value weighs in light.

Usage:

{
  "prompt_tokens": 30,
  "completion_tokens": 18,
  "reasoning_tokens": 0
}
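
On the output side of a non-reasoning model there are no hidden tokens, so re-encoding the returned haiku with tiktoken should land at, or very close to, the billed 18 completion_tokens:

import tiktoken

encoding = tiktoken.get_encoding("o200k_base")
haiku = "Silent counts unfold,\nTokens spent in streams of thought—\nValue weighs in light."

# For non-reasoning models, completion_tokens tracks the visible text closely.
print(len(encoding.encode(haiku)))  # expect roughly the billed 18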

Reasoning usage, o4-mini, "reasoning_effort": "low":

Output:

Tokens drip like rain
Each request bills the ledger
Mindful cuts save gold

Usage:
{
  "prompt_tokens": 29,
  "completion_tokens": 226,
  "reasoning_tokens": 192
}

Reasoning usage, o4-mini, "reasoning_effort": "high", same request body otherwise:

  "max_completion_tokens": 1100,
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

Usage:

{
  "prompt_tokens": 29,
  "completion_tokens": 1100,
  "reasoning_tokens": 1100
}

What’s going on here?

You are billed for the internal token generation of "reasoning": the model thinking about what the user wants and how to write it, as unseen output generation.

max_completion_tokens is also a maximum budget setting, and it covers that reasoning. I set it to 1100 tokens, set the reasoning effort parameter to high, and got nothing back because the AI was still in "thinking mode" when it hit that limit.
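
If you want to guard against that in code, here is a minimal sketch (reusing the messages from the overhead experiment above; the model name, budget, and effort setting are just the values discussed in this thread):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="o4-mini-2025-04-16",
    reasoning_effort="low",       # less hidden reasoning means fewer billed completion tokens
    max_completion_tokens=1100,   # this budget covers reasoning AND visible output
    messages=[
        {"role": "developer", "content": "You are ChatAPI, an AI assistant."},
        {"role": "user", "content": "Write a haiki poem about AI token usage costs"},
    ],
)

usage = response.usage
reasoning = usage.completion_tokens_details.reasoning_tokens
visible = usage.completion_tokens - reasoning
print(f"reasoning: {reasoning}, visible: {visible}")

# If the whole budget was spent thinking, the visible text is empty and the
# generation was cut off at the limit (finish_reason == "length").
choice = response.choices[0]
if choice.finish_reason == "length" and not choice.message.content:
    print("Budget exhausted during reasoning: raise max_completion_tokens or lower reasoning_effort")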

So everything falls into place with expectations, when your expectations are informed by understanding.