I’m encountering an issue with token counting in the o4-mini and gpt-4o-mini models, where the token counts reported in the API’s usage field significantly exceed those calculated with the tiktoken library. Here’s a detailed description:
Issue Description
For a simple test case:
- System Prompt: “Finish the given task”
- User Prompt: “Say hello world”
- Output: “Hello, world!”
Using the tiktoken library with the o200k_base encoding, I calculated:
- System prompt: 4 tokens
- User prompt: 3 tokens
- Output: 4 tokens
However, the API’s usage field reports:
- prompt_tokens: 17
- completion_tokens: 214
- reasoning_tokens: 192
- total_tokens: 231
Does this mean that “Hello, world!” cost 22 tokens (214 completion_tokens - 192 reasoning_tokens)? tiktoken shows the output is only 4 tokens. The prompt side shows the same pattern: prompt_tokens is 17, while tiktoken counts only 7 tokens for the two message strings combined (4 + 3). That is a significant discrepancy: roughly 10 extra tokens on the prompt and 18 extra on the visible completion.
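My current guess is that the prompt-side gap comes from per-message framing (role markers and message separators) that a raw string count ignores. Below is a sketch based on the approximation in OpenAI's cookbook article "How to count tokens with tiktoken"; the 3-tokens-per-message and 3-token reply-priming constants are assumptions carried over from older model families and may be off by a token or two for o200k_base models:

```python
import tiktoken

def num_tokens_from_messages(messages: list[dict], encoding_name: str = "o200k_base") -> int:
    """Approximate prompt_tokens for a chat request (cookbook-style estimate)."""
    encoding = tiktoken.get_encoding(encoding_name)
    tokens_per_message = 3  # assumed framing overhead per message (cookbook constant)
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():  # role and content strings both consume tokens
            num_tokens += len(encoding.encode(value))
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

messages = [
    {"role": "system", "content": "Finish the given task"},
    {"role": "user", "content": "Say hello world"},
]
# 7 content tokens plus framing -> lands near the reported prompt_tokens of 17
print(num_tokens_from_messages(messages))
```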
Code for Token Counting with tiktoken
import tiktoken

def count_tokens(text: str) -> int:
    # gpt-4o-mini and o4-mini both use the o200k_base encoding
    encoding = tiktoken.get_encoding("o200k_base")
    return len(encoding.encode(text))

system_prompt = "Finish the given task"
user_prompt = "Say hello world"
completion = "Hello, world!"

print(count_tokens(system_prompt))  # Output: 4
print(count_tokens(user_prompt))    # Output: 3
print(count_tokens(completion))     # Output: 4
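As a sanity check on the encoding choice, recent tiktoken releases can resolve the encoding from the model name directly (this assumes a tiktoken version new enough to know these model families; very new names like o4-mini may require the latest release):

```python
import tiktoken

# Resolves to o200k_base on a sufficiently recent tiktoken
print(tiktoken.encoding_for_model("gpt-4o-mini").name)
```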
Code for API Call
from openai import OpenAI

client_openai = OpenAI()
response = client_openai.chat.completions.create(
    model="o4-mini-2025-04-16",  # Also tested with gpt-4o-mini-2024-07-18
    messages=[
        {"role": "system", "content": "Finish the given task"},
        {"role": "user", "content": "Say hello world"},
    ],
    max_completion_tokens=10000,
    temperature=1.0,  # o4-mini only supports the default temperature (1)
)

content = response.choices[0].message.content
usage = response.usage
print(content)  # Output: Hello, world!
print(usage)
API Response usage Field
CompletionUsage(
    completion_tokens=214,
    prompt_tokens=17,
    total_tokens=231,
    completion_tokens_details=CompletionTokensDetails(
        accepted_prediction_tokens=0,
        audio_tokens=0,
        reasoning_tokens=192,
        rejected_prediction_tokens=0
    ),
    prompt_tokens_details=PromptTokensDetails(
        audio_tokens=0,
        cached_tokens=0
    )
)
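For reference, here is how I'm reading those fields. The split of completion_tokens into reasoning and visible output is the part I'd like confirmed; attributing the remaining gap to message/reply framing is my assumption:

```python
# Numbers copied from the usage object above
prompt_tokens = 17
completion_tokens = 214
reasoning_tokens = 192

visible_completion = completion_tokens - reasoning_tokens  # 22
content_tokens = 4   # tiktoken count of "Hello, world!"
message_tokens = 7   # tiktoken count of both message strings (4 + 3)

# Unexplained overhead, presumably framing/special tokens (my assumption):
print(prompt_tokens - message_tokens)       # 10 extra on the prompt
print(visible_completion - content_tokens)  # 18 extra on the visible completion
```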