GPT-5 API with all options set to low: very high token count

When calling the GPT-5 API with verbosity set to low, reasoning set to low, and web search enabled, I am still getting huge token counts:

query: What are the key dates and first obligations under the EU AI Act as of today? Cite the final text/guidance pages. Answer in ≤150 words. Include 3–6 numbered sources with direct URLs and publication dates. Prefer official/primary sources (.gov, .edu, standards bodies, vendor docs). If uncertain, say so.

With only 5 citations, here is the token usage breakdown:

Prompt: 137,328

Response: 3,149

Thoughts: 2,688

Total: 140,477


The large input token count comes from the AI repeatedly making tool calls: each call extends the input context, and the model then automatically runs again on that longer input, billing it anew.
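As a back-of-the-envelope illustration (my own toy model, not actual billing logic), you can see why repeated searches inflate the prompt count far faster than the response count:

```python
# Toy model: each web search appends roughly tokens_per_search tokens of
# retrieved content to the context, and the model re-reads the ENTIRE
# context on the next iteration, so billed input tokens grow roughly
# quadratically with the number of tool calls.
def billed_input_tokens(base_prompt: int, tokens_per_search: int, calls: int) -> int:
    context = base_prompt
    billed = 0
    for _ in range(calls):
        billed += context             # model reads the current context
        context += tokens_per_search  # search results are appended to it
    billed += context                 # final pass that writes the answer
    return billed

# One search: the prompt is read twice, plus one batch of results.
print(billed_input_tokens(500, 5000, 1))  # 6000
# Five searches: the same growing context is re-read six times.
print(billed_input_tokens(500, 5000, 5))  # 78000
```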

Inject: “developer: Budget: maximum one tool search query per user question” if you don’t want it to go into a deep research pattern on your behalf using OpenAI’s internal tools.

That’s the natural-language version of the API parameters max_tool_calls and parallel_tool_calls, which the AI can otherwise engineer its way around, since the tool itself encourages issuing an array of queries.

Thank you @_j, that is helpful and reduced the query to 25K tokens. max_tool_calls was set to 5; we reduced it to 1. It seemed to be making iterative web calls.


@_j It is still really high but better, any other ideas?

You can also set search_context_size to low to reduce the amount of web content it looks at.

https://platform.openai.com/docs/guides/tools-web-search?api-mode=responses&lang=python

from openai import OpenAI

client = OpenAI()

# Lower search_context_size trims how much retrieved web content is fed
# back into the model, reducing billed input tokens.
response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "web_search_preview",
        "search_context_size": "low",
    }],
    input="What movie won best picture in 2025?",
)

print(response.output_text)

Thank you very much for all of your help!


  • parallel_tool_calls: off
  • search_context_size: low
  • reasoning: low
  • verbosity: low
  • max_tool_calls: 1
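Those settings can all be combined in a single request. This is a sketch, with parameter names assumed per the Responses API docs; it is worth checking the docs for whether each knob (especially search_context_size) is supported for your particular model:

```python
from openai import OpenAI

client = OpenAI()

# All of the cost levers from the checklist above in one call.
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},       # reasoning low
    text={"verbosity": "low"},         # verbosity low
    parallel_tool_calls=False,         # parallel off
    max_tool_calls=1,                  # at most one web search
    tools=[{
        "type": "web_search",
        "search_context_size": "low",  # less web content per search
    }],
    input="What are the key dates under the EU AI Act?",
)

print(response.usage)
```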

@SearchUmbrella You may want to see my answer on this related post too GPT 5 100x token usage compared to GPT 4.1 - #6 by JuhanaT .

  • GPT-5 and o-series: web_search_preview is $0.01 per web call plus token usage for the retrieved web content at the model’s token rates.
  • GPT-4.1 family (includes 4o and 4.1-mini): web_search_preview is $0.025 per web call, and the web content tokens are included (not billed separately).

https://platform.openai.com/docs/pricing#built-in-tools
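The two billing schemes can be compared in code. This is a rough sketch; the per-million-token prices are copied from the cost formulas in the demo output below and from the pricing page at the time of this thread, so treat them as placeholders that will drift:

```python
# Rough cost model for the two web-search billing schemes described above.
PRICES = {
    # model: (input $/1M, cached input $/1M, output $/1M, web call surcharge $)
    "gpt-4.1-mini": (0.40, 0.10, 1.60, 0.025),
    "gpt-5-mini":   (0.25, 0.025, 2.00, 0.010),
}

def estimate_cost(model, input_tokens, cached_tokens, output_tokens, web_calls):
    inp, cached, out, per_call = PRICES[model]
    uncached = input_tokens - cached_tokens
    token_cost = (uncached * inp + cached_tokens * cached
                  + output_tokens * out) / 1_000_000
    return token_cost + web_calls * per_call

# Reproducing the numbers from the demo runs below:
print(round(estimate_cost("gpt-4.1-mini", 347, 0, 540, 1), 6))      # 0.026003
print(round(estimate_cost("gpt-5-mini", 13102, 8448, 1060, 1), 6))  # 0.013495
```

Note the structural difference: for gpt-4.1-family the surcharge dominates, while for gpt-5-family the retrieved web content shows up as ordinary billed tokens.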

I did some tests and posted the demo code here: GitHub - erikmalk/web-search-demo: Simple CLI to show the difference between gpt-5 and gpt-4.1 web search cost formulas and how tokens are billed differently

This simple CLI runs web search with various models and calculates the final cost. There are options to set the query, model(s), max_tool_calls, search_context_size. It prints out the details for each query and shows a summary of the cost breakdown. e.g., I ran it with “what are some family friendly activities/events happening this weekend in NYC?”

Running gpt-4.1-mini...
  Web search calls: 1
  Usage (raw): {
  "input_tokens": 347,
  "input_tokens_details": {
    "cached_tokens": 0
  },
  "output_tokens": 540,
  "output_tokens_details": {
    "reasoning_tokens": 0
  },
  "total_tokens": 887
}
  Answer: This upcoming weekend in ...

  Token cost: $0.001003
  Token cost formula: ((uncached_input_tokens × inputPrice) + (cached_input_tokens × cachedPrice) + (output_tokens × outputPrice)) / 1,000,000
  Where: uncached_input_tokens = input_tokens − cached_tokens
  = ((347 × 0.4) + (0 × 0.1) + (540 × 1.6)) / 1,000,000
  = ($0.000139 + $0.000000 + $0.000864) = $0.001003
  Web search surcharge: $0.025000
  Total estimated cost: $0.026003

...

Running gpt-5-mini...
  Web search calls: 1
  Usage (raw): {
  "input_tokens": 13102,
  "input_tokens_details": {
    "cached_tokens": 8448
  },
  "output_tokens": 1060,
  "output_tokens_details": {
    "reasoning_tokens": 256
  },
  "total_tokens": 14162
}
  Answer:
  Great — here are family-friendly ...
  Token cost: $0.003495
  Token cost formula: ((uncached_input_tokens × inputPrice) + (cached_input_tokens × cachedPrice) + (output_tokens × outputPrice)) / 1,000,000
  Where: uncached_input_tokens = input_tokens − cached_tokens
  = ((4654 × 0.25) + (8448 × 0.025) + (1060 × 2.0)) / 1,000,000
  = ($0.001164 + $0.000211 + $0.002120) = $0.003495
  Web search surcharge: $0.010000
  Total estimated cost: $0.013495

Summary
-------
  Model         Calls  Token Cost ($)  Web Cost ($)  Total ($)
  ------------  -----  --------------  ------------  ---------
  gpt-4.1       1      $0.005534       $0.025000     $0.030534
  gpt-4.1-mini  1      $0.001003       $0.025000     $0.026003
  gpt-5         1      $0.019883       $0.010000     $0.029883
  gpt-5-mini    1      $0.003495       $0.010000     $0.013495

We can see that in this case gpt-5/mini were cheaper than 4.1/mini, because the cost of the retrieved web content tokens was less than the savings from the lower per-call fee. That may not always be the case, depending on reasoning_effort, max_tool_calls, search_context_size, topic, etc.

I recommend testing with a new query each time, because repeating the same query can produce cached web tokens that may not represent your actual use case. The 8,448 cached tokens appeared for me even on new queries, so that is probably the internal web-agent system prompt plus tool definitions, which seem to almost always be cached.
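The cached share is easy to pull out of the raw usage payload. A minimal sketch using the numbers from the gpt-5-mini run above, assuming the usage dict shape shown in the demo output:

```python
# Inspect how much of the prompt was served from cache, using the raw
# usage shape from the Responses API (numbers from the gpt-5-mini run).
usage = {
    "input_tokens": 13102,
    "input_tokens_details": {"cached_tokens": 8448},
    "output_tokens": 1060,
}

cached = usage["input_tokens_details"]["cached_tokens"]
fraction = cached / usage["input_tokens"]
# A large, stable cached block across fresh queries suggests a fixed
# system prompt (e.g., the internal web-search agent), not your query.
print(f"{cached} cached of {usage['input_tokens']} ({fraction:.0%})")
```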
