How to use cached_tokens field to calculate cost estimation

Hi,

I am trying to make sense of the usage data returned with each prompt and figure out how to calculate cost. The `cached_tokens` field is a complete mystery to me. I am attaching a series of usage JSON objects below. I have 3 questions.

  1. How do I calculate the cost of each prompt using input-token and cached-token pricing?
  2. How is it possible that cached_tokens = 0 in the middle of a chat session? (see below)
  3. How is it possible that input_tokens is less than cached_tokens in one case below? I thought the cached_tokens field was a subset of input_tokens.

```
{input_tokens => 4856, total_tokens => 4887, output_tokens => 31, input_tokens_details => {"cached_tokens" => 0}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 4895, total_tokens => 4928, output_tokens => 33, input_tokens_details => {cached_tokens => 4776}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 4936, total_tokens => 4974, output_tokens => 38, input_tokens_details => {cached_tokens => 4904}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 4989, total_tokens => 5119, output_tokens => 130, input_tokens_details => {"cached_tokens" => 0}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 5136, total_tokens => 5337, output_tokens => 201, input_tokens_details => {"cached_tokens" => 10192}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 5652, total_tokens => 5878, output_tokens => 226, input_tokens_details => {cached_tokens => 5288}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 6004, total_tokens => 6055, output_tokens => 51, input_tokens_details => {cached_tokens => 5800}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 6122, total_tokens => 6197, output_tokens => 75, input_tokens_details => {"cached_tokens" => 0}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 6491, total_tokens => 6537, output_tokens => 46, input_tokens_details => {cached_tokens => 6184}, output_tokens_details => {reasoning_tokens => 0}},
{},
```
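To make question 3 concrete, here is a quick check over the token counts from the entries above (a hypothetical helper, assuming cached_tokens should never exceed input_tokens):

```python
# Flag any usage entry where cached_tokens exceeds input_tokens.
# If cached tokens are a subset of input tokens, this should never happen.
entries = [
    {"input_tokens": 4856, "cached_tokens": 0},
    {"input_tokens": 4895, "cached_tokens": 4776},
    {"input_tokens": 4936, "cached_tokens": 4904},
    {"input_tokens": 4989, "cached_tokens": 0},
    {"input_tokens": 5136, "cached_tokens": 10192},
    {"input_tokens": 5652, "cached_tokens": 5288},
    {"input_tokens": 6004, "cached_tokens": 5800},
    {"input_tokens": 6122, "cached_tokens": 0},
    {"input_tokens": 6491, "cached_tokens": 6184},
]

anomalies = [e for e in entries if e["cached_tokens"] > e["input_tokens"]]
print(anomalies)  # → [{'input_tokens': 5136, 'cached_tokens': 10192}]
```

Only one entry trips the check, which is exactly the case that prompted the question.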

Let’s just search things I’ve said before…

Here is some code I developed to demonstrate a model call's final price, computed from per-token prices and the returned usage object's cached_tokens field (audio token costs are not considered).
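A minimal sketch of such a calculation (the model name and prices are placeholders, not actual OpenAI rates; check the pricing page for real numbers):

```python
# Hypothetical per-model pricing table, USD per 1M tokens.
# "cached_input" is a separate per-model price rather than a fixed
# percentage discount, since the discount has varied by model.
PRICING = {
    "example-model": {"input": 0.15, "cached_input": 0.075, "output": 0.60},
}

def call_cost(model: str, usage: dict) -> float:
    """Compute the cost of one API call from its returned usage object."""
    p = PRICING[model]
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    # cached_tokens is billed at the cached rate; the remainder of
    # input_tokens is billed at the full input rate.
    uncached = usage["input_tokens"] - cached
    return (
        uncached * p["input"]
        + cached * p["cached_input"]
        + usage["output_tokens"] * p["output"]
    ) / 1_000_000

usage = {
    "input_tokens": 4895,
    "output_tokens": 33,
    "input_tokens_details": {"cached_tokens": 4776},
}
print(f"${call_cost('example-model', usage):.6f}")  # → $0.000396
```

The key step is subtracting cached_tokens from input_tokens first, so cached tokens are never billed twice.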

Note that the cache discount is not ultimately a percentage; it is a reduced per-token price specified per model. Because the discount percentage has varied between models, that code snippet anticipates this by taking separate pricing fields for each model.

OpenAI makes a “best effort to route to the same server,” but the context-window cache may persist for as little as five minutes. There is no central cache database; the cache lives on each individual AI model server.

You can read details about how the same server is “found”…