How to use cached_tokens field to calculate cost estimation

Hi,

I am trying to make sense of the usage data returned with each prompt and figure out how to calculate cost. The `cached_tokens` field is a complete mystery to me. I am attaching a series of usage JSON objects below. I have 3 questions.

  1. How do I calculate the cost of each prompt using input-token and cached-token pricing?
  2. How is it possible that cached_tokens = 0 in the middle of a chat session? (see below)
  3. How is it possible that input_tokens is less than cached_tokens in one case below? I thought the cached_tokens field was a subset of input_tokens.

```
{input_tokens => 4856, total_tokens => 4887, output_tokens => 31, input_tokens_details => {"cached_tokens" => 0}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 4895, total_tokens => 4928, output_tokens => 33, input_tokens_details => {cached_tokens => 4776}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 4936, total_tokens => 4974, output_tokens => 38, input_tokens_details => {cached_tokens => 4904}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 4989, total_tokens => 5119, output_tokens => 130, input_tokens_details => {"cached_tokens" => 0}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 5136, total_tokens => 5337, output_tokens => 201, input_tokens_details => {"cached_tokens" => 10192}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 5652, total_tokens => 5878, output_tokens => 226, input_tokens_details => {cached_tokens => 5288}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 6004, total_tokens => 6055, output_tokens => 51, input_tokens_details => {cached_tokens => 5800}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 6122, total_tokens => 6197, output_tokens => 75, input_tokens_details => {"cached_tokens" => 0}, output_tokens_details => {reasoning_tokens => 0}},
{},
{input_tokens => 6491, total_tokens => 6537, output_tokens => 46, input_tokens_details => {cached_tokens => 6184}, output_tokens_details => {reasoning_tokens => 0}},
{},
```
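To make question 3 concrete, here is a quick check over the token counts from the entries above (a hypothetical helper, assuming cached_tokens should never exceed input_tokens):

```python
# Flag any usage entry where cached_tokens exceeds input_tokens.
# If cached tokens are a subset of input tokens, this should never happen.
entries = [
    {"input_tokens": 4856, "cached_tokens": 0},
    {"input_tokens": 4895, "cached_tokens": 4776},
    {"input_tokens": 4936, "cached_tokens": 4904},
    {"input_tokens": 4989, "cached_tokens": 0},
    {"input_tokens": 5136, "cached_tokens": 10192},
    {"input_tokens": 5652, "cached_tokens": 5288},
    {"input_tokens": 6004, "cached_tokens": 5800},
    {"input_tokens": 6122, "cached_tokens": 0},
    {"input_tokens": 6491, "cached_tokens": 6184},
]

anomalies = [e for e in entries if e["cached_tokens"] > e["input_tokens"]]
print(anomalies)  # → [{'input_tokens': 5136, 'cached_tokens': 10192}]
```

Only one entry trips the check, which is exactly the case that prompted the question.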

Let’s just search things I’ve said before…

Here is some code I developed to demonstrate a model call's final price, computed from per-token prices and the returned usage object's cached_tokens field (audio token costs are not considered).
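A minimal sketch of such a calculation (the model name and prices are placeholders, not actual OpenAI rates; check the pricing page for real numbers):

```python
# Hypothetical per-model pricing table, USD per 1M tokens.
# "cached_input" is a separate per-model price rather than a fixed
# percentage discount, since the discount has varied by model.
PRICING = {
    "example-model": {"input": 0.15, "cached_input": 0.075, "output": 0.60},
}

def call_cost(model: str, usage: dict) -> float:
    """Compute the cost of one API call from its returned usage object."""
    p = PRICING[model]
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    # cached_tokens is billed at the cached rate; the remainder of
    # input_tokens is billed at the full input rate.
    uncached = usage["input_tokens"] - cached
    return (
        uncached * p["input"]
        + cached * p["cached_input"]
        + usage["output_tokens"] * p["output"]
    ) / 1_000_000

usage = {
    "input_tokens": 4895,
    "output_tokens": 33,
    "input_tokens_details": {"cached_tokens": 4776},
}
print(f"${call_cost('example-model', usage):.6f}")  # → $0.000396
```

The key step is subtracting cached_tokens from input_tokens first, so cached tokens are never billed twice.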

Note that the cache discount is not ultimately a percentage; it is a reduced per-token price specified per model. Because the discount percentage has varied between models, that code snippet anticipates this by taking separate pricing fields for each model.

OpenAI makes a “best effort to route to the same server,” but the context-window cache may persist for as little as five minutes. There is no central cache database; the cache lives on each individual AI model server.

You can read details about how the same server is “found”…