Will cached_tokens be charged on each API call?

When I call the chat.completions.parse API, it returns
completion.usage.total_tokens
completion.usage.prompt_tokens_details.cached_tokens

I understand that cached_tokens is reported because of prompt caching, but does OpenAI charge for these cached tokens? How can I calculate the correct cost from the returned usage info?

For example, the first call returns
total_tokens = 4096
cached_tokens = 0

the second call returns
total_tokens = 8192
cached_tokens = 4096

How can I calculate the cost?

Hi! I understand how the API's usage return can be a bit confusing, with the new informational fields accompanying the preexisting prompt and completion token values.

Made more presentable, the API usage information is reported as:

Usage: prompt_tokens=245, completion_tokens=32, total_tokens=277
prompt_tokens_details: cached_tokens: 0
completion_tokens_details: reasoning_tokens: 0

(Reasoning tokens are for o1 model billing)

If the context caching system is able to reuse some of its precomputed state on later similar calls, then that cached token count will appear in the API's cached_tokens field.

That is the number of prompt tokens that will be discounted by 50%.
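
To make that concrete with the numbers from your first post: total_tokens alone doesn't reveal the prompt/completion split, so hypothetically assuming the second call's 8192 total tokens were 7936 prompt tokens and 256 completion tokens, and assuming gpt-4o pricing ($2.50 per 1M input, $10.00 per 1M output), the cost works out as:

(7936 − 4096) uncached input tokens × $2.50/1M = $0.00960
+ 4096 cached input tokens × $1.25/1M (the 50% rate) = $0.00512
+ 256 output tokens × $10.00/1M = $0.00256
= $0.01728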

So, the cost should be calculated as:

 completion_tokens * output token price + (prompt_tokens - cached_tokens) * input token price + cached_tokens * half of input token price

is this correct?

You gave:

completion_tokens * output token price + (prompt_tokens - cached_tokens) * input token price + cached_tokens * half of input token price

What do you think of o1-mini’s formula?


# Usage variables
prompt_tokens = 245
completion_tokens = 32
cached_tokens = 0  # Can range from 0 up to prompt_tokens

# Cost variables (per million tokens)
prompt_cost_per_1m = 2.50      # $2.50 per 1M input tokens
output_cost_per_1m = 10.00     # $10.00 per 1M output tokens

# Calculate the cost: cached tokens are billed at half the input price
cost = (
    prompt_cost_per_1m * (prompt_tokens - 0.5 * cached_tokens) +
    output_cost_per_1m * completion_tokens
) / 1_000_000

# Optional: round the cost; six decimal places keeps sub-cent precision
cost = round(cost, 6)  # Adjust precision as needed

print(f"API Call Cost: ${cost}")

From the input tokens, half of the cached tokens are taken away…
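
Applied to the same hypothetical split of your second call (7936 prompt, 256 completion, 4096 cached; the split is an assumption, since total_tokens alone doesn't provide it), the snippet gives:

prompt_tokens = 7936
completion_tokens = 256
cached_tokens = 4096

cost = (2.50 * (7936 - 0.5 * 4096) + 10.00 * 256) / 1_000_000
# (2.50 * 5888 + 2560.0) / 1_000_000 = 0.01728

which matches the half-price arithmetic above.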

Thanks, this is much better. But I prefer my original formula, as it retains the business logic and is easy to understand 🙂

Sounds great.

Since OpenAI also went to the trouble of explicitly stating the cached-token price for each model, we could allow for the possibility that the discount percentage changes in the future, and use that dollar price directly as a formula input.
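
For instance, a minimal sketch of that variant, assuming gpt-4o's listed prices ($2.50 per 1M input, $1.25 per 1M cached input, $10.00 per 1M output; verify against the pricing page for your model):

# Assumed gpt-4o pricing (per million tokens) -- check the pricing page
input_cost_per_1m = 2.50     # uncached input tokens
cached_cost_per_1m = 1.25    # cached input tokens, at their listed price
output_cost_per_1m = 10.00   # output tokens

def call_cost(prompt_tokens: int, completion_tokens: int, cached_tokens: int) -> float:
    """Cost in dollars, using the listed cached-token price rather than
    hard-coding a 50% discount."""
    uncached_tokens = prompt_tokens - cached_tokens
    return (
        input_cost_per_1m * uncached_tokens
        + cached_cost_per_1m * cached_tokens
        + output_cost_per_1m * completion_tokens
    ) / 1_000_000

print(f"${call_cost(7936, 256, 4096):.6f}")  # $0.017280 with the hypothetical split above

That way, if the cached-token rate ever moves away from exactly half, only the price constant needs updating.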
