First, we look up the usage prices on the API Pricing page:
$15.00 / 1M input tokens
$7.50 / 1M cached input tokens
$60.00 / 1M output tokens
Then we get the price per token:
million = 1_000_000
prompt_token_price = 15.00 / million
cached_token_price = 7.50 / million
completion_token_price = 60.00 / million
The cached-input price is stated directly rather than as a percentage discount, so we use it as given.
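For comparison, here is a small sketch of what the percentage route would look like with the prices listed above. The variable names are made up for illustration; only the two dollar figures come from the pricing lines.

```python
# Prices in USD per 1M tokens, from the pricing lines above.
prompt_price_per_million = 15.00
cached_price_per_million = 7.50

# Implied discount on cached input, as a fraction of the normal input price.
cache_discount = 1 - cached_price_per_million / prompt_price_per_million

# Going the other way: recover the cached price from the discount.
cached_from_discount = prompt_price_per_million * (1 - cache_discount)

print(cache_discount)        # 0.5
print(cached_from_discount)  # 7.5
```

Both routes give the same number here, so using the stated cached price simply skips a step.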
The output cost comes straight from the completion token count in the usage response:
completion_token_cost = usage.completion_tokens * completion_token_price
The input cost is commingled: cached_tokens is the portion of prompt_tokens that was billed at the discounted rate, so we need to separate the two.
uncached_tokens = usage.prompt_tokens - usage.cached_tokens
Then:
uncached_token_cost = uncached_tokens * prompt_token_price
cached_token_cost = usage.cached_tokens * cached_token_price
So:
total_cost = uncached_token_cost + cached_token_cost + completion_token_cost
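The steps above can be pulled together into one runnable sketch. The `Usage` dataclass is a hypothetical stand-in with the same flat field names used in this post, and the token counts are invented for illustration; depending on the endpoint and SDK version, the cached count may sit inside a nested details object, so adapt the field access to what your response actually contains.

```python
from dataclasses import dataclass

# Prices in USD per 1M tokens, copied from the pricing lines above.
PROMPT_PRICE_PER_M = 15.00
CACHED_PRICE_PER_M = 7.50
COMPLETION_PRICE_PER_M = 60.00
MILLION = 1_000_000


@dataclass
class Usage:
    """Hypothetical stand-in for the usage object, with flat fields (an assumption)."""
    prompt_tokens: int
    cached_tokens: int
    completion_tokens: int


def total_cost(usage: Usage) -> float:
    """Return the request cost in USD, splitting cached from uncached input."""
    uncached_tokens = usage.prompt_tokens - usage.cached_tokens
    return (
        uncached_tokens * PROMPT_PRICE_PER_M
        + usage.cached_tokens * CACHED_PRICE_PER_M
        + usage.completion_tokens * COMPLETION_PRICE_PER_M
    ) / MILLION


# Made-up token counts, for illustration only.
example = Usage(prompt_tokens=2000, cached_tokens=1024, completion_tokens=500)
print(f"${total_cost(example):.6f}")  # $0.052320
```

In this example, 976 uncached tokens cost $0.01464, 1024 cached tokens cost $0.00768, and 500 output tokens cost $0.03, for $0.05232 in total.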
On other models, usage.rejected_prediction_tokens is additional billing at completion_token_price.
Audio tokens are likewise a portion of the prompt and completion tokens, billed at a much higher rate; you can separate them out the same way when using voice models.
The whole calculation as one expression (prices hard-coded):
million = 1_000_000
total_cost = (
    (usage.prompt_tokens - usage.cached_tokens) * (15.00 / million)
    + usage.cached_tokens * (7.50 / million)
    + usage.completion_tokens * (60.00 / million)
)
…supposing you have pulled response.usage out of the response and read its fields as Python attributes.