How to correctly compute the cost of an o1 model API call?

First, we visit the API Pricing page to get the usage prices for o1:

$15.00 / 1M input tokens
$7.50 / 1M cached input tokens
$60.00 / 1M output tokens

Then we get the price per token:

million = 1000000
prompt_token_price = 15.00 / million
cached_token_price = 7.50 / million
completion_token_price = 60.00 / million

The cached-input price is stated as its own rate, not as a percentage discount, so we use it directly.

The output cost comes straight from the completion token count in the usage response:

completion_token_cost = usage.completion_tokens * completion_token_price

The input cost is commingled: prompt_tokens is the total, and cached_tokens (nested under usage.prompt_tokens_details in the Chat Completions usage object) is the portion of it that was discounted. We need to separate them.

cached_tokens = usage.prompt_tokens_details.cached_tokens
uncached_tokens = usage.prompt_tokens - cached_tokens

Then:

uncached_token_cost = uncached_tokens * prompt_token_price
cached_token_cost = cached_tokens * cached_token_price

So:

total_cost = uncached_token_cost + cached_token_cost + completion_token_cost
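
The steps above can be collected into one small function. This is a minimal sketch, not an official calculator: the names `Pricing` and `compute_cost` are made up for illustration, the default rates are the o1 prices quoted above, and the token counts in the example are invented.

```python
from dataclasses import dataclass

MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens. Defaults are the o1 rates quoted in this post."""
    input_per_million: float = 15.00
    cached_input_per_million: float = 7.50
    output_per_million: float = 60.00


def compute_cost(prompt_tokens, cached_tokens, completion_tokens, pricing=Pricing()):
    """Return the USD cost of one call from its three token counts."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    # cached_tokens is a subset of prompt_tokens, so split it out first.
    uncached_tokens = prompt_tokens - cached_tokens
    total_per_million = (
        uncached_tokens * pricing.input_per_million
        + cached_tokens * pricing.cached_input_per_million
        + completion_tokens * pricing.output_per_million
    )
    return total_per_million / MILLION


# Invented counts: 10,000 prompt tokens (4,000 of them cached), 2,000 completion tokens.
example_cost = compute_cost(10_000, 4_000, 2_000)
print(f"${example_cost:.4f}")  # $0.2400
```

Keeping the rates in one object means a price change only has to be edited in one place.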

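On other models that support predicted outputs, usage.completion_tokens_details.rejected_prediction_tokens reports rejected prediction tokens, which are also billed at completion_token_price.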

Audio tokens are likewise a portion of the prompt and completion token counts, billed at a much higher rate; when using voice models you can separate them out in the same way.
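
The same subtract-then-multiply pattern can be sketched for audio. Everything specific here is an assumption: `modality_cost` is a hypothetical helper, the two rates in the example are placeholders and not real prices, and the audio count is assumed to come from an `audio_tokens` field in the usage details. Caching is ignored to keep the sketch short.

```python
MILLION = 1_000_000


def modality_cost(total_tokens, audio_tokens, text_rate_per_million, audio_rate_per_million):
    """Split one token total into text and audio and bill each at its own rate.

    Call it once for the prompt side and once for the completion side,
    passing the rates from the pricing page for the model in use.
    """
    if not 0 <= audio_tokens <= total_tokens:
        raise ValueError("audio_tokens must be between 0 and total_tokens")
    # Audio tokens are counted inside the total, so remove them to get text tokens.
    text_tokens = total_tokens - audio_tokens
    return (
        text_tokens * text_rate_per_million
        + audio_tokens * audio_rate_per_million
    ) / MILLION


# Placeholder rates, NOT real prices: 2.00 per 1M text tokens, 40.00 per 1M audio tokens.
demo_cost = modality_cost(1_000, 400, text_rate_per_million=2.00, audio_rate_per_million=40.00)
print(f"${demo_cost:.4f}")
```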

The same calculation as a single expression, with the prices hard-coded:

million = 1_000_000
cached_tokens = usage.prompt_tokens_details.cached_tokens
total_cost = ((usage.prompt_tokens - cached_tokens) * (15.00 / million) +
              cached_tokens * (7.50 / million) +
              usage.completion_tokens * (60.00 / million))

…supposing you have pulled usage out of the response (usage = response.usage) and read its fields as Python attributes.
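
As a rough sketch of reading those attributes defensively: the helper below treats a missing or empty details object as zero cached tokens. The name `cost_from_usage` is made up, the nested `prompt_tokens_details.cached_tokens` layout is assumed from the Chat Completions usage object, and a `SimpleNamespace` with invented counts stands in for a real `response.usage` so the snippet runs without an API call.

```python
from types import SimpleNamespace

MILLION = 1_000_000
PROMPT_PRICE = 15.00 / MILLION      # o1 input, USD per token
CACHED_PRICE = 7.50 / MILLION       # o1 cached input, USD per token
COMPLETION_PRICE = 60.00 / MILLION  # o1 output, USD per token


def cost_from_usage(usage):
    """Compute USD cost from a usage-like object with attribute access."""
    # The details object, or its cached_tokens field, may be absent or None.
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None) or 0
    uncached = usage.prompt_tokens - cached
    return (
        uncached * PROMPT_PRICE
        + cached * CACHED_PRICE
        + usage.completion_tokens * COMPLETION_PRICE
    )


# Stand-in for response.usage, with invented token counts.
fake_usage = SimpleNamespace(
    prompt_tokens=2048,
    completion_tokens=512,
    prompt_tokens_details=SimpleNamespace(cached_tokens=1024),
)
usage_cost = cost_from_usage(fake_usage)
print(f"${usage_cost:.5f}")  # $0.05376
```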
