How to correctly compute the cost of an o1 model API call?

I recently ran API calls to o1-preview-2024-09-12 and got the following usage:

usage=CompletionUsage(
  completion_tokens=2535,
  prompt_tokens=2385,
  total_tokens=4920,
  prompt_tokens_details={
    'cached_tokens': 2305,
    'audio_tokens': 0},
  completion_tokens_details={
    'reasoning_tokens': 2241,
    'audio_tokens': 0,
    'accepted_prediction_tokens': 0,
    'rejected_prediction_tokens': 0})

I see from the website that the current rates for o1-preview-2024-09-12 are:

  • $15.00 / 1M input tokens
  • $7.50 / 1M cached input tokens
  • $60.00 / 1M output tokens
  • “Cached prompts are offered at a 50% discount compared to uncached prompts.”
  • “Output tokens include internal reasoning tokens generated by the model that are not visible in API responses.”

In this case, how can I correctly compute the total cost of this single API call? I am a bit confused by the extra notes.

We visit the API Pricing page to get the prices for usage:

$15.00 / 1M input tokens
$7.50 / 1M cached input tokens
$60.00 / 1M output tokens

Then we get the price per token:

million = 1000000
prompt_token_price = 15.00 / million
cached_token_price = 7.50 / million
completion_token_price = 60.00 / million

The cached-input rate is stated directly on the pricing page rather than only as a percentage, so we use that figure.
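
As a sanity check, the stated cached rate matches the quoted 50% discount:

assert cached_token_price == prompt_token_price * 0.5   # $7.50 is half of $15.00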

The output cost comes directly from the usage response; note that completion_tokens already includes the invisible reasoning tokens (2241 of the 2535 here), so no extra term is needed for them:

completion_token_cost = usage.completion_tokens * completion_token_price
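
With the numbers from the usage above:

completion_token_cost = 2535 * (60.00 / 1_000_000)   # = $0.152100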

The input cost is commingled: prompt_tokens_details reports cached_tokens, the portion of prompt_tokens that was served from the prompt cache at the discounted rate. We need to separate that from the uncached portion.

cached_tokens = usage.prompt_tokens_details.cached_tokens   # or ['cached_tokens'] on SDK versions that return a dict
uncached_tokens = usage.prompt_tokens - cached_tokens

Then:

uncached_token_cost = uncached_tokens * prompt_token_price
cached_token_cost = cached_tokens * cached_token_price
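
Again with the numbers above:

uncached_tokens = 2385 - 2305                    # = 80
uncached_token_cost = 80 * (15.00 / 1_000_000)   # = $0.001200
cached_token_cost = 2305 * (7.50 / 1_000_000)    # = $0.01728750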

So:

total_cost = uncached_token_cost + cached_token_cost + completion_token_cost
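
Putting it together for this call:

total_cost = 0.001200 + 0.0172875 + 0.152100     # = $0.1705875 (about 17 cents)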

On other models (those that support predicted outputs), usage.rejected_prediction_tokens represents additional billing at completion_token_price.

Audio tokens are likewise a portion of the prompt and completion tokens, billed at a much higher rate; you can handle them the same way when using audio-capable models.
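
For illustration only, a rough sketch of how you might split the audio portion out. The rates here are made-up placeholders (audio_input_price and audio_output_price are not real figures; take them from the pricing page of the audio model you actually use):

# Placeholder audio rates ($/1M tokens) -- substitute the real pricing-page values.
audio_input_price = 100.00 / million    # hypothetical number
audio_output_price = 200.00 / million   # hypothetical number

audio_in = usage.prompt_tokens_details.audio_tokens or 0
audio_out = usage.completion_tokens_details.audio_tokens or 0

# Audio portions are billed at the audio rates; the remaining
# (prompt_tokens - audio_in) and (completion_tokens - audio_out)
# are the text tokens billed at the text rates as above.
audio_cost = audio_in * audio_input_price + audio_out * audio_output_price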

Or, hard-coded:

million = 1_000_000
cached = usage.prompt_tokens_details.cached_tokens
total_cost = ((usage.prompt_tokens - cached) * (15.00 / million) +
              cached * (7.50 / million) +
              usage.completion_tokens * (60.00 / million))

…assuming you have pulled response.usage from the API response and are reading its attributes.
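
If you do this often, you could wrap it in a small helper. A minimal sketch, hard-coding the o1-preview rates above and assuming attribute-style access to the usage details (adjust if your SDK version returns plain dicts):

def o1_preview_call_cost(usage) -> float:
    """Estimate the dollar cost of one o1-preview-2024-09-12 call."""
    million = 1_000_000
    prompt_token_price = 15.00 / million
    cached_token_price = 7.50 / million
    completion_token_price = 60.00 / million

    cached = usage.prompt_tokens_details.cached_tokens or 0
    uncached = usage.prompt_tokens - cached

    return (uncached * prompt_token_price
            + cached * cached_token_price
            + usage.completion_tokens * completion_token_price)

# With the figures from the question this comes to about $0.17.
print(f"${o1_preview_call_cost(response.usage):.6f}")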
