How to correctly compute the cost of an o1 model API call?

I recently ran API calls to o1-preview-2024-09-12 and got the following usage:

usage=CompletionUsage(
  completion_tokens=2535,
  prompt_tokens=2385,
  total_tokens=4920,
  prompt_tokens_details={
    'cached_tokens': 2305,
    'audio_tokens': 0},
  completion_tokens_details={
    'reasoning_tokens': 2241,
    'audio_tokens': 0,
    'accepted_prediction_tokens': 0,
    'rejected_prediction_tokens': 0})

I see from the website that the current rates for o1-preview-2024-09-12 are:

  • $15.00 / 1M input tokens
  • $7.50 / 1M cached input tokens
  • $60.00 / 1M output tokens
  • “Cached prompts are offered at a 50% discount compared to uncached prompts.”
  • “Output tokens include internal reasoning tokens generated by the model that are not visible in API responses.”

In this case, how can I correctly compute the total cost of this single API call? I am a bit confused by the extra notes.

We visit the API Pricing page to get the prices for usage:

$15.00 / 1M input tokens
$7.50 / 1M cached input tokens
$60.00 / 1M output tokens

Then we get the price per token:

million = 1000000
prompt_token_price = 15.00 / million
cached_token_price = 7.50 / million
completion_token_price = 60.00 / million

The cached-input rate is stated directly on the pricing page rather than only as a percentage, so we use that figure.
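
As a sanity check, the stated cached rate matches the quoted 50% discount:

assert cached_token_price == prompt_token_price * 0.5   # $7.50 is half of $15.00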

The output cost comes directly from the usage response; note that completion_tokens already includes the invisible reasoning tokens (2241 of the 2535 here), so no extra term is needed for them:

completion_token_cost = usage.completion_tokens * completion_token_price
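
With the numbers from the usage above:

completion_token_cost = 2535 * (60.00 / 1_000_000)   # = $0.152100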

The input cost is commingled: prompt_tokens_details reports cached_tokens, the portion of prompt_tokens that was served from the prompt cache at the discounted rate. We need to separate that from the uncached portion.

cached_tokens = usage.prompt_tokens_details.cached_tokens   # or ['cached_tokens'] on SDK versions that return a dict
uncached_tokens = usage.prompt_tokens - cached_tokens

Then:

uncached_token_cost = uncached_tokens * prompt_token_price
cached_token_cost = cached_tokens * cached_token_price
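
Again with the numbers above:

uncached_tokens = 2385 - 2305                    # = 80
uncached_token_cost = 80 * (15.00 / 1_000_000)   # = $0.001200
cached_token_cost = 2305 * (7.50 / 1_000_000)    # = $0.01728750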

So:

total_cost = uncached_token_cost + cached_token_cost + completion_token_cost
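
Putting it together for this call:

total_cost = 0.001200 + 0.0172875 + 0.152100     # = $0.1705875 (about 17 cents)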

On other models (those that support predicted outputs), usage.rejected_prediction_tokens represents additional billing at completion_token_price.

Audio tokens are likewise a portion of the prompt and completion tokens, billed at a much higher rate; you can handle them the same way when using audio-capable models.
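
For illustration only, a rough sketch of how you might split the audio portion out. The rates here are made-up placeholders (audio_input_price and audio_output_price are not real figures; take them from the pricing page of the audio model you actually use):

# Placeholder audio rates ($/1M tokens) -- substitute the real pricing-page values.
audio_input_price = 100.00 / million    # hypothetical number
audio_output_price = 200.00 / million   # hypothetical number

audio_in = usage.prompt_tokens_details.audio_tokens or 0
audio_out = usage.completion_tokens_details.audio_tokens or 0

# Audio portions are billed at the audio rates; the remaining
# (prompt_tokens - audio_in) and (completion_tokens - audio_out)
# are the text tokens billed at the text rates as above.
audio_cost = audio_in * audio_input_price + audio_out * audio_output_price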

Or, hard-coded:

million = 1_000_000
cached = usage.prompt_tokens_details.cached_tokens
total_cost = ((usage.prompt_tokens - cached) * (15.00 / million) +
              cached * (7.50 / million) +
              usage.completion_tokens * (60.00 / million))

…assuming you have pulled response.usage from the API response and are reading its attributes.
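
If you do this often, you could wrap it in a small helper. A minimal sketch, hard-coding the o1-preview rates above and assuming attribute-style access to the usage details (adjust if your SDK version returns plain dicts):

def o1_preview_call_cost(usage) -> float:
    """Estimate the dollar cost of one o1-preview-2024-09-12 call."""
    million = 1_000_000
    prompt_token_price = 15.00 / million
    cached_token_price = 7.50 / million
    completion_token_price = 60.00 / million

    cached = usage.prompt_tokens_details.cached_tokens or 0
    uncached = usage.prompt_tokens - cached

    return (uncached * prompt_token_price
            + cached * cached_token_price
            + usage.completion_tokens * completion_token_price)

# With the figures from the question this comes to about $0.17.
print(f"${o1_preview_call_cost(response.usage):.6f}")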
