R2D2BOT
January 22, 2025, 11:38pm
I recently made an API call to o1-preview-2024-09-12 and got the following usage:
usage=CompletionUsage(
    completion_tokens=2535,
    prompt_tokens=2385,
    total_tokens=4920,
    prompt_tokens_details={
        'cached_tokens': 2305,
        'audio_tokens': 0},
    completion_tokens_details={
        'reasoning_tokens': 2241,
        'audio_tokens': 0,
        'accepted_prediction_tokens': 0,
        'rejected_prediction_tokens': 0})
I see on the website that the current rates for o1-preview-2024-09-12 are:
$15.00 / 1M input tokens
$7.50 / 1M cached input tokens
$60.00 / 1M output tokens
“Cached prompts are offered at a 50% discount compared to uncached prompts.”
“Output tokens include internal reasoning tokens generated by the model that are not visible in API responses.”
In this case, how do I correctly compute the total cost of this single API call? I am a bit confused by the extra notes.
_j
January 23, 2025, 12:24am
First, we look up the prices on the API Pricing page:
$15.00 / 1M input tokens
$7.50 / 1M cached input tokens
$60.00 / 1M output tokens
Then we convert them to a price per token:
million = 1000000
prompt_token_price = 15.00 / million
cached_token_price = 7.50 / million
completion_token_price = 60.00 / million
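The cached-input price is stated directly rather than as a percentage, so we use it as-is ($7.50 is exactly the 50% discount on $15.00).
The output cost follows directly from the usage response. The reasoning tokens mentioned in the pricing note are already included in completion_tokens, so they need no separate term: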
completion_token_cost = usage.completion_tokens * completion_token_price
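To see where the reasoning tokens from the question fit, the sketch below plugs in the numbers from the usage printout (hard-coded as plain integers for illustration). reasoning_tokens is a breakdown of completion_tokens, not an extra charge on top of it:

```python
completion_tokens = 2535   # usage.completion_tokens from the question
reasoning_tokens = 2241    # usage.completion_tokens_details.reasoning_tokens

# Reasoning tokens are counted inside completion_tokens,
# so the visible part of the answer is only the remainder.
visible_tokens = completion_tokens - reasoning_tokens
print(visible_tokens)  # 294

# Every completion token, visible or hidden, is billed at the output rate.
completion_token_cost = completion_tokens * (60.00 / 1_000_000)
print(f"${completion_token_cost:.4f}")  # $0.1521
```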
The input cost is co-mingled: cached_tokens (inside prompt_tokens_details) gives the portion of prompt_tokens that was discounted, so we need to separate the two:
uncached_tokens = usage.prompt_tokens - usage.prompt_tokens_details.cached_tokens
Then:
uncached_token_cost = uncached_tokens * prompt_token_price
cached_token_cost = usage.prompt_tokens_details.cached_tokens * cached_token_price
So:
total_cost = uncached_token_cost + cached_token_cost + completion_token_cost
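As a worked example, here are the same steps with the token counts from the question filled in. This sketch uses Python's decimal module (a choice made here, not something the pricing page requires) so the per-token prices stay exact instead of picking up float rounding:

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)
prompt_token_price = Decimal("15.00") / MILLION
cached_token_price = Decimal("7.50") / MILLION
completion_token_price = Decimal("60.00") / MILLION

# Token counts from the usage printed in the question.
prompt_tokens = 2385
cached_tokens = 2305
completion_tokens = 2535

uncached_tokens = prompt_tokens - cached_tokens                 # 80
uncached_token_cost = uncached_tokens * prompt_token_price      # 80 at full input price
cached_token_cost = cached_tokens * cached_token_price          # 2305 at the cached price
completion_token_cost = completion_tokens * completion_token_price

total_cost = uncached_token_cost + cached_token_cost + completion_token_cost
print(f"{total_cost:.7f}")  # 0.1705875, i.e. about $0.17
```

Nearly 90% of that cost is the output side, which is dominated by the 2241 hidden reasoning tokens.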
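On models that support Predicted Outputs, usage.completion_tokens_details.rejected_prediction_tokens are also billed at completion_token_price; like reasoning tokens, they are already counted in completion_tokens.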
Audio tokens are likewise a portion of the prompt and completion tokens, billed at a much higher rate; you can separate them out the same way when using voice models.
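A minimal sketch of that separation on the input side, under stated assumptions: the helper name input_cost_with_audio is hypothetical, the prices (USD per 1M tokens) must be supplied by the caller from the pricing page for the voice model in use, and it assumes audio_tokens and cached_tokens are disjoint parts of prompt_tokens (i.e. all cached tokens are text), which you should confirm for your model:

```python
MILLION = 1_000_000

def input_cost_with_audio(prompt_tokens: int, cached_tokens: int, audio_tokens: int,
                          text_price: float, cached_price: float,
                          audio_price: float) -> float:
    """Hypothetical helper: bill the audio slice of prompt_tokens separately.

    ASSUMPTION: cached_tokens and audio_tokens do not overlap (cached tokens are text).
    Prices are USD per 1M tokens, supplied by the caller.
    """
    text_uncached = prompt_tokens - cached_tokens - audio_tokens
    if text_uncached < 0:
        raise ValueError("cached_tokens + audio_tokens exceeds prompt_tokens")
    return (text_uncached * text_price
            + cached_tokens * cached_price
            + audio_tokens * audio_price) / MILLION

# With the question's numbers (no audio), this reduces to the text-only input cost.
print(input_cost_with_audio(2385, 2305, 0, 15.00, 7.50, audio_price=0.0))
```

The completion side would follow the same pattern, using completion_tokens_details.audio_tokens and the model's audio output price.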
Putting it together in one expression, with the prices hard-coded:
million = 1_000_000
cached_tokens = usage.prompt_tokens_details.cached_tokens
total_cost = ((usage.prompt_tokens - cached_tokens) * (15.00 / million) +
              cached_tokens * (7.50 / million) +
              usage.completion_tokens * (60.00 / million))
…assuming you have extracted usage = response.usage and access its fields as Python attributes.
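For reuse across calls, the expression can be wrapped in a small function that also tolerates a missing prompt_tokens_details. In this sketch the function name o1_preview_cost is hypothetical, the default prices are the o1-preview-2024-09-12 rates quoted above, and a types.SimpleNamespace stands in for response.usage so the example runs without making an API call:

```python
from types import SimpleNamespace

MILLION = 1_000_000

def o1_preview_cost(usage, input_price=15.00, cached_price=7.50, output_price=60.00):
    """Hypothetical helper: USD cost of one call from an object shaped like response.usage.

    Prices are USD per 1M tokens; defaults are the o1-preview-2024-09-12 rates.
    """
    # prompt_tokens_details may be absent or None; treat that as zero cached tokens.
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None) or 0
    uncached = usage.prompt_tokens - cached
    # completion_tokens already includes reasoning tokens.
    return (uncached * input_price
            + cached * cached_price
            + usage.completion_tokens * output_price) / MILLION

# Stand-in for response.usage, using the numbers from the question.
usage = SimpleNamespace(
    prompt_tokens=2385,
    completion_tokens=2535,
    prompt_tokens_details=SimpleNamespace(cached_tokens=2305, audio_tokens=0),
)
print(f"${o1_preview_cost(usage):.4f}")  # $0.1706
```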