Help me understand the realtime usage block

I’m getting this usage block from the WebRTC realtime API and I want to calculate total cost based on it:

{
  "total_tokens": 821,
  "input_tokens": 789,
  "output_tokens": 32,
  "input_token_details": {
    "text_tokens": 313,
    "audio_tokens": 476,
    "cached_tokens": 640,
    "cached_tokens_details": {
      "text_tokens": 256,
      "audio_tokens": 384
    }
  },
  "output_token_details": {
    "text_tokens": 9,
    "audio_tokens": 23
  }
}

I’m confused by the cached tokens. If I have 313 input text_tokens and 256 cached text_tokens, does that mean I should calculate the cost of 313 - 256 = 57 text tokens ($2.50/million) and then add on the cost of the 256 cached tokens?

The price of cached tokens for the audio preview API isn’t listed on https://openai.com/api/pricing/

The blog entry https://openai.com/index/o1-and-new-tools-for-developers/ says “Cached audio input costs are reduced by 87.5% to $2.50/1M input tokens” but doesn’t say anything about text tokens. BUT for the new GPT-4o mini audio preview API it says “Cached audio and text both cost $0.30/1M tokens” - does that mean that for GPT-4o audio preview cached text tokens cost the same as cached audio tokens?


To the best of my knowledge, cached tokens are charged at 50% of the normal rate, so yes: subtract the cached tokens from the total, bill the remainder at the normal rate, and bill the cached tokens at half price to get an accurate cost.


The cached-token pricing is listed under the Realtime API section (just under Fine-tuning models) on that page; you have to scroll a little further down (I don’t know why it’s that far down the page lol).

One thing to note is that the gpt-4o-audio-preview and gpt-4o-mini-audio-preview models are available in the Chat Completions API and differ from the Realtime API models.

As for your usage calculation, this is what the pricing page says for gpt-4o-realtime-preview-2024-12-17 (which is the new realtime snapshot released just yesterday):

Text
$5.00 / 1M input tokens
$2.50 / 1M cached* input tokens
$20.00 / 1M output tokens

Audio
$40.00 / 1M input tokens
$2.50 / 1M cached* input tokens
$80.00 / 1M output tokens

Based on my understanding of the pricing, for your example it works like this:

# Input text tokens
total: 313 tokens
--> cached: 256 tokens (billed $2.50 / 1M)
--> normal: 313 - 256 = 57 tokens (billed $5.00 / 1M)

# Input audio tokens
total: 476 tokens
--> cached: 384 tokens (billed $2.50 / 1M)
--> normal: 476 - 384 = 92 tokens (billed $40.00 / 1M)

It looks like the text tokens haven’t changed in terms of pricing, but the audio has indeed received that 87.5% reduction for cached tokens.
The previous realtime model snapshot gpt-4o-realtime-preview-2024-10-01 costs $20.00 / 1M for cached audio tokens, but the new gpt-4o-realtime-preview-2024-12-17 costs $2.50 / 1M. This also means the pricing for cached audio and text tokens is the same in this snapshot (both $2.50 / 1M according to the pricing page).
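Putting the breakdown above together with the output-token prices, here is a minimal Python sketch of the whole calculation for the usage block in the question. The prices are the ones quoted from the pricing page for gpt-4o-realtime-preview-2024-12-17; the dict keys and function name are just illustrative, not anything from the API.

```python
# Per-1M-token prices quoted above for gpt-4o-realtime-preview-2024-12-17
# (illustrative names; verify against the current pricing page).
PRICES = {
    "text_input": 5.00,
    "text_cached": 2.50,
    "audio_input": 40.00,
    "audio_cached": 2.50,
    "text_output": 20.00,
    "audio_output": 80.00,
}

# The usage block from the question, trimmed to the fields we need.
usage = {
    "input_token_details": {
        "text_tokens": 313,
        "audio_tokens": 476,
        "cached_tokens_details": {"text_tokens": 256, "audio_tokens": 384},
    },
    "output_token_details": {"text_tokens": 9, "audio_tokens": 23},
}

def cost_usd(usage, prices):
    inp = usage["input_token_details"]
    cached = inp["cached_tokens_details"]
    out = usage["output_token_details"]
    total = (
        # Uncached input tokens = total minus cached, billed at the normal rate.
        (inp["text_tokens"] - cached["text_tokens"]) * prices["text_input"]
        + cached["text_tokens"] * prices["text_cached"]
        + (inp["audio_tokens"] - cached["audio_tokens"]) * prices["audio_input"]
        + cached["audio_tokens"] * prices["audio_cached"]
        # Output tokens have no cached tier.
        + out["text_tokens"] * prices["text_output"]
        + out["audio_tokens"] * prices["audio_output"]
    )
    return total / 1_000_000

print(f"${cost_usd(usage, PRICES):.6f}")  # → $0.007585
```

So this single response works out to a fraction of a cent; the cached audio tokens being billed at $2.50 instead of $40.00 is what keeps it that low.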

Here’s what I ended up implementing. I’m not 100% confident I’ve got the calculations right though:

https://tools.simonwillison.net/openai-webrtc

Source code here: tools/openai-webrtc.html at c9f3085107fd1177329846de95c840eda64b1748 · simonw/tools · GitHub

Text input that hits the cache costs 50% less ($2.50/1M vs $5.00/1M). Audio input that hits the cache is discounted far more steeply ($2.50/1M vs $40.00/1M).

Here is the announcement regarding prompt caching on the Realtime API:
