Cached input audio_tokens is always 0

I’m using the Realtime API (based on the GitHub console example) and, as the title says, none of my input audio tokens are being cached. Is there anything I need to do on my end to get this working as expected? I would have guessed that cache identification happens entirely on the server side.

My project ID is proj_gVgeRdz2IgsyNgRukZ5IPOvs if anyone from OpenAI would like to take a closer look. Short conversations are ballooning in cost because the audio accumulates across turns and misses the cache, which makes production deployment unreasonable.

We (on this forum) have already established that output audio tokens are fed back in as input tokens on each new turn, which means that at least some audio tokens should be getting cached.
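To make that accounting concrete, here is a minimal sketch of the per-turn bookkeeping one could do from the usage block of each response.done event. The field names match the payload further down; the “expected cacheable” figure is my own estimate (the prior turn’s input plus output audio), not anything the API reports:

// Per-turn token bookkeeping from response.done usage (sketch).
// "Expected cacheable" is an assumption: the previous turn's input and
// output audio should reappear verbatim as input context on this turn.
interface Usage {
  input_token_details: {
    text_tokens: number;
    audio_tokens: number;
    cached_tokens: number;
    cached_tokens_details: { text_tokens: number; audio_tokens: number };
  };
  output_token_details: { text_tokens: number; audio_tokens: number };
}

let priorAudioContext = 0; // audio tokens carried over from earlier turns

function onResponseDone(usage: Usage): void {
  const inAudio = usage.input_token_details.audio_tokens;
  const cachedAudio = usage.input_token_details.cached_tokens_details.audio_tokens;
  console.log(
    `audio in=${inAudio}, cached=${cachedAudio}, ` +
      `expected cacheable ≈ ${priorAudioContext}`
  );
  // Everything heard and spoken this turn becomes next turn's context.
  priorAudioContext = inAudio + usage.output_token_details.audio_tokens;
}

On the payload below, that estimate would say a large share of the 1213 input audio tokens ought to be cacheable, yet cached audio sits at 0 even while 384 text tokens hit the cache.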

From looking at the latest commits (and the code in general) in the reference client OpenAI published on GitHub, I can tell they haven’t added anything related to prompt caching.

Can you clarify how exactly you came to the conclusion that nothing is being cached?

Yep, I’m manually inspecting all events in a relay server; here’s the raw payload of a response.done event:

{
  "type": "response.done",
  "event_id": "event_ARJvLp4nEMT6iuuBqPmVJ",
  "response": {
    "object": "realtime.response",
    "id": "resp_ARJvFDphUQJE6mgYdaK4G",
    "status": "completed",
    "status_details": null,
    "output": [
      {
        "id": "item_ARJvFNjqac1js9tdtVNiU",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
          {
            "type": "audio",
            "transcript": "..."
          }
        ]
      }
    ],
    "usage": {
      "total_tokens": 2622,
      "input_tokens": 1911,
      "output_tokens": 711,
      "input_token_details": {
        "text_tokens": 698,
        "audio_tokens": 1213,
        "cached_tokens": 384,
        "cached_tokens_details": {
          "text_tokens": 384,
          "audio_tokens": 0
        }
      },
      "output_token_details": {
        "text_tokens": 116,
        "audio_tokens": 595
      }
    }
  }
}
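For context, the inspection itself is nothing fancy; roughly the following, using the ws package. The URL, model name, and headers are the standard Realtime ones from the console/relay examples, not anything cache-specific:

// Relay-side inspection (sketch): forward everything, but parse and log
// usage whenever a response.done event comes through.
import WebSocket from "ws";

const upstream = new WebSocket(
  "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
  {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "OpenAI-Beta": "realtime=v1",
    },
  }
);

upstream.on("message", (data: WebSocket.RawData) => {
  const event = JSON.parse(data.toString());
  if (event.type === "response.done") {
    // This is where the payload above comes from.
    console.log(JSON.stringify(event.response.usage, null, 2));
  }
  // ...then forward `data` to the browser client unchanged.
});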

I’d take you up on that offer “if you have any session IDs to debug”:

Remember: the cache times out quickly. You might be there listening to the AI blather for five minutes, and the cache can expire at the low end of its expected lifetime.
