Hello. For the past couple days, GPT-4o input tokens are almost never being cached, despite direct sequential calls with large inputs. When I make the exact same calls with o1-mini, the caching works fine. It’d be lovely to have this fixed and save some dollars. Thank you.
What should we do?
This bug is almost breaking my business.
We need somebody's help to solve this problem.
I have the same problem here; since December 17th, caching has stopped working. I don’t know what I should do…
How about 1 million free tokens a day on gpt-4o and the o1 series through February if you share API calls as training data? (It’s unclear whether everyone is eligible; you have to check this on the platform site.)
This may also happen with smaller inputs. It seems the amount cached can fall short of the expected 128-token increments: an input of 1080 tokens yields no cached tokens at all instead of the expected 1024.
Initial call to gpt-4o-2024-08-06 to populate cache...
gpt-4o-2024-08-06 - input tokens: 2491
done.
Initial call to gpt-4o-2024-05-13 to populate cache...
gpt-4o-2024-05-13 - input tokens: 2491
done.
Initial call to gpt-4o-2024-11-20 to populate cache...
gpt-4o-2024-11-20 - input tokens: 2491
done.
Initial call to gpt-4o-mini to populate cache...
gpt-4o-mini - input tokens: 2491
done.
Cache statistics for gpt-4o-2024-08-06:
Total Trials: 3
Cache Hits (cached_tokens > 0): 3
Cache Misses (cached_tokens == 0): 0
Cache Hit Rate: 100.00%
Cache Miss Rate: 0.00%
Cached Tokens Counts:
Cached Tokens Value | Count |
---|---|
2304 | 3 |
Cache statistics for gpt-4o-2024-05-13:
Total Trials: 3
Cache Hits (cached_tokens > 0): 3
Cache Misses (cached_tokens == 0): 0
Cache Hit Rate: 100.00%
Cache Miss Rate: 0.00%
Cached Tokens Counts:
Cached Tokens Value | Count |
---|---|
2304 | 3 |
Cache statistics for gpt-4o-2024-11-20:
Total Trials: 3
Cache Hits (cached_tokens > 0): 3
Cache Misses (cached_tokens == 0): 0
Cache Hit Rate: 100.00%
Cache Miss Rate: 0.00%
Cached Tokens Counts:
Cached Tokens Value | Count |
---|---|
2304 | 3 |
Cache statistics for gpt-4o-mini:
Total Trials: 3
Cache Hits (cached_tokens > 0): 3
Cache Misses (cached_tokens == 0): 0
Cache Hit Rate: 100.00%
Cache Miss Rate: 0.00%
Cached Tokens Counts:
Cached Tokens Value | Count |
---|---|
2304 | 3 |
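Roughly the kind of script behind those numbers looks like this (a minimal sketch, not the exact code; the system prompt here is just filler long enough to clear the 1024-token caching minimum, and cached_tokens is read from usage.prompt_tokens_details):

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()

# Static system prompt, comfortably over the 1024-token minimum for prompt caching.
SYSTEM_PROMPT = "You are a helpful assistant. " + ("Background detail. " * 600)

MODELS = [
    "gpt-4o-2024-08-06",
    "gpt-4o-2024-05-13",
    "gpt-4o-2024-11-20",
    "gpt-4o-mini",
]

def call(model: str):
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Reply with the single word: ok"},
        ],
        max_tokens=5,
    )
    return resp.usage

for model in MODELS:
    print(f"Initial call to {model} to populate cache...")
    usage = call(model)
    print(f"{model} - input tokens: {usage.prompt_tokens}")

    trials, hits, counts = 3, 0, Counter()
    for _ in range(trials):
        usage = call(model)
        details = usage.prompt_tokens_details
        cached = details.cached_tokens if details else 0
        counts[cached] += 1
        hits += cached > 0

    print(f"Cache statistics for {model}:")
    print(f"  Total Trials: {trials}")
    print(f"  Cache Hits (cached_tokens > 0): {hits}")
    print(f"  Cache Misses (cached_tokens == 0): {trials - hits}")
    print(f"  Cache Hit Rate: {hits / trials:.2%}")
    print("  Cached Tokens Value | Count")
    for value, count in counts.items():
        print(f"  {value} | {count}")
```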
Oh! I thought it was just me… I’m having the exact same issue. Input tokens aren’t being cached anymore (though for me it started on the 10th), except for maybe a penny a day.
Are you using the Assistants API?
I’m not using the Assistants API. Just chat completions with around 40k to 100k of context, 95%+ identical from the start. Inputs were therefore mostly cached until a few days ago, and since then it has been nada. I hope they resolve this, as it’s gotten noticeably more expensive.
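For reference, the calls are structured roughly like this, with the unchanging ~95% at the very front so the prefix can be cached (a minimal sketch; the file names, sizes, and model are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Static prefix, identical across calls, goes first so prefix caching can apply;
# only the final user message changes between requests.
STATIC_SYSTEM = open("system_prompt.txt").read()      # long, unchanged instructions
STATIC_CONTEXT = open("reference_docs.txt").read()    # shared reference material

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM},
            {"role": "user", "content": STATIC_CONTEXT},
            {"role": "user", "content": question},    # the only part that varies
        ],
    )
    details = resp.usage.prompt_tokens_details
    print("cached_tokens:", details.cached_tokens if details else 0)
    return resp.choices[0].message.content
```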
I’m using a combination of both.
Caching effectively disappeared for me this month. Though I feel like this happened before, and then one day when I logged back in they had backfilled all the caching…
I am experiencing the same issue with the gpt-4o-mini-2024-07-18 model, which started occurring around mid-December.
Here is an example of the usage metadata from an API call:
{
  "prompt_tokens": 11505,
  "completion_tokens": 92,
  "total_tokens": 11597,
  "prompt_tokens_details": {
    "cached_tokens": 11008,
    "audio_tokens": 0
  },
  "completion_tokens_details": {
    "reasoning_tokens": 0,
    "audio_tokens": 0,
    "accepted_prediction_tokens": 0,
    "rejected_prediction_tokens": 0
  }
}
The output indicates that over 90% of the input tokens are cached, yet the dashboard reports an uncached-to-cached usage ratio of 30:1. This discrepancy is unclear to me.
The input remains unchanged across consecutive calls made within seconds of each other. Also, the first call shows 0 cached tokens and only the API calls after that show caching. Everything is as expected, except that the dashboard (and also cost tracking) shows no caching.
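In the meantime, logging the per-call usage independently makes it easy to compare against the dashboard and cost tracking later. A minimal sketch (the log file name is just an example):

```python
import csv
from datetime import datetime, timezone

def log_usage(usage, log_path="cache_usage_log.csv"):
    """Append per-call token usage so it can be compared with the dashboard later."""
    details = usage.prompt_tokens_details
    cached = details.cached_tokens if details else 0
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            usage.prompt_tokens,           # total input tokens
            cached,                        # portion the API reports as cached
            usage.prompt_tokens - cached,  # uncached input tokens
            usage.completion_tokens,
        ])

# After each call:
# resp = client.chat.completions.create(...)
# log_usage(resp.usage)
```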
I would appreciate any assistance in resolving this issue!
I’m trying to contact support, but they keep asking me whether we made some change in our app.
I created a test app with the same prompt to check usage and found that the cache is only generated within the same thread; if the thread changes, it won’t work.
When the prompt is in the same thread it generates cached tokens, but that isn’t reflected in our usage dashboard.
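That matches how prefix caching behaves in plain chat completions as well: cached_tokens only shows up when the exact start of the prompt repeats. A minimal sketch to check it outside of threads (model and prompts are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Shared prefix, long enough to clear the 1024-token caching minimum.
LONG_PREFIX = "Shared instructions. " + ("Reference material. " * 600)

def cached_tokens_for(messages) -> int:
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    details = resp.usage.prompt_tokens_details
    return details.cached_tokens if details else 0

same = [{"role": "system", "content": LONG_PREFIX},
        {"role": "user", "content": "Question A"}]
changed = [{"role": "system", "content": "Different opening. " + LONG_PREFIX},
           {"role": "user", "content": "Question A"}]

print(cached_tokens_for(same))     # first call: typically 0 (populates the cache)
print(cached_tokens_for(same))     # identical prefix: expected > 0
print(cached_tokens_for(changed))  # prefix altered at the start: expected 0 again
```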
I am also experiencing the same issue, which has been occurring since December 20th. I have already contacted support. I recommend that others facing the same issue contact support with the URL of this thread.
I’ve already contacted support, I’m still waiting for a response.
I’m experiencing the same issue and have also contacted support.
They mentioned that they’re investigating a bug causing the usage dashboard to display data incorrectly or out of sync with the actual API usage. They’re currently working on a fix.
Same problem here: we have a prod server with very regular queries sharing 90% of their system prompt (5k+ tokens).
Nothing changed on the 19th of December, but caching stopped appearing, at least in the Usage Dashboard. I haven’t checked the “real” feedback from the calls yet.
Has anyone heard back from support? @kmsbernard or @b.silva ?
I haven’t received any updates from support yet. It’s getting really frustrating.
I haven’t gotten any feedback yet. I’m experiencing a considerable increase in the cost of using the API and I’m worried about the viability of the operation if it continues like this.
Has anyone gotten an acceptable response to this problem from support other than “Check your application because you’re probably doing something wrong.”?
I got this response from support so often in the last 3 months that I just gave up reaching out to them (hint: you’re not doing anything wrong).
Also, I think they’re off for 2 weeks, so you might not even hear anything until next week (if you hear back about this at all…).