Hello. For the past couple days, GPT-4o input tokens are almost never being cached, despite direct sequential calls with large inputs. When I make the exact same calls with o1-mini, the caching works fine. It’d be lovely to have this fixed and save some dollars. Thank you.
What should we do?
This bug is almost breaking my business.
We need somebody's help to solve this problem.
I have the same problem here; since December 17th, caching has stopped working. I don’t know what I should do…
How about 1 million free tokens a day on gpt-4o and the o1 series through February if you share API calls as training data? (It’s unclear whether everyone is eligible; you have to check this on the platform site.)
This may also happen with smaller inputs. It seems the amount cached can fall short of the expected 128-token increments: an input of 1080 tokens yields no cached tokens at all instead of the expected 1024.
Initial call to gpt-4o-2024-08-06 to populate cache...
gpt-4o-2024-08-06 - input tokens: 2491
done.
Initial call to gpt-4o-2024-05-13 to populate cache...
gpt-4o-2024-05-13 - input tokens: 2491
done.
Initial call to gpt-4o-2024-11-20 to populate cache...
gpt-4o-2024-11-20 - input tokens: 2491
done.
Initial call to gpt-4o-mini to populate cache...
gpt-4o-mini - input tokens: 2491
done.
Cache statistics for gpt-4o-2024-08-06:
Total Trials: 3
Cache Hits (cached_tokens > 0): 3
Cache Misses (cached_tokens == 0): 0
Cache Hit Rate: 100.00%
Cache Miss Rate: 0.00%
Cached Tokens Counts:
Cached Tokens Value | Count |
---|---|
2304 | 3 |
Cache statistics for gpt-4o-2024-05-13:
Total Trials: 3
Cache Hits (cached_tokens > 0): 3
Cache Misses (cached_tokens == 0): 0
Cache Hit Rate: 100.00%
Cache Miss Rate: 0.00%
Cached Tokens Counts:
Cached Tokens Value | Count |
---|---|
2304 | 3 |
Cache statistics for gpt-4o-2024-11-20:
Total Trials: 3
Cache Hits (cached_tokens > 0): 3
Cache Misses (cached_tokens == 0): 0
Cache Hit Rate: 100.00%
Cache Miss Rate: 0.00%
Cached Tokens Counts:
Cached Tokens Value | Count |
---|---|
2304 | 3 |
Cache statistics for gpt-4o-mini:
Total Trials: 3
Cache Hits (cached_tokens > 0): 3
Cache Misses (cached_tokens == 0): 0
Cache Hit Rate: 100.00%
Cache Miss Rate: 0.00%
Cached Tokens Counts:
Cached Tokens Value | Count |
---|---|
2304 | 3 |
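Roughly the kind of script behind those numbers looks like this (a minimal sketch, not the exact code; the system prompt here is just filler long enough to clear the 1024-token caching minimum, and cached_tokens is read from usage.prompt_tokens_details):

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()

# Static system prompt, comfortably over the 1024-token minimum for prompt caching.
SYSTEM_PROMPT = "You are a helpful assistant. " + ("Background detail. " * 600)

MODELS = [
    "gpt-4o-2024-08-06",
    "gpt-4o-2024-05-13",
    "gpt-4o-2024-11-20",
    "gpt-4o-mini",
]

def call(model: str):
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Reply with the single word: ok"},
        ],
        max_tokens=5,
    )
    return resp.usage

for model in MODELS:
    print(f"Initial call to {model} to populate cache...")
    usage = call(model)
    print(f"{model} - input tokens: {usage.prompt_tokens}")

    trials, hits, counts = 3, 0, Counter()
    for _ in range(trials):
        usage = call(model)
        details = usage.prompt_tokens_details
        cached = details.cached_tokens if details else 0
        counts[cached] += 1
        hits += cached > 0

    print(f"Cache statistics for {model}:")
    print(f"  Total Trials: {trials}")
    print(f"  Cache Hits (cached_tokens > 0): {hits}")
    print(f"  Cache Misses (cached_tokens == 0): {trials - hits}")
    print(f"  Cache Hit Rate: {hits / trials:.2%}")
    print("  Cached Tokens Value | Count")
    for value, count in counts.items():
        print(f"  {value} | {count}")
```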
Oh! I thought it was just me… I’m having the exact same issue. Input tokens aren’t being cached anymore (though for me it started on the 10th), except for maybe a penny a day.
Are you using the Assistants API?
I’m not using the Assistants API. Just chat completions with around 40k to 100k of context, 95%+ identical from the start. Inputs were therefore mostly cached until a few days ago, and since then it has been nada. I hope they resolve this, as it’s gotten noticeably more expensive.
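For reference, the calls are structured roughly like this, with the unchanging ~95% at the very front so the prefix can be cached (a minimal sketch; the file names, sizes, and model are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Static prefix, identical across calls, goes first so prefix caching can apply;
# only the final user message changes between requests.
STATIC_SYSTEM = open("system_prompt.txt").read()      # long, unchanged instructions
STATIC_CONTEXT = open("reference_docs.txt").read()    # shared reference material

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM},
            {"role": "user", "content": STATIC_CONTEXT},
            {"role": "user", "content": question},    # the only part that varies
        ],
    )
    details = resp.usage.prompt_tokens_details
    print("cached_tokens:", details.cached_tokens if details else 0)
    return resp.choices[0].message.content
```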
I’m using a combination of both.
Caching effectively disappeared for me this month. Though I feel like this happened before, and then one day when I logged back in they had backfilled all the caching…
I am experiencing the same issue with the gpt-4o-mini-2024-07-18 model, which started occurring around mid-December.
Here is an example of the usage metadata from an API call:
{
  "prompt_tokens": 11505,
  "completion_tokens": 92,
  "total_tokens": 11597,
  "prompt_tokens_details": {
    "cached_tokens": 11008,
    "audio_tokens": 0
  },
  "completion_tokens_details": {
    "reasoning_tokens": 0,
    "audio_tokens": 0,
    "accepted_prediction_tokens": 0,
    "rejected_prediction_tokens": 0
  }
}
The output indicates that over 90% of the input tokens are cached, yet the dashboard reports an uncached-to-cached usage ratio of 30:1. This discrepancy is unclear to me.
The input remains unchanged across consecutive calls made within seconds of each other. Also, the first call shows 0 cached tokens and only the API calls after that show caching. Everything is as expected, except that the dashboard (and also cost tracking) shows no caching.
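In the meantime, logging the per-call usage independently makes it easy to compare against the dashboard and cost tracking later. A minimal sketch (the log file name is just an example):

```python
import csv
from datetime import datetime, timezone

def log_usage(usage, log_path="cache_usage_log.csv"):
    """Append per-call token usage so it can be compared with the dashboard later."""
    details = usage.prompt_tokens_details
    cached = details.cached_tokens if details else 0
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            usage.prompt_tokens,           # total input tokens
            cached,                        # portion the API reports as cached
            usage.prompt_tokens - cached,  # uncached input tokens
            usage.completion_tokens,
        ])

# After each call:
# resp = client.chat.completions.create(...)
# log_usage(resp.usage)
```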
I would appreciate any assistance in resolving this issue!
I’m trying to contact support, but they keep asking me whether we made some change in our app.
I created a test app with the same prompt to check usage and found that the cache is only generated within the same thread; if the thread changes, it won’t work.
When the prompt is in the same thread it generates cached tokens, but that isn’t reflected in our usage dashboard.
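That matches how prefix caching behaves in plain chat completions as well: cached_tokens only shows up when the exact start of the prompt repeats. A minimal sketch to check it outside of threads (model and prompts are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Shared prefix, long enough to clear the 1024-token caching minimum.
LONG_PREFIX = "Shared instructions. " + ("Reference material. " * 600)

def cached_tokens_for(messages) -> int:
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    details = resp.usage.prompt_tokens_details
    return details.cached_tokens if details else 0

same = [{"role": "system", "content": LONG_PREFIX},
        {"role": "user", "content": "Question A"}]
changed = [{"role": "system", "content": "Different opening. " + LONG_PREFIX},
           {"role": "user", "content": "Question A"}]

print(cached_tokens_for(same))     # first call: typically 0 (populates the cache)
print(cached_tokens_for(same))     # identical prefix: expected > 0
print(cached_tokens_for(changed))  # prefix altered at the start: expected 0 again
```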
I am also experiencing the same issue, which has been occurring since December 20th. I have already contacted support. I recommend that others facing the same issue contact support with the URL of this thread.
I’ve already contacted support, I’m still waiting for a response.
I’m experiencing the same issue and have also contacted support.
They mentioned that they’re investigating a bug causing the usage dashboard to display data incorrectly or out of sync with the actual API usage. They’re currently working on a fix.
Same problem here: we have a prod server with very regular queries sharing 90% of their system prompt (5k+ tokens).
Nothing changed on the 19th of December, but caching stopped appearing, at least in the Usage Dashboard. I haven’t checked the “real” feedback from the calls yet.
Has anyone heard back from support? @kmsbernard or @b.silva ?
I haven’t received any updates from support yet. It’s getting really frustrating.
I haven’t gotten any feedback yet. I’m experiencing a considerable increase in the cost of using the API and I’m worried about the viability of the operation if it continues like this.
Has anyone gotten an acceptable response to this problem from support other than “Check your application because you’re probably doing something wrong.”?
I got this response from support so often in the last 3 months that I just gave up reaching out to them (hint: you’re not doing anything wrong).
Also, I think they’re off for 2 weeks, so you might not even hear anything until next week (if you hear back about this at all…).