I’m relying on prompt caching heavily (according to the API response, 90% of my tokens are cached).
When I log the usage from the API responses, total prompt tokens come to around 50k and cached tokens to around 49k (and I run many, many prompts like that, of course).
However, when I look at the OpenAI Usage dashboard, I see “uncached input pay as you go” at, say, 10 million tokens (for gpt-4o-mini-2024-07-18), while cached input is only at around 500k.
Such a low cached-input figure makes no sense given the usage and cached-usage numbers in my API responses; if anything, the dashboard looks inverted.
So I'm just asking: how does the dashboard calculate usage, and how do I reconcile it with what I see in the API responses on my end?
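For reference, this is roughly how I'm reading those numbers, a minimal sketch with the Python SDK (the shared system prompt is a placeholder; caching only applies to prompts of 1024+ tokens):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder for the large shared prefix that should be cached; only the
# part that is identical across requests gets cache hits.
long_shared_prefix = "...your big system prompt / context here..."

response = client.chat.completions.create(
    model="gpt-4o-mini-2024-07-18",
    messages=[
        {"role": "system", "content": long_shared_prefix},
        {"role": "user", "content": "the part that changes per request"},
    ],
)

usage = response.usage
details = usage.prompt_tokens_details
cached = (details.cached_tokens or 0) if details else 0
print(
    f"prompt tokens: {usage.prompt_tokens}, "
    f"cached tokens: {cached}, "
    f"hit rate: {cached / usage.prompt_tokens:.0%}"
)
```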
Prompt caching currently seems to be borked; there are a bunch of threads on this.
OpenAI status (https://status.openai.com/) seems to think everything is working as normal, and I don’t see any OAI employee activity in these threads unless I’ve missed something.
Soo…
Sorry, there's not much anyone can do other than wait (or migrate to Azure), I guess.
I think this might also be related to how the usage response from completion calls accounts for image tokens. In my experience, image tokens are not included in the usage response, so if you are using images and those are not getting cache hits, it would make sense that the API response shows a high cache-hit rate while the dashboard says otherwise.
Check out the organization.usage endpoint! Might get you some better data there.
Polling the Usage API completions object for a single day (https://api.openai.com/v1/organization/usage/completions), we see zero for input_cached_tokens, yet the chat completion objects returned that day do show cached tokens (saved when streaming, or returned with the response).
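For anyone who wants to compare their own numbers, a minimal sketch of that poll might look like this (it assumes an admin API key in OPENAI_ADMIN_KEY and the documented start_time / bucket_width query parameters):

```python
import os
import time

import requests

# The Usage API requires an organization admin key, not a regular project key.
ADMIN_KEY = os.environ["OPENAI_ADMIN_KEY"]

# One daily bucket covering roughly the last 24 hours.
params = {
    "start_time": int(time.time()) - 24 * 3600,  # Unix timestamp
    "bucket_width": "1d",
}

resp = requests.get(
    "https://api.openai.com/v1/organization/usage/completions",
    headers={"Authorization": f"Bearer {ADMIN_KEY}"},
    params=params,
    timeout=30,
)
resp.raise_for_status()

# Each bucket holds aggregated results, including input_cached_tokens.
for bucket in resp.json().get("data", []):
    for result in bucket.get("results", []):
        print(
            "input_tokens:", result.get("input_tokens"),
            "input_cached_tokens:", result.get("input_cached_tokens"),
            "output_tokens:", result.get("output_tokens"),
        )
```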
For reference: I just checked the numbers for one day, yesterday, Jan. 6 (a full 24 hours), against the usage shown on the dashboard:
For gpt-4o-2024-11-20, running predominantly on the Assistants API:
input usage cost is 1.7x what it should be, with no cached input recorded;
output usage is close to 1x.
I do note that o1-2024-12-17 has cached input recorded.
The numbers listed on the Usage dashboard are accurate only if no cached input is taken into account for gpt-4o-2024-11-20.
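For what it's worth, that 1.7x is about what you'd expect if cached input were billed at the full uncached price. A rough sanity check (the ~80% cache-hit rate here is my assumption, not pulled from the dashboard; the 50% cached-input discount is from the published pricing):

```python
# Rough sanity check: what if cached input were billed as if uncached?
hit_rate = 0.80        # assumed fraction of input tokens served from the cache
cache_discount = 0.50  # cached input tokens are billed at half the uncached rate

# Expected blended input cost relative to an all-uncached bill.
expected = hit_rate * cache_discount + (1 - hit_rate)  # 0.6
overcharge = 1 / expected                              # ~1.67x

print(f"ignoring the cache discount inflates input cost by ~{overcharge:.2f}x")
```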
I'd be OK with it if it were just a reporting/dashboard issue, but I was charged for December the same amount as reported in the dashboard for that month.
This has been brought up with staff, and they are looking into it.
Thanks for confirming that the credits deducted are not aligned with a proper cache hit!
I just came to the forum to report the exact same issue. My responses report a VERY good cache-hit rate (around 93%), but in my billing dashboard almost nothing is going to the caching category. Not none, but something like 3 cents.
Good to hear I’m not crazy. Been doubting my math for the past 30 minutes.
(Would be really great to get that cost reimbursed! I’m racking up quite the bill right now, lol.)
I think the issue has now been rectified: the input tokens from the chat completion object now match the dashboard numbers. Thanks for solving the issue! It would be great if the extra credits charged in previous months could be restored.
Here is a post from another user who received a response from support stating that OpenAI is investigating the issue of overcharging:
I will now close this topic. Please continue the conversation in the thread linked above. This will help maintain a clear overview of how the situation is progressing.