I am performing a VQA task using GPT-4o Mini.
The only element that should be cached is the system prompt, so I expected minimal caching to occur.
However, I noticed an unusually high number of cached elements.
Looking at the screenshot, GPT-4o Mini is clearly caching far more than just the system prompt.
What’s the issue here? The costs have skyrocketed.
Caching is a mechanism that saves you money on API inputs when requests repeat a largely identical input prefix. The $7 billed as cached input would have been $14 if caching didn't exist.
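A rough sketch of that math, assuming the usual 50% discount on cached input tokens for gpt-4o / gpt-4o-mini (the dollar figure is just the one from your screenshot):

```python
# Illustration of cached-input billing (assumes a 50% cached-input discount).
cached_input_billed = 7.00   # dashboard spend shown as cached input
discount = 0.5               # cached tokens billed at half the normal input rate
uncached_equivalent = cached_input_billed / discount
print(f"Without caching, that input would have cost ${uncached_equivalent:.2f}")
# -> Without caching, that input would have cost $14.00
```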
What is remarkable here is the ratio of your input cost, cached or not, to how much output is actually being generated.
You say “vision question-answering.” A reminder: GPT-4o-mini charges twice as much for image input as GPT-4o, the opposite of what you'd normally expect. So you cannot assume that a large number of images (such as frames extracted from a video) will be as cheap on mini as text tokens are; the cost is actually higher.
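A back-of-the-envelope comparison, using assumed per-million-token prices and the ~33.33x image-token multiplier mentioned below; these are illustrative numbers, so check the current pricing page before relying on them:

```python
# Illustrative only: assumed input prices and an example image token count.
PRICE_4O_MINI = 0.15 / 1_000_000   # assumed gpt-4o-mini text input price per token
PRICE_4O = 2.50 / 1_000_000        # assumed gpt-4o text input price per token
IMAGE_TOKENS_4O = 765              # example: a 1024x1024 high-detail image on gpt-4o
MINI_IMAGE_MULTIPLIER = 33.33      # mini multiplies image tokens before billing

cost_mini = IMAGE_TOKENS_4O * MINI_IMAGE_MULTIPLIER * PRICE_4O_MINI
cost_4o = IMAGE_TOKENS_4O * PRICE_4O
print(f"gpt-4o-mini: ${cost_mini:.4f}/image, gpt-4o: ${cost_4o:.4f}/image")
# With these assumed prices, mini works out to roughly 2x the per-image cost of gpt-4o.
```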
I suspect that if you record the usage object returned by each API call, which breaks down token usage by type, you'll quickly discover what is costing you and where caching is providing a benefit. Logging every API call through one wrapper function along with the inputs sent, or having AI analyze your code, may also reveal a programming error. OpenAI did have an issue about a month ago where the 33.33x token multiplier for mini input images wasn't being billed…that has since been fixed.
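A minimal sketch of what that logging could look like, using the usage object from the Chat Completions API (the wrapper function name is just an example; adapt it to however you're calling the API):

```python
from openai import OpenAI

client = OpenAI()

def ask_with_usage_logging(messages, model="gpt-4o-mini"):
    """Send a chat completion and log the per-request token usage breakdown."""
    response = client.chat.completions.create(model=model, messages=messages)
    usage = response.usage
    # prompt_tokens_details.cached_tokens shows how much of the input hit the cache
    cached = getattr(usage.prompt_tokens_details, "cached_tokens", 0) or 0
    print(
        f"input={usage.prompt_tokens} "
        f"(cached={cached}, uncached={usage.prompt_tokens - cached}) "
        f"output={usage.completion_tokens} total={usage.total_tokens}"
    )
    return response
```

Run every request through something like this for a while and the per-call breakdown will show whether the spend is dominated by image input tokens, and how much of it is being discounted by the cache.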