Does prompt caching have anything to do with the development of the AI model itself, or is it for profitability purposes that o3-mini isn’t being offered with caching?
As far as I know from the chatbots, there are basic RAG or KV-caching techniques that could give these chat models some form of caching. OpenAI already has the technology established for the other models.
So at this cash-burning rate I’m questioning my choices.
400K input tokens, 15K output tokens => 50 cents already, in just 10 messages back and forth…
It piles up quickly, you see…
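For what it’s worth, that figure is consistent with uncached pricing. A quick sanity check, assuming o3-mini’s uncached rates of $1.10 per 1M input tokens and $4.40 per 1M output tokens (worth re-checking against the current pricing page):

```python
# Back-of-the-envelope check of the "~50 cents for 10 messages" figure,
# assuming o3-mini uncached rates: $1.10 / 1M input, $4.40 / 1M output.
input_tokens = 400_000
output_tokens = 15_000

cost = input_tokens / 1_000_000 * 1.10 + output_tokens / 1_000_000 * 4.40
print(f"~${cost:.2f}")  # ~$0.51, i.e. roughly 50 cents with no cache discount
```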
Whoops, I apologize for the new topic. You’re right. It’s probably that the Roo Code extension doesn’t follow the cached pricing structure and reported the uncached price per usage. Sorry, my bad.
There is another aspect to consider: you will only receive a cache discount on the initial input context when all of the following hold (a way to verify this from the API response is sketched after the list):

- the input messages are sent verbatim and match a previous call exactly from the very start (prefix match);
- the request is routed to the same server that holds the cache, which is prioritized but not guaranteed;
- the identical portion is at least 1024 tokens and is tokenized in the same manner;
- the cache has already been established (not by parallel calls racing each other) and has not expired (roughly 5–60 minutes).
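You can also check whether the discount actually applied instead of relying on a third-party extension’s math: the usage block of the response reports how many prompt tokens were served from the cache. A minimal sketch, assuming the openai Python SDK (v1.x) and the `usage.prompt_tokens_details.cached_tokens` field documented for prompt caching; the placeholder messages and model name are mine, not from this thread:

```python
# Minimal sketch: check how many prompt tokens were served from the cache.
# Assumes the openai Python SDK (v1.x); field names may differ by version.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    {"role": "system", "content": "…your long, stable system prompt (>=1024 tokens)…"},
    {"role": "user", "content": "…the latest user turn…"},
]

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; swap in whichever cache-eligible model you test
    messages=messages,
)

usage = response.usage
cached = usage.prompt_tokens_details.cached_tokens
print(f"prompt tokens: {usage.prompt_tokens}, served from cache: {cached}")
# Expect 0 on the first call; on an identical follow-up inside the cache
# window, most of the shared prefix should show up here if the discount hit.
```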
There is also the share of the bill that goes to unrepeatable reasoning output: it can be far higher than the cost of the input you actually resend, and it keeps growing even in cases where a cache discount can be activated. For example, here is 48 tokens “in” and 3,000 tokens “out” from o3-medium:
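To put those token counts in perspective, a hedged back-of-the-envelope sketch (the per-token rates are my assumptions based on o3-mini-style uncached pricing, not figures taken from the usage readout):

```python
# Why caching barely matters for this call: 48 prompt tokens vs 3,000
# reasoning/output tokens. Rates below are assumed uncached prices
# ($1.10 / 1M input, $4.40 / 1M output); check the current pricing page.
in_tok, out_tok = 48, 3_000

in_cost = in_tok / 1_000_000 * 1.10    # ~$0.00005
out_cost = out_tok / 1_000_000 * 4.40  # ~$0.0132

# Even a full cache hit would only halve the input cost -- and a 48-token
# prompt is below the ~1024-token minimum, so no discount applies at all.
best_case_saving = in_cost * 0.5

print(f"input ${in_cost:.6f}  output ${out_cost:.6f}  "
      f"best-case cache saving ${best_case_saving:.6f}")
```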