Consistent cache breaks with o4-mini and previous_response_id

Caching is based on input: it matches against prior inputs, not assistant outputs. So if you never send back a reasoning item (or reference one by its ID), you should be building on an input that doesn't have any breaks. Sending reasoning + assistant output as new input should behave much like sending the assistant output alone (with much faster context growth for dubious benefit, except that the reasoning continues to inform the AI).
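
To make the prefix-matching idea concrete, here's a toy sketch (counting whole items rather than tokens, and nothing like OpenAI's actual implementation): append-only growth keeps the entire prior input reusable, while rewriting anything earlier breaks the match at the first difference.

```python
# Toy sketch of prefix-based prompt caching -- illustrative only, not OpenAI's code.
# Assumption: the cache keys on the longest common prefix of the serialized input,
# counted here in whole items rather than tokens for simplicity.

def cached_prefix_len(previous_input: list[str], new_input: list[str]) -> int:
    """Count how many leading items the new request shares with a prior one."""
    n = 0
    for old, new in zip(previous_input, new_input):
        if old != new:
            break
        n += 1
    return n

turn_1 = ["system: be terse", "user: q1", "assistant: a1"]

# Append-only growth: the entire prior input remains a reusable cached prefix.
turn_2_append = turn_1 + ["user: q2"]
print(cached_prefix_len(turn_1, turn_2_append))   # 3 -> full reuse

# Rewriting an earlier item shifts everything after it and breaks the prefix.
turn_2_edited = ["system: be terse", "user: q1 (edited)", "assistant: a1", "user: q2"]
print(cached_prefix_len(turn_1, turn_2_edited))   # 1 -> only the system message matches
```

Whether reasoning items are present or absent doesn't matter in this picture, as long as you're consistent about it from turn to turn.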

What could produce a cache break is lookback meddling: someone's decision that "we'll only keep the last few reasoning outputs." But that should not behave as all-or-nothing the way your last log shows; eventually the chat would grow long enough that there'd be some prefix reuse.
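
Here's that intuition in the same toy model, assuming a hypothetical "keep only the last two reasoning items" policy: the break point moves forward each turn, but the untouched head of the chat keeps matching, so reuse should be partial rather than zero.

```python
# Sketch of why a "keep only the last N reasoning items" lookback policy should
# cause partial, not total, cache loss. Same toy model as above: items, not tokens.

def shared_prefix(a: list[str], b: list[str]) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def prune_reasoning(items: list[str], keep_last: int) -> list[str]:
    """Drop all but the last `keep_last` reasoning items; everything else stays put."""
    reasoning = [i for i, it in enumerate(items) if it.startswith("reasoning:")]
    drop = set(reasoning[:max(0, len(reasoning) - keep_last)])
    return [it for i, it in enumerate(items) if i not in drop]

history = ["system: be terse"]
for t in range(1, 6):
    history += [f"user: q{t}", f"reasoning: r{t}", f"assistant: a{t}"]

prev_sent = prune_reasoning(history[:-3], keep_last=2)   # input sent on the previous turn
this_sent = prune_reasoning(history, keep_last=2)        # input sent on this turn

print(len(prev_sent), len(this_sent), shared_prefix(prev_sent, this_sent))
# 11 13 6 -> the head of the chat up to the first newly pruned reasoning item still
# matches, so reuse grows with the chat instead of collapsing to zero.
```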


The best design would be for the response ID itself to point at a KV cache, not just at stored messages. The message list is immutable, which would facilitate that. However, one can guess that a model's state for a conversation is much larger than the context tokens it was computed from, making that impractical infrastructure.
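
A rough back-of-envelope (with made-up model dimensions, since o4-mini's architecture isn't published) shows why: the per-token KV state dwarfs the token text it was computed from.

```python
# Back-of-envelope: KV-cache state versus the prompt text it encodes.
# All model dimensions here are hypothetical, chosen only to show the arithmetic.

num_layers     = 48      # hypothetical transformer depth
num_kv_heads   = 8       # hypothetical grouped-query KV heads
head_dim       = 128     # hypothetical per-head dimension
bytes_per_elem = 2       # fp16/bf16
context_tokens = 50_000  # a long Responses API conversation

kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem  # K and V
kv_total_gb   = kv_bytes_per_token * context_tokens / 1e9
text_total_mb = context_tokens * 4 / 1e6   # very roughly ~4 bytes of text per token

print(f"{kv_bytes_per_token / 1024:.0f} KiB of KV state per token")
print(f"{kv_total_gb:.1f} GB of KV state vs ~{text_total_mb:.1f} MB of raw prompt text")
```

Keeping tens of gigabytes pinned per response ID, on the off chance that exact conversation is continued, is the impractical part; storing the tokens and re-prefilling on demand is far cheaper infrastructure.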

What’s a hit…

Caching is described as a "best effort to route you to be serviced by the same inference server," or similar language.

How would that be done (and how, with your thousands of calls a minute)?

Could it be that there's some quick hashing of inputs, finding that initial commonality, for routing? And that using a previous response ID in a Responses API call offers no "input" inspectable at that layer? One can only speculate in the dark, but such an architecture for servicing API calls might also be a reason for low cache hits.
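
To make that speculation concrete (a guess at an architecture, not anything documented): if routing were a cheap hash over the leading chunk of whatever `input` the request carries, full-transcript requests would route consistently, while a previous_response_id request, whose only visible input is the new user turn, has nothing stable to route on. Server names and the tiny prefix size below are invented.

```python
# Speculative sketch of "route by hashing the input prefix" -- not OpenAI's design.
import hashlib

SERVERS = [f"inference-{i:02d}" for i in range(16)]

def route_by_prefix(serialized_input: str, prefix_chars: int = 32) -> str:
    """Pick a server from a hash of the first `prefix_chars` characters of the input
    (a deliberately tiny prefix, just for the illustration)."""
    h = hashlib.sha256(serialized_input[:prefix_chars].encode()).digest()
    return SERVERS[int.from_bytes(h[:4], "big") % len(SERVERS)]

transcript = "system: be terse\nuser: q1\nassistant: a1\n"

# Full transcript resent each turn: the leading characters never change, so every
# turn lands on the same server (and whatever warm cache is sitting there).
turn_2 = transcript + "user: q2"
turn_3 = transcript + "user: q2\nassistant: a2\nuser: q3"
print(route_by_prefix(turn_2) == route_by_prefix(turn_3))   # True

# previous_response_id style: the only input visible at this layer is the new user
# turn, which changes every call, so there is no stable prefix to route on.
print(route_by_prefix("user: q2"), route_by_prefix("user: q3"))
```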

A response ID, you'd think, would instead be an easy path straight back to a cache waiting on an inference server.