A cache is local to an instance of a “server”, or whatever size of “compute unit” runs inference and retains its own data.
Case against response ID helping cache
A KV context-window cache is specific to a particular AI model, whereas previous_response_id can be reused with a different AI model on the next turn. That internal state could also be quite large. Therefore it is unlikely the KV cache ever becomes part of an account database anywhere to be retrieved.
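A minimal illustration of that argument (no network call; field names follow OpenAI’s Responses API, and the response ID is a made-up placeholder): the second turn can name a different model while chaining from the first response, so whatever state the ID retrieves must be model-agnostic conversation items, not a per-model KV cache.

```python
# Sketch only: the request bodies for two chained Responses API turns.
# The "resp_abc123" ID is a fake placeholder standing in for turn 1's ID.
turn1_body = {
    "model": "gpt-4o",
    "input": "Summarize how KV caching works.",
}
turn2_body = {
    "model": "gpt-4o-mini",                 # a *different* model next turn
    "previous_response_id": "resp_abc123",  # placeholder ID from turn 1
    "input": "Now shorten that summary.",
}

# The chained turn targets another model, which is exactly why the stored
# state can't simply be the first model's KV cache.
assert turn1_body["model"] != turn2_body["model"]
```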
Others have a cache DB
Google lets you create your own persistent cache by deliberate effort, and reference it by ID. Think: OpenAI’s “prompt presets”, but with a built-in discount.
When to employ the parameter
Besides your focus on caching the “app” vs. caching “chats”, the big thing that I think is important:
- You can penalize your own performance if your calls share 256+ tokens of common prefix with other calls (even just the size of the “vision safety message” that OpenAI injects), because everything then gets sent to the same instance. Without differentiation, either by changing the input or by using this parameter to break the routing algorithm, the prefix “hash” determines that same-server routing is preferable, even when you’d never realize a discount.
I haven’t benchmarked this particular case (using prompt_cache_key to break up common prefixes and ensure load distribution, versus making the calls with no thought to caching) to quantify speed differences across batches of runs, but it seems logical that varying the parameter keeps your computation from concentrating in one place and bogging down inference, if OpenAI’s “rollover” is naive.
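The distribution idea above can be sketched as follows. This is untested and the helper name, shard count, and bucketing scheme are my own assumptions, not anything OpenAI documents: derive a stable prompt_cache_key per user bucket so that otherwise-identical prompt prefixes are routed to different cache instances instead of piling onto one.

```python
import hashlib

def spread_key(user_id: str, shards: int = 8) -> str:
    """Hypothetical helper: a stable prompt_cache_key per user bucket,
    so calls sharing an identical prefix don't all hash to one instance."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % shards
    return f"bucket-{bucket}"

# The same user always maps to the same shard (preserving their own cache
# hits across turns), while different users spread across shards:
assert spread_key("alice") == spread_key("alice")
distinct = {spread_key(f"user-{i}") for i in range(100)}
assert len(distinct) > 1

# Usage would be passing it alongside the request, e.g.:
# body = {"model": "...", "input": prompt,
#         "prompt_cache_key": spread_key(user_id)}
```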
I did note early on that inputs that could be cacheable, but were below the discount threshold, showed a similar performance penalty (or advantage?) to discount-sized calls. My inference then was not about routing, but that OpenAI might be doing cache optimization and simply not sharing the discount; I had reached a similar conclusion before the discount was even introduced.