Understanding `prompt_cache_key` and query efficiency

Server routing as described:

  • the first ~256 tokens of the prompt plus the `prompt_cache_key` field are hashed
  • that hash is used, on a best-effort basis, to route the request back to the same server instance
  • if the hash does not match, load distribution across instances is (or should be) encouraged instead
  • one cached host might handle about 15 calls per minute before you get rolled over to a different, non-cached instance anyway (a number that is essentially arbitrary and probably fictionalized, because 15 "hi" calls to gpt-5-nano are nothing like 15 "resolve this QED formula" calls to o3-pro)

What the key can do is keep unrelated calls that merely start with the same tokens from all landing on, and consuming, the same cache server's computation.
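Concretely, separating two workloads that share the same opening boilerplate might look like this. `prompt_cache_key` is the documented request field; the model name, system prompt, and key naming scheme are illustrative assumptions:

```python
# Two workloads share the same system prompt but serve different purposes;
# distinct prompt_cache_key values keep them from being routed onto (and
# competing for) the same cache server instance.
SYSTEM = "You are a helpful assistant for ExampleCorp."  # shared prefix

def build_request(user_text: str, workload: str) -> dict:
    """Build a Chat Completions request body with a per-workload cache key."""
    return {
        "model": "gpt-4o-mini",  # illustrative model name
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": user_text},
        ],
        # e.g. "examplecorp-chat" vs "examplecorp-classifier"
        "prompt_cache_key": f"examplecorp-{workload}",
    }
```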

If you have many users sharing the same prefix, you have to decide which discount matters more in aggregate, given cache expiry: the shared system prompt (caching starts at 1024 tokens) or each user's second turn of chat, whose prefix includes their own first turn.
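A rough way to compare the two cases, assuming cached input tokens are billed at half the normal rate (verify against current pricing) and ignoring the 128-token caching increments:

```python
def input_cost(total_tokens: int, cached_tokens: int,
               rate: float = 1.0, cached_discount: float = 0.5) -> float:
    """Input cost in arbitrary units: uncached tokens at full rate,
    cached tokens at rate * cached_discount (discount is an assumption)."""
    uncached = total_tokens - cached_tokens
    return uncached * rate + cached_tokens * rate * cached_discount

# Turn two of a chat: 1024-token system prompt + 1500 tokens of turn-one
# history + 200 new tokens. If only the shared system prompt is still
# cached, 1024 tokens get the discount; if this user's own turn-one
# prefix survived expiry, 2524 tokens do.
shared_only = input_cost(2724, 1024)   # 2212.0
full_prefix = input_cost(2724, 2524)   # 1462.0
```

The per-user prefix is clearly worth more on any single conversation; the catch is that it only pays off if that user returns before the cache expires, whereas the shared system prompt is kept warm by everyone's traffic.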
