We’ve seen some inconsistent caching behavior and wanted to ask a few questions to better understand it, especially regarding tool caching. We’ve combed through past posts and haven’t found answers to these.
-
When sending both a prompt and a tools list, which is cached first: the prompt or the tools list? Is there any way to influence this? For example, it would be great to cache the system prompt first (it never changes), the tools list second (most of it doesn’t change, but the last tools sometimes vary), and the user prompt last (it always changes).
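For concreteness, here is roughly the request shape I mean (Python SDK; the model name, system prompt, and tool definitions are just placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Never changes between requests: would ideally be cached first.
SYSTEM_PROMPT = "You are a support agent for Acme Corp. ..."

# Mostly stable: everything but the last few entries is identical
# across requests; the final tool occasionally varies.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
    # ... more static tools, then an occasionally-varying one ...
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        # Always changes: would ideally come last in cache order.
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    tools=TOOLS,
)
```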
-
Anthropic and Google allow more fine-grained control over what is cached, via explicit cache breakpoints in the prompt (e.g., Anthropic’s cache_control markers). Is this on OpenAI’s roadmap?
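For reference, this is the kind of explicit control I mean, using Anthropic’s cache_control breakpoint (illustrative values; everything up to and including the marked block gets cached):

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "Long static system prompt ...",
            # Breakpoint: the prefix up to and including this
            # block is written to / read from the cache.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Always-changing user prompt"}],
)
```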
-
The prompt_cache_key, if I am reading the caching documentation correctly, is prepended to the prompt (the key comes first, then the prompt). So routing is based on the first 256 tokens of prompt_cache_key + prompt (or prompt_cache_key + tools, depending on the answer to question 1 above), yes?
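In other words, my mental model of the routing step is something like this (pure speculation on my part, just to make the question concrete):

```python
import hashlib

def pick_cache_machine(prompt_cache_key: str, prompt: str, num_machines: int) -> int:
    """Toy model: route to a machine by hashing prompt_cache_key
    plus (roughly) the first 256 tokens of the prompt."""
    # Crude stand-in for "first 256 tokens": ~4 chars per token.
    prefix = prompt_cache_key + prompt[: 256 * 4]
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines
```
-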
Cache hits come in increments of 256 tokens, starting at 1024. Does this mean that the 256-token prefix is first used to route to a machine, and that machine stores hashes of every prompt seen during the last 5-10 minutes at 256-token increments? The cache is then queried for the hash of the first 1024 tokens; if that hits, it’s queried for the hash of the first 1280 tokens, and so on until it misses, and the longest hit is used? If not, how does it ensure the longest match (at 1024 + n*256 tokens) gets used?
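A toy sketch of the lookup I’m describing, just to pin down the question (entirely hypothetical, not claiming this is the real implementation):

```python
def longest_cached_prefix(tokens: list[int], stored_hashes: set[int]) -> int:
    """Probe the cache at 1024 tokens, then in +256 increments,
    stopping at the first miss and keeping the longest hit."""
    best, length = 0, 1024
    while length <= len(tokens):
        if hash(tuple(tokens[:length])) not in stored_hashes:
            break  # first miss ends the scan
        best = length
        length += 256
    return best  # tokens served from cache; 0 means no cache hit
```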
-
Per the documentation, routing always goes to a machine based on the hash of the first 256 tokens. Does this literally mean that a single machine stores the cache for all prefixes starting with those 256 tokens? Is the cache not shared at any level between pools of machines?
Thanks!!!