How does Prompt Caching work?

platypus · October 25, 2024, 6:35pm

Yes, I completely understand where you are coming from.

In Scenario 3, if you are sending 2560 tokens for the system prompt, 2200 for tools and 1500 for user messages, that’s 6260 in total, and in the best case you can expect 1024 + (40 * 128) = 6144 cached tokens, since cache hits start at 1024 tokens and grow in 128-token increments. That’s assuming you don’t make any changes in the subsequent calls. Now if you start making changes somewhere in the middle of the user messages, then because KV caching is causal (i.e. each token only depends on the tokens preceding it), you can be sure that everything after the edit, e.g. the second half of your user messages, will no longer be served from the cache.
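To make the arithmetic concrete, here is a rough sketch of the best-case calculation. It assumes the 1024-token minimum and 128-token increments implied by the formula above. The function name and the idea of an "edit offset" are just illustrative, not part of any API, and real cache hits can be lower than this upper bound.

```python
MIN_CACHEABLE = 1024  # assumed minimum prefix length before anything is cached
INCREMENT = 128       # assumed granularity of cache hits beyond the minimum


def best_case_cached_tokens(matching_prefix_tokens: int) -> int:
    """Upper bound on cached tokens for a prefix identical to a prior request."""
    if matching_prefix_tokens < MIN_CACHEABLE:
        return 0
    extra_blocks = (matching_prefix_tokens - MIN_CACHEABLE) // INCREMENT
    return MIN_CACHEABLE + extra_blocks * INCREMENT


system_tokens, tool_tokens, user_tokens = 2560, 2200, 1500
total = system_tokens + tool_tokens + user_tokens

# Nothing changed: the whole prompt is a matching prefix.
unchanged = best_case_cached_tokens(total)

# Hypothetical edit halfway through the user messages:
# only the tokens before the edit still match the cached prefix.
edit_offset = system_tokens + tool_tokens + user_tokens // 2
after_edit = best_case_cached_tokens(edit_offset)

print(total, unchanged, after_edit)  # 6260 6144 5504
```

So an edit in the middle of the user messages drops the best case from 6144 to 5504 cached tokens, while the system prompt and tools part of the prefix can still hit the cache.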
