Prompt caching - how many prompts are cached?

Suppose I alternate between two or more prompts, such that every second (or third, etc.) prompt shares the same prefix. Will I get any cache hits? Or do matching requests need to be sequential?
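
To illustrate, here is a rough sketch of the pattern I mean (assuming the Chat Completions API; the model name and filler prompts are just placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Two long, distinct prefixes, each well over the 1024-token caching minimum.
PREFIX_A = "You are assistant A. " + "alpha " * 2000
PREFIX_B = "You are assistant B. " + "beta " * 2000

# Alternate between the two prefixes on every request.
for i in range(6):
    prefix = PREFIX_A if i % 2 == 0 else PREFIX_B
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": prefix},
            {"role": "user", "content": f"request {i}"},
        ],
        max_tokens=1,
    )
    # cached_tokens reports how much of the prompt was served from cache.
    print(i, response.usage.prompt_tokens_details.cached_tokens)
```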

How many prefixes are cached per org, and how many per user?

Thanks.

Figuring this out would take a series of calls just like you describe, since the behavior is not documented for this case.

Scenario 1:

- setup request 1: 8000 tokens common
- setup request 2: 1500 tokens common (or a bit more, enough to activate caching)

Scenario 2:

- setup request 1: 1500 tokens common
- setup request 2: 8000 tokens, growing the shared prefix

In the second scenario, we can easily expect the 8000 tokens to become the new cache that is available, following the pattern of a growing chat.

However, can a smaller request that hits the cache reset it, so that it then delivers fewer cached tokens than the expected 8000?

Trials of exactly that could answer the question; you can answer it for yourself and for others.
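
For instance, a minimal sketch of such a trial, assuming the Chat Completions API; the model name and token estimates are placeholders, and cached_tokens in the usage details reports how much of each prompt was served from cache:

```python
from openai import OpenAI

client = OpenAI()

def cached_tokens(prompt: str) -> int:
    """Send a request and report how many prompt tokens were cache hits."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: any caching-enabled model
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
    )
    return response.usage.prompt_tokens_details.cached_tokens

# Rough stand-ins: about two tokens per repeated word.
long_prefix = "cache test " * 4000   # roughly 8000 tokens
short_prefix = long_prefix[:8800]    # roughly 1500 tokens, same leading text

print(cached_tokens(long_prefix))    # warm the cache with the long prefix
print(cached_tokens(short_prefix))   # a smaller request hitting that prefix
print(cached_tokens(long_prefix))    # do the full 8000 tokens still hit?
```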

The user parameter is not for separating caches per user, whether that is an API key holder or an end user. It is just used as part of the hash that routes requests across servers, spreading dissimilar traffic more evenly. Sending a different user ID is equivalent to saying “I don’t expect this to match other cached requests”.
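
As a sketch of the practical consequence (the user value here is arbitrary): keep the same user string on requests that share a prefix, so they are routed alike.

```python
from openai import OpenAI

client = OpenAI()

SHARED_PREFIX = "A long common system prompt..."  # placeholder

# One stable user value for every request sharing this prefix; a
# different value signals "don't expect this to match other caching".
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": SHARED_PREFIX},
        {"role": "user", "content": "Hello"},
    ],
    user="prefix-group-1",  # routing hint, not a per-user cache partition
)
```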

I don’t think there is a limit on the number of cached prefixes, but it depends on what you are doing.

A series of details on how prompt caching works was published recently; it is worth reading.

I’ve tested several concurrent threads with no caching problems.
It works particularly well if you are using previous_response_id for chained prompts.
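
For example, a minimal sketch of that chaining with the Responses API (model and prompts are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# The first turn establishes the conversation and its cacheable prefix.
first = client.responses.create(
    model="gpt-4o-mini",  # placeholder model
    input="Summarize this policy document: ...",
)

# Chaining via previous_response_id reuses the prior turns as the prefix,
# so follow-up requests tend to hit the prompt cache.
follow_up = client.responses.create(
    model="gpt-4o-mini",
    previous_response_id=first.id,
    input="Now list the three most important points.",
)
print(follow_up.output_text)
```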

There is no guarantee, though, of how many attempts it takes for caching to start; it can take a few tries or show a small delay in some cases.

It is worth running some experiments to find which strategy fits you best.