Suppose I alternate between two or more prompts, such that every second (or third, etc.) request shares the same prefix. Will I get any cache hits, or do the matching requests need to be sequential?
How many prefixes are cached per org, and how many per user?
In the second scenario, we'd expect the 8000 tokens to become the new cached prefix, following the typical growth pattern of a chat.
However, can a smaller request that still hits the cache reset it, so that subsequent requests report fewer cached tokens than the expected 8000?
Trials of exactly that could answer the question; you can run them yourself and share the results.
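Here is a minimal sketch of such a trial, assuming the Chat Completions API; the model name and prefix text are placeholders, and cached token counts are read from the usage field:

```python
from openai import OpenAI

client = OpenAI()
PREFIX = "Background document text. " * 400  # well past the 1024-token caching minimum

def cached_tokens(messages):
    r = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    # Details may be absent on models without caching; treat that as 0.
    details = r.usage.prompt_tokens_details
    return details.cached_tokens if details else 0

full = [{"role": "system", "content": PREFIX},
        {"role": "user", "content": "A question against the full prefix"}]
short = [{"role": "system", "content": PREFIX[: len(PREFIX) // 2]},
         {"role": "user", "content": "A question matching only half the prefix"}]

print("warm-up:", cached_tokens(full))   # usually 0 on a cold cache
print("full:   ", cached_tokens(full))   # expect most of the prefix cached
print("short:  ", cached_tokens(short))  # a smaller request sharing part of the prefix
print("full:   ", cached_tokens(full))   # did the smaller hit shrink the later hit?
```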
The user parameter does not separate caches per user, whether that's the API key holder or an end user. It is just used as part of the hashing that distributes load more evenly across servers for dissimilar requests. Sending a different user ID is equivalent to saying “I don’t expect this to match other cached requests”.
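For illustration, a sketch of using the same user string purely as a routing key (the model name and prefix are placeholders; nothing here creates a per-user cache):

```python
from openai import OpenAI

client = OpenAI()
SHARED_PREFIX = "You are a meticulous support agent. " * 200  # long enough to be cacheable

# The same `user` value hashes identical-prefix requests toward the same
# cache; a different value says "don't expect a match elsewhere".
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative
    messages=[
        {"role": "system", "content": SHARED_PREFIX},
        {"role": "user", "content": "First question"},
    ],
    user="prefix-group-a",  # routing hint, not an end-user cache partition
)
print(response.usage.prompt_tokens_details.cached_tokens)
```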
I’ve tested several concurrent threads with no caching problems.
It works particularly well if you are using previous_response_id for chained prompts.
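For example (Responses API; the model name and inputs are placeholders), chaining with previous_response_id resends the same growing context as the prefix, which is exactly what the cache rewards:

```python
from openai import OpenAI

client = OpenAI()

# First turn establishes the conversation and its cacheable prefix.
first = client.responses.create(
    model="gpt-4o-mini",  # illustrative
    input="Summarize our refund policy in three bullet points.",
)

# The chained turn reuses the full prior context as its prefix.
second = client.responses.create(
    model="gpt-4o-mini",
    previous_response_id=first.id,
    input="Expand the second bullet.",
)
print(second.usage.input_tokens_details.cached_tokens)
```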
There is no guarantee of how many attempts it takes for caching to kick in, though; it can take a few tries or show a small delay in some cases.
It is worth running some experiments to find which strategy fits you best.
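As a starting point, here is a rough sketch for testing the alternating question from the top of the thread, again assuming the Chat Completions API with a placeholder model and prefixes:

```python
from openai import OpenAI

client = OpenAI()
PREFIX_A = "Document A background. " * 400  # two distinct prefixes,
PREFIX_B = "Document B background. " * 400  # each past the caching minimum

def ask(prefix, question):
    r = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[{"role": "system", "content": prefix},
                  {"role": "user", "content": question}],
    )
    return r.usage.prompt_tokens_details.cached_tokens

# Alternate A, B, A, B... and see whether non-sequential repeats still hit.
for i in range(6):
    label = "A" if i % 2 == 0 else "B"
    prefix = PREFIX_A if label == "A" else PREFIX_B
    print(f"request {i} ({label}): cached_tokens={ask(prefix, f'Question {i}')}")
```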