We just updated our prompt caching guide with details on how cache routing works!
We route prompts by org, hashing the first ~256 tokens, and spill over to more machines at ~15 RPM. If you've got many prompts with long shared prefixes, the user parameter can improve request bucketing and boost cache hits.
The most important takeaway: "user" can break caching, even though it sits outside the input context, because it is factored into how requests are bucketed.
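To make that concrete, here is a toy model of the routing as described. The real hash, key layout, and machine count are internal to OpenAI; `routing_bucket` and `n_machines` are invented purely for illustration:

```python
import hashlib

def routing_bucket(org_id: str, prompt_tokens: list[str], user: str | None,
                   n_machines: int = 32) -> int:
    """Toy model: bucket = hash(org + first ~256 tokens + user)."""
    prefix = "".join(prompt_tokens[:256])      # only the leading ~256 tokens matter
    key = f"{org_id}|{prefix}|{user or ''}"    # 'user', if sent, is part of the key
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % n_machines

tokens = ["tok "] * 300                        # identical 300-token prompt
a = routing_bucket("org-123", tokens, user="alice")
b = routing_bucket("org-123", tokens, user="bob")
# a != b almost certainly: same cacheable prefix, different machines, no shared cache.
```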
So: suppose you run a platform with a common agent used by many users, engineered so the shared prompt fits within a cacheable length. If you send the user parameter for "user tracking for OpenAI safety" (documented, but never previously seeming to matter), you are breaking the very discount opportunity you engineered: each distinct user value buckets the identical prefix separately.
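Here is a minimal sketch of that self-defeating pattern using the standard openai Python SDK (the model name and prompt contents are placeholders):

```python
from openai import OpenAI

client = OpenAI()
SHARED_AGENT_PROMPT = "You are SupportBot. ..."  # imagine 1024+ tokens shared by everyone

def answer(end_user_id: str, question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        messages=[
            {"role": "system", "content": SHARED_AGENT_PROMPT},  # identical prefix for all users
            {"role": "user", "content": question},
        ],
        user=end_user_id,  # distinct per end user: splits the shared prefix across machines
    )
    return resp.choices[0].message.content
```

Omitting user here (or sending one constant value) would keep the shared prefix hitting the same machine.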
However, if the tokens that follow the cacheable start diverge per user, the routing will not detect that, and "user" becomes a benefit: each user's requests stay bucketed together, so their growing context caches on later turns instead of the first, supposing that user continues interacting within a short period.
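Sketched under the same assumptions, the case where user helps: turn N's messages are a strict prefix of turn N+1's, so pinning the user's turns together lets every later turn hit the cache built by the previous one (within the cache's short lifetime).

```python
from openai import OpenAI

client = OpenAI()

def continue_conversation(end_user_id: str, history: list[dict], new_message: str) -> str:
    """Per-user chat: each turn extends the prior prefix, so turn 2+ can hit cache."""
    history.append({"role": "user", "content": new_message})
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        messages=history,      # grows per user: diverges from other users after the start
        user=end_user_id,      # keeps this user's turns on the same machine
    )
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```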
A discount does not take effect any earlier either way. It is all about understanding this mechanism against your own traffic patterns: you distribute requests between servers through your own indirect management, instead of blind spillover after the ~15 RPM rate is hit.
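One hypothetical way to act on that (the function, shard count, and shard scheme are all invented, not anything OpenAI documents): decide what you send as user based on your traffic shape, so the split across machines is your choice.

```python
import hashlib

def cache_routing_key(end_user_id: str, shared_prefix_traffic: bool,
                      shards: int = 4) -> str | None:
    """Hypothetical policy: choose the 'user' value for routing, not just tracking."""
    if shared_prefix_traffic:
        # Everyone shares the prefix: don't split it per user. Either omit 'user'
        # entirely, or shard deliberately if volume far exceeds ~15 RPM per bucket.
        shard = int(hashlib.md5(end_user_id.encode()).hexdigest(), 16) % shards
        return f"shared-agent-shard-{shard}"
    # Per-user divergent prompts: a stable per-user value buckets their turns together.
    return end_user_id
```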