Prompt Cache Routing + the `user` Parameter

We just updated our prompt caching guide with details of how cache routing works!

We route prompts by org, hashing the first ~256 tokens, and spill over to additional machines above ~15 RPM per prefix. If you've got many prompts with long shared prefixes, the `user` parameter can improve request bucketing and boost cache hit rates.
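A minimal sketch of that mechanism as described above, not OpenAI's actual code: the machine choice is a hash of the org, the first ~256 prompt tokens, and `user` (if sent), with blind fan-out once a prefix crosses the spillover rate. The pool size, function names, and fan-out model are all hypothetical.

```python
import hashlib
import random

CACHE_MACHINES = 8    # hypothetical pool size
SPILLOVER_RPM = 15    # approximate per-prefix rate before spillover
PREFIX_TOKENS = 256   # approximate number of tokens hashed for routing

def route(org_id: str, tokens: list[str], user: str | None,
          prefix_rpm: int) -> int:
    """Pick a cache machine from a hash of org, prompt prefix, and `user`.

    Above the spillover rate, fan out to extra machines so one hot
    prefix doesn't overload a single node.
    """
    prefix = "".join(tokens[:PREFIX_TOKENS])
    key = f"{org_id}|{prefix}|{user or ''}".encode()
    home = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    # Blind spillover: roughly one extra machine per ~15 RPM on this prefix.
    fan_out = min(1 + prefix_rpm // SPILLOVER_RPM, CACHE_MACHINES)
    return (home + random.randrange(fan_out)) % CACHE_MACHINES
```

Note that changing `user` changes the hash, which is why the parameter can either split or bucket your traffic, as the discussion below gets into.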


That’s great news!
The lower number of tokens needed for a cache hit is welcome. Thank you for that!

Can you share any specifics on how much the hit rate improves when using the `user` parameter?


Interesting, as I just wrote yesterday:

The most important takeaway: `user` can break caching, even though it sits outside the input context.

So: if you run a platform with a common agent prompt shared by many users, engineered to fit within a cacheable length, and you also send the `user` parameter for "user tracking for OpenAI safety" (documented, but never before seeming to matter), you are breaking the very discount opportunity you engineered.
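To make that anti-pattern concrete, an illustrative sketch; the model, prompt, and function names are placeholders, not from the guide:

```python
from openai import OpenAI

client = OpenAI()

# Long system prompt shared by every end user, engineered to be cacheable.
SHARED_AGENT_PROMPT = "You are ExampleCo's support agent. " + "(policies...)" * 200

def ask(question: str, end_user_id: str | None = None):
    return client.chat.completions.create(
        model="gpt-4o",  # placeholder model
        messages=[
            {"role": "system", "content": SHARED_AGENT_PROMPT},
            {"role": "user", "content": question},
        ],
        # Anti-pattern for a shared prefix: distinct `user` values hash
        # identical prefixes to different machines, so users never share
        # each other's warm cache.
        user=end_user_id,
    )

# Cache-friendly: omit `user` (or send one constant value) so every
# request with this shared prefix routes to the same machine.
ask("What is the refund window?")
```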

However, if the tokens that follow the cacheable start diverge per user, prefix-hash routing alone won't detect that. There, `user` is a benefit: it buckets each user's requests together, so caching kicks in on later turns instead of the first one, supposing that user keeps interacting within a short period.
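A sketch of that beneficial case, with illustrative names again: per-user conversations that diverge after a short common start, where a stable per-user value keeps each user's follow-up turns on the machine that cached their earlier turns.

```python
from openai import OpenAI

client = OpenAI()

def continue_conversation(history: list[dict], end_user_id: str):
    # The prompt diverges per user after the first turn, so prefix
    # hashing alone can't group one user's requests. A stable `user`
    # value buckets them onto one machine: turn 2+ can hit the cache
    # built by turn 1, provided the follow-ups arrive while the cache
    # entry is still alive.
    return client.chat.completions.create(
        model="gpt-4o",  # placeholder model
        messages=history,
        user=end_user_id,  # stable ID, e.g. a hashed account identifier
    )
```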

The discount does not take effect any earlier either way. It is all about weighing this mechanism against your own traffic patterns: distributing between servers by your own indirect management, instead of blind distribution after a rate threshold is hit.
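One way to do that indirect management yourself, a hypothetical heuristic rather than documented guidance: if a single shared prefix runs well past the ~15 RPM spillover rate, shard traffic into a few stable `user` bucket values, so each bucket keeps its own warm copy of the prefix rather than having spillover scatter requests blindly.

```python
import hashlib

BUCKETS = 4  # rough heuristic: expected RPM on the prefix / ~15

def cache_bucket(end_user_id: str) -> str:
    """Map an end user to one of a few stable `user` values.

    Each bucket value hashes to its own machine, so a hot shared prefix
    is spread across BUCKETS deterministically warm caches instead of
    being scattered by blind spillover past the rate threshold.
    """
    h = int(hashlib.sha256(end_user_id.encode()).hexdigest(), 16)
    return f"prefix-shard-{h % BUCKETS}"
```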