Prompt Cache Routing + the `user` Parameter

We just updated our prompt caching guide with details of how cache routing works!

We route prompts by org, hashing the first ~256 tokens, and spill over to additional machines above ~15 RPM. If you’ve got many prompts with long shared prefixes, the user parameter can improve request bucketing and boost cache hit rates.

7 Likes

That’s great news!
The lower number of tokens needed for a cache hit is welcome. Thank you for that!

Can you share any specifics on how much the hit rate will improve when using the user parameter?

2 Likes

Interesting, as I just wrote yesterday:

The most important takeaway: “user” can break caching, independently of the input context.

So: if you run a platform with a common agent prompt used by many users, engineered to fit within a cacheable length, and you send the user parameter for “user tracking for OpenAI safety” (documented, but never apparently acted on), you are breaking the very discount opportunity you engineered.

However, if the tokens that follow that cacheable start diverge per user, prefix-based routing will not detect it, so “user” becomes a benefit that produces cache hits later in a conversation rather than on the first turn - assuming that user keeps interacting within a short period.

A discount does not take effect any earlier either way - the point is understanding this mechanism against your own traffic patterns: distributing between servers by your own indirect management instead of blind distribution after a rate threshold is hit.
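To make that mechanism concrete, here is a toy model of the routing as described in the announcement (org, prefix hash, spillover rate). The server count, hash choice, and names are my own placeholders for intuition only, not OpenAI's implementation:

```python
import hashlib

SERVERS = [f"server-{i}" for i in range(8)]   # placeholder fleet
SPILLOVER_RPM = 15                            # approximate rate from the announcement

def route(org_id: str, prompt: str, user_key: str | None, bucket_rpm: int) -> str:
    """Same org + same ~256-token prefix + same user key => same bucket."""
    prefix = prompt[:1000]                    # crude stand-in for "first ~256 tokens"
    bucket = hashlib.sha256(f"{org_id}|{prefix}|{user_key}".encode()).hexdigest()
    # Once the bucket runs hotter than the spillover rate, extra replicas absorb
    # traffic, and those replicas start with cold caches (hence the misses).
    spill = bucket_rpm // SPILLOVER_RPM
    return SERVERS[(int(bucket, 16) + spill) % len(SERVERS)]
```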

New API parameters today, replacing “user”

Separating out:

  • signaling for cache routing
  • end-user tracking by OpenAI

prompt_cache_key: str

Used by OpenAI to cache responses for similar requests to optimize your cache
hit rates. Replaces the `user` field.


and

safety_identifier: str

A stable identifier used to help detect users of your application that may be
violating OpenAI's usage policies. The IDs should be a string that uniquely
identifies each user. We recommend hashing their username or email address, in
order to avoid sending us any identifying information.


Sending these parameters requires updating the openai SDK libraries, then adapting their purpose to the request patterns you actually produce.
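For illustration, a minimal sketch assuming an openai Python SDK new enough to accept the parameters directly (on older versions they can be passed through extra_body); the model name, prompt, and e-mail are placeholders:

```python
import hashlib
from openai import OpenAI

client = OpenAI()

SHARED_SYSTEM_PROMPT = "..."   # placeholder for your >1024-token support-bot prompt

def hashed_id(email: str) -> str:
    # Hash the end user's e-mail so no identifying information is sent.
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SHARED_SYSTEM_PROMPT},
        {"role": "user", "content": "How do I reset my password?"},
    ],
    prompt_cache_key="support-bot-v3",                    # cache-routing signal
    safety_identifier=hashed_id("alice@example.com"),     # end-user tracking for OpenAI safety
)
print(response.choices[0].message.content)
```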

Use-case

  • You have a large cacheable system prompt (>1024 tokens) used across many users (your support bot with lots of company knowledge, for example).

Solution

  • Use both new parameters for their described purpose: a prompt_cache_key that agrees with the first 256 tokens, signaling a best-effort desire to route to the hash-indicated inference server (see the sketch below).
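One way to keep the key in agreement with the shared prefix (a sketch; the names are placeholders) is to derive it from the system prompt itself, so every request that starts with that prompt carries the identical routing hint:

```python
import hashlib
from openai import OpenAI

client = OpenAI()
SHARED_SYSTEM_PROMPT = "..."   # your >1024-token shared prompt (placeholder)

# Same prefix => same key => same routing bucket for every user of the bot.
shared_key = "support-bot-" + hashlib.sha256(SHARED_SYSTEM_PROMPT.encode()).hexdigest()[:16]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SHARED_SYSTEM_PROMPT},
        {"role": "user", "content": "Where can I find my invoices?"},
    ],
    prompt_cache_key=shared_key,
)
```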

Use-case

  • You still have a large cacheable system prompt (>1024 tokens) used across many users (your support bot with lots of company knowledge, for example). But at high volume, above ~15 requests per minute, you see many cache misses on long chats, so the discount covers less than the chat's context.

Solution

  • Prefer a prompt_cache_key that is per-user, so each user's conversation, not the common prompt, takes routing priority (sketched below).
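A sketch of the per-user variant; user_id and the message list stand in for your own application state:

```python
import hashlib
from openai import OpenAI

client = OpenAI()
user_id = "user-12345"                        # placeholder end-user id
conversation_messages = [
    {"role": "system", "content": "..."},     # shared system prompt (placeholder)
    {"role": "user", "content": "Hi again, following up on yesterday's ticket."},
]

def per_user_cache_key(uid: str) -> str:
    # One key per end user: the long, growing conversation (not just the shared
    # prefix) is what keeps landing on, and hitting, the same server's cache.
    return "chat-" + hashlib.sha256(uid.encode()).hexdigest()[:16]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=conversation_messages,
    prompt_cache_key=per_user_cache_key(user_id),
)
```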

Use-case

  • You have a small system prompt or common tool set of 256-1023 tokens, shared across users, after which chats diverge per user.

Solution

  • Break the routing-by-prompt-hash alone: with only ~500 tokens in common you would get no discount anyway, yet every request would still produce the same routing hash. Provide a per-user prompt_cache_key so your application's traffic is distributed, while still encouraging per-user caching of the longer chats (see the sketch after this list).
  • This avoids the application's first 256 tokens doing all the routing, saturating one cache server while still missing, and rolling over.
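A sketch of that decision, using tiktoken's o200k_base encoding as a rough stand-in for the model's own tokenizer, to check whether the common prefix even reaches the 1024-token caching minimum before picking a key strategy:

```python
import hashlib
import tiktoken

enc = tiktoken.get_encoding("o200k_base")   # encoding used by gpt-4o-class models
SHARED_PREFIX = "..."                       # ~500-token common system prompt + tools (placeholder)

def choose_cache_key(user_id: str) -> str:
    if len(enc.encode(SHARED_PREFIX)) >= 1024:
        # The prefix alone can be cached: keep every user in the same bucket.
        return "shared-" + hashlib.sha256(SHARED_PREFIX.encode()).hexdigest()[:16]
    # Below the caching minimum: spread users across servers and let each user's
    # own conversation (which will exceed 1024 tokens) be what gets cached.
    return "user-" + hashlib.sha256(user_id.encode()).hexdigest()[:16]
```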

Use-case

  • You want to benchmark uncached results, simulating unconnected calls

Solution

  • Hashing of prompt_cache_key plus context does not appear to guarantee distribution across different inference servers.
  • Thus, add a small variation or cryptographic nonce within the first 128-token block that does not distract the AI (this also breaks any caching OpenAI does, without delivering the discount), such as “Chat session id: e3f9\n” (sketched below).
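A sketch of the nonce idea using Python's secrets module; the four-hex-digit session id and the varied key are just one way to keep each call out of any shared cache:

```python
import secrets
from openai import OpenAI

client = OpenAI()

def uncached_call(system_prompt: str, question: str):
    nonce = secrets.token_hex(2)            # e.g. "e3f9"
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            # The tiny per-call variation sits inside the first 128-token block,
            # defeating both routing affinity and any prefix cache.
            {"role": "system", "content": f"Chat session id: {nonce}\n{system_prompt}"},
            {"role": "user", "content": question},
        ],
        prompt_cache_key=f"bench-{nonce}",  # vary the routing key as well
    )
```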

Use-case

  • You don’t know why sending a user ID would ever be useful, and have heard no anecdotes of it ever being used to mitigate organization bans or to flag a bad user.

Solution

  • Send or don’t send safety_identifier, up to you.
2 Likes