We just updated our prompt caching guide with details of how cache routing works!
We route prompts by org, hashing the first ~256 tokens, and spillover to more machines happens at ~15 RPM. If you’ve got many prompts with long shared prefixes, the user parameter can improve request bucketing and boost cache hits.
The most important takeaway: “user” can break caching, despite sitting outside of the input context.
So: if you run a platform with a common agent prompt used by many users, carefully built to bring it within a cacheable length, and you also send the user parameter for “user tracking for OpenAI safety” (documented, but never seeming to matter), you are breaking the very discount opportunity you engineered.
However, if the tokens that follow the cacheable start diverge by user, prefix-only routing cannot detect that - so “user” becomes a benefit, letting later turns of each user’s chat cache instead of only the initial prefix, supposing that user continues interacting within a short period.
A discount does not take effect any earlier either way - it is all about understanding this mechanism against your own traffic patterns: distributing between servers by your own indirect management instead of blind spillover after a rate threshold is hit. For reference, here is how the API documentation describes the identifier parameter:
A stable identifier used to help detect users of your application that may be
violating OpenAI's usage policies. The IDs should be a string that uniquely
identifies each user. We recommend hashing their username or email address, in
order to avoid sending us any identifying information.
Sending these parameters will require updating your openai SDK library to a recent version, then adapting their purpose to the request patterns you produce.
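A minimal sketch of sending them from the Python SDK, assuming a release recent enough to accept prompt_cache_key and safety_identifier as keyword arguments; the model name, key values, and prompt here are placeholders:

```python
import hashlib
from openai import OpenAI

client = OpenAI()

# The shared, cacheable prefix (imagine >1024 tokens of company knowledge).
SYSTEM_PROMPT = "You are the Acme support bot. ..."

# Hash your application's own user ID so no identifying info is sent.
user_email = "jane@example.com"
safety_id = hashlib.sha256(user_email.encode("utf-8")).hexdigest()

response = client.chat.completions.create(
    model="gpt-4.1-mini",               # placeholder model
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How do I reset my password?"},
    ],
    prompt_cache_key="support-bot-v1",  # influences cache routing/bucketing
    safety_identifier=safety_id,        # abuse detection only, per the quoted docs
)
print(response.choices[0].message.content)
```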
Use-case
You have a large cacheable system prompt >1024 tokens, used across many users. (Your support bot with lots of company knowledge, for example).
Solution
Use both new parameters for their described purpose, with prompt_cache_key kept in agreement with the first 256 tokens - one stable key per shared prefix - signaling a desire for best-effort routing to the same hash-indicated inference server.
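As a sketch of “in agreement with the first 256 tokens”: derive one stable key from the shared prefix itself, so every user of this prompt sends the identical key (the key format and the 1000-character slice are my own illustrative assumptions, roughly covering ~256 tokens):

```python
import hashlib

SYSTEM_PROMPT = "You are the Acme support bot. ..."  # the >1024-token shared prompt

# One key for everyone sharing this prefix; it only changes when the prompt does.
SHARED_CACHE_KEY = "support-bot-" + hashlib.sha256(
    SYSTEM_PROMPT[:1000].encode("utf-8")  # roughly the first ~256 tokens
).hexdigest()[:12]
```

Passing prompt_cache_key=SHARED_CACHE_KEY on every request keeps all users of the common prompt aimed at the same cache.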
Use-case
You still have a large cacheable system prompt >1024 tokens, used across many users (your support bot with lots of company knowledge, for example). But you find that at high volume, greater than 15 requests per minute, you get many cache misses on long chats, with the delivered discount covering less than the chat’s context.
Solution
Prefer a prompt_cache_key that is user-based, so each user’s conversation, and not just the common prompt, takes priority in routing and caching.
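A sketch of a per-user key under this pattern; user_id is whatever stable identifier your application already has, and hashing it mirrors the advice quoted above:

```python
import hashlib

def per_user_cache_key(user_id: str) -> str:
    # Bucket by user: the shared prefix plus that user's growing conversation
    # route together, so long chats keep hitting their own cache - at the cost
    # of users no longer pooling a cache for the common prompt.
    return "support-bot-" + hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16]
```

Send prompt_cache_key=per_user_cache_key(user_id) (and, if you like, the same hash as the safety identifier) on every turn of that user’s chat.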
Use-case
You have a small system prompt or common tools, 256-1023 tokens, shared across users, after which chats diverge per user.
Solution
Break the routing-to-the-same-server that the prompt hash alone would indicate. You’d get no discount on only 500 tokens in common anyway, yet the same hash would still result. Provide a per-user prompt_cache_key to have your application’s traffic distributed while still encouraging per-user caching of chats, as sketched below.
This avoids the application’s common first 256 tokens doing the routing, saturating one cache server with misses, and rolling over.
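A compact sketch for this case (model, prompt, and helper name are placeholders of mine):

```python
import hashlib
from openai import OpenAI

client = OpenAI()

SMALL_SHARED_PROMPT = "You are a terse SQL helper. ..."  # ~500 tokens in practice

def ask(user_id: str, question: str):
    # Per-user key: the ~500-token common prefix is below the 1024-token caching
    # minimum and earns no discount anyway, so nothing is lost by spreading users
    # across servers - and each user's chat can still cache once it grows past 1024.
    key = "sql-helper-" + hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16]
    return client.chat.completions.create(
        model="gpt-4.1-mini",  # placeholder model
        messages=[
            {"role": "system", "content": SMALL_SHARED_PROMPT},
            {"role": "user", "content": question},
        ],
        prompt_cache_key=key,
    )
```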
Use-case
You want to benchmark uncached results, simulating unconnected calls.
Solution
Hashing of prompt_cache_key and context does not seem to guarantee distribution across different inference servers.
Thus, add a small variation or cryptographic nonce within the first 128-token block that does not distract the AI (this also breaks any caching OpenAI would do, without delivering the discount), such as “Chat session id: e3f9\n”.
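A minimal sketch of that cache-busting nonce (the function name is mine; secrets.token_hex(2) yields a 4-character hex id like the e3f9 above):

```python
import secrets

def bust_cache(system_prompt: str) -> str:
    # Prepend a tiny random marker inside the first token block so the routing
    # hash and any prefix cache differ on every call, while staying short and
    # non-instructional so it does not distract the model.
    return f"Chat session id: {secrets.token_hex(2)}\n{system_prompt}"
```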
Use-case
You don’t know why sending a user ID would ever be useful, and have heard zero anecdotes of it ever being used to mitigate organization bans or to inform you of a bad user.