Is prompt caching compatible with end-user IDs?

Hi everyone. I'm trying to figure out what is preventing my application from fully using cached inputs. I have a long prompt that stays the same on every request. Since this is my first live application, I'm sending end-user IDs with every call, as recommended in https://platform.openai.com/docs/guides/safety-best-practices#end-user-ids. Could this be the underlying issue? Or maybe the metadata I'm adding to the request?
Thanks for your time and replies.

User IDs are actually conducive to hitting cached prompts, according to the prompt caching guide:

  • If you provide the user parameter, it is combined with the prefix hash, allowing you to influence routing and improve cache hit rates. This is especially beneficial when many requests share long, common prefixes.

To hit cached prompts, make sure that the requirements are satisfied.
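
For reference, a minimal sketch of passing the user parameter and then checking how much of the prompt was served from cache, assuming the official `openai` Python SDK (v1.x); the model name and `LONG_STATIC_PROMPT` are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder: >1024 tokens of instructions/context that never change between requests
LONG_STATIC_PROMPT = "..."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": LONG_STATIC_PROMPT},  # identical prefix on every call
        {"role": "user", "content": "How do I reset my password?"},
    ],
    user="end-user-1234",  # end-user ID; combined with the prefix hash for routing
)

# cached_tokens > 0 means part of the prompt was served from the prompt cache
details = response.usage.prompt_tokens_details
print(details.cached_tokens if details else 0)
```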


Thanks for your quick feedback. Yes, I had already reviewed the requirements section. I'm currently analyzing the logs, and it looks like the user ID correlates with both cached and non-cached behavior, along with minor variations in the metadata. Note that I also send the user ID inside the metadata, so the issue may come solely from the metadata. Unfortunately, I haven't found any information on this.

Note that sending different user IDs will break caching across different users. A different user ID creates a different hash, signaling that the call can be routed to any datacenter server, not one you recently used that holds cached data.

If you have a platform-wide multi-user system message and extended context (such as tools and structured output) large enough to be cached (>1024 tokens), you would want to avoid sending the user API parameter.

With no user parameter sent, anybody that talks to your personality sets up the cache for the few minutes it persists. You cannot later add a user field, either, without changing the hash and missing that cache.
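
As a sketch of that suggestion (same assumptions and placeholder names as the snippet above): drop the `user` parameter so every visitor maps to the same routing hash and can reuse the prefix cached by earlier visitors.

```python
from openai import OpenAI

client = OpenAI()

LONG_STATIC_PROMPT = "..."  # placeholder: the unchanging >1024-token prefix

# No `user` parameter: all visitors share one routing hash, so the large static
# prefix cached by one visitor can be reused by the next while the cache persists.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": LONG_STATIC_PROMPT},
        {"role": "user", "content": "What are your opening hours?"},
    ],
)
details = response.usage.prompt_tokens_details
print(details.cached_tokens if details else 0)
```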


Hello _j, and thank you for your message. Essentially, it works as a search engine for a frequently-asked-questions page, handling a substantial amount of context, roughly 20,000 to 50,000 tokens, with structured outputs. Each visitor is assigned a random user ID that stays the same throughout their interaction with the assistant. If I understand you correctly, it might be pointless, since the cache would only apply on a per-user basis, correct?

The cache would be harshly demoted on a per-user basis.

The exact algorithm for load distribution vs sending to a previous instance with a cache when you have a non-matching hash is not described. It could range from “random” to “avoid existing server”.

It would also take a constant stream of “visitors” to build up an impactful cache at startup; otherwise, only their repeated chat turns would benefit from caching before it expires.

Going deeper, it is “user” + the first 256 tokens of input that produces the hash on which the call is routed. If you always send the same huge input as the starting sequence and the visitors are rapid and varied, then without “user” you would always be routed to the same server (unless you exceed roughly 15 requests per minute). The hashing never sees the actual chat, just the start of the large knowledge that gets cached.
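
Putting that together for the FAQ use case, a rough sketch (placeholder names, Python SDK assumed): keep the identical knowledge block as the leading system message, append only the visitor's question, and omit `user` so routing stays on the same hash.

```python
from openai import OpenAI

client = OpenAI()

# Placeholder: the 20k-50k token FAQ knowledge block, identical on every request
FAQ_CONTEXT = "..."

def answer(question: str) -> str:
    # Static knowledge first, visitor question last: the leading tokens that drive
    # routing are identical on every request, and the long prefix is what gets cached.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": FAQ_CONTEXT},
            {"role": "user", "content": question},
        ],
        # no `user` parameter, so every visitor maps to the same routing hash
    )
    return response.choices[0].message.content
```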


Thank you once again. We currently handle approximately 400 queries per day. Users typically submit only one question, and it is rare for them to ask more than two, so I understand this results in a large number of distinct hashes. As for context, I always include the same information at the beginning of each request, with the user's question appended at the end. Note that I added the ‘user ID’ based on the recommendations on the best-practices page. Since I also make a preliminary call to a dedicated ‘moderator,’ I may remove the user ID, at least temporarily. Thank you again for your guidance.
