Prompt caching doesn't seem to work consistently


I am trying to maintain a multi-turn conversation with gpt-4o by incrementally appending each user question and assistant response inside a loop, but the usage logs tell a different story: the number of cached tokens being hit is quite irregular from request to request. Can anybody please help?
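
Here is roughly what my loop looks like (a simplified sketch; it assumes a recent openai Python SDK where `usage.prompt_tokens_details.cached_tokens` is populated, and the real prompts are long enough to pass the 1024-token caching minimum):

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "system", "content": "You are a helpful assistant."}]

for question in ["First question", "Second question", "Third question"]:
    # New turns are appended at the end, so the earlier prefix should
    # stay byte-identical from request to request.
    messages.append({"role": "user", "content": question})

    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    answer = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})

    # This is the number that looks irregular in my usage logs.
    details = resp.usage.prompt_tokens_details
    print(f"prompt_tokens={resp.usage.prompt_tokens} cached_tokens={details.cached_tokens}")
```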

Prompt caching basically depends on:

  • The hash of the [first 1024 input tokens] + [user parameter] must match (see the sketch after this list)
  • The cache must be hit again within a 5-minute interval (it can persist up to 1 hour, depending on service demand)
  • It may take a short delay before caching starts taking effect
  • For more details, there is a guide for prompt caching.
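
As a sketch of the first point (the prompt content here is made up), keep everything static at the front and append dynamic, per-turn content only at the end, so the leading tokens stay byte-identical across requests:

```python
STATIC_SYSTEM_PROMPT = "You are a support assistant. <long, unchanging instructions>"

def build_messages(history: list[dict], new_question: str) -> list[dict]:
    # Static prefix first, dynamic content last: only then can the
    # first ~1024 tokens hash to the same value on every request.
    return (
        [{"role": "system", "content": STATIC_SYSTEM_PROMPT}]
        + history
        + [{"role": "user", "content": new_question}]
    )
```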

Thanks a lot for the guide. I was wondering about this: if it depends on the hash, a request isn't always guaranteed to land on the same machine that served the previous step, right? That would explain why it doesn't always hit, and why every so often a request just happens to land on the correct machine. Just wanted to confirm…

In the guide it says:

  1. Cache Routing:
  • Requests are routed to a machine based on a hash of the initial prefix of the prompt. The hash typically uses the first 256 tokens, though the exact length varies depending on the model.
  • If you provide the user parameter, it is combined with the prefix hash, allowing you to influence routing and improve cache hit rates. This is especially beneficial when many requests share long, common prefixes.
  • If requests for the same prefix and user combination exceed a certain rate (approximately 15 requests per minute), some may overflow and get routed to additional machines, reducing cache effectiveness.
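
In practice, that means sending a stable per-conversation `user` value. A minimal sketch (the session id here is made up):

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello"}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    # A stable per-conversation identifier gets combined with the prefix
    # hash, steering repeat requests toward the same cache-holding machine.
    user="session-7f3a",
)
```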

To add: I've found that some people send dynamic objects (such as a class with a timestamp factory) to the endpoint, which ruins any chance of prompt caching. That's not the case here, since the misses look intermittent; I'm just pointing out that the hash can sometimes change unexpectedly.
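
For example, something like this (a contrived sketch) changes the very first tokens on every call, so the prefix hash never repeats:

```python
from datetime import datetime, timezone

def build_system_prompt() -> str:
    # The timestamp changes the leading tokens on every request,
    # so the prefix hash never matches and the cache never hits.
    now = datetime.now(timezone.utc).isoformat()
    return f"Current time: {now}. You are a helpful assistant."

# Fix: drop the timestamp, or move any dynamic content to the *end*
# of the prompt so the cached prefix stays byte-identical.
```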

It would be nice if OpenAI (or anybody, really) released a simple helper function that returned the hash of the tokens.
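
In the meantime, you can approximate one client-side. To be clear, OpenAI's actual routing hash is internal and undocumented; this sketch (tiktoken plus SHA-256, with the 256-token prefix length taken from the guide, and a naive serialization of the messages) only helps you detect when your own prefix drifts between requests:

```python
import hashlib

import tiktoken

def prefix_hash(messages: list[dict], model: str = "gpt-4o",
                prefix_tokens: int = 256) -> str:
    """Hash the first `prefix_tokens` tokens of a naive serialization.

    NOT OpenAI's internal routing hash -- only useful for spotting
    unintended prefix changes between your own requests.
    """
    enc = tiktoken.encoding_for_model(model)
    text = "".join(f"{m['role']}: {m['content']}\n" for m in messages)
    tokens = enc.encode(text)[:prefix_tokens]
    return hashlib.sha256(repr(tokens).encode()).hexdigest()[:16]
```

If two consecutive requests in a conversation print different values, something in the leading messages changed before the request went out.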