How to improve caching accuracy

Currently, I have an application that sends around 500 API requests to OpenAI every day. There are around 10 types of requests, and each type has a constant system prompt.

In each request, I separate the system prompt from the user request, placing the system prompt first. I'm fairly sure the system prompt is long, more than 1024 tokens, since even when the user request is really short, the input is around 2k tokens.

However, my cache hit rate is almost 0%, if not exactly 0. The cache only seems to work when two requests are exactly the same; if I modify even a tiny bit of the user request, the cache misses. Is there any way to improve this? My goal is just to reduce my API cost. I'm using o4-mini, by the way.

I recommend reading the prompt caching guide; it has some valuable tips.

But basically, the first 1024 tokens of your input, together with the `user` parameter, must be identical, and the second request must arrive within about 5 minutes of the first (during lower-demand hours the cache may persist longer, up to an hour).
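As a minimal sketch of that structure (assuming the Chat Completions API via the official `openai` Python SDK; the prompt string and the `user` value are placeholders), keeping the long system prompt byte-identical and first, with the variable part last, is what makes the prefix cacheable:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Keep this string byte-identical across requests: any change within the
# first 1024 tokens (even whitespace) changes the prefix and causes a miss.
SYSTEM_PROMPT = "...your long, constant system prompt..."

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="o4-mini",
        messages=[
            # Static content first, so the cacheable prefix stays identical.
            {"role": "system", "content": SYSTEM_PROMPT},
            # Variable content last, after the cached prefix.
            {"role": "user", "content": question},
        ],
        # A stable identifier per request type helps route repeated
        # requests to the same cache. (Hypothetical value.)
        user="my-app-request-type-1",
    )
    return response.choices[0].message.content
```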

Caching sometimes takes a moment to take effect, so two fast consecutive prompts might both get through before the prefix is cached.

If you have a large system message, it may be worth testing it in isolation and monitoring the cached tokens to catch any unnoticed detail that might be breaking your caching.
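For example (a sketch under the same assumptions as above), the usage object on each response reports how many prompt tokens were served from the cache, so you can log it and see whether your prefix is actually hitting:

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = "...your long, constant system prompt..."

response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "short test question"},
    ],
)

usage = response.usage
print(f"prompt tokens: {usage.prompt_tokens}, "
      f"cached: {usage.prompt_tokens_details.cached_tokens}")
# cached_tokens is 0 on a cold request; repeating the same prefix within
# a few minutes should report a large cached count instead.
```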

One detail that can break the cache is changing the `instructions` parameter midway, as it changes the content (and hence the hash) of the first 1024 tokens. But since you are passing your prompt directly as a system role, that should be fine.
