Can I cache large chunks on gpt-5-nano? Does each cache-read request reset the cache inactivity timer? Do large caches affect cache overflow limits?

Hey there,
The prompt caching guide (https://platform.openai.com/docs/guides/prompt-caching#page-top) mentions that sending more than, say, 15 requests per minute can lead to cache overflow.

I am using gpt-5-nano, and my use case involves a large cached chunk of about 100k-200k tokens.

I wanted to confirm, for a large cache chunk of 100k-200k tokens on gpt-5-nano:
a) is it allowed for caching (i.e. can 200k tokens be cached without issues)?
b) does cache overflow follow the same limit of roughly 15 requests per minute as any other cache, or might a chunk this large overflow earlier?
c) how is the inactivity time measured? (The docs say the cache persists through 5-10 minutes of inactivity at peak load and up to 1 hour off-peak.) Say I send a request at 5:00 PM UTC, which caches the input, and a follow-up request at 5:04 PM UTC that hits the same cache. Assuming peak load, does the inactivity window reset at 5:04 PM UTC, or do the 5-10 minutes still count from 5:00 PM UTC?


Welcome to the community.

a) yes b) probably, but it isn’t guaranteed c) last activity

But rather than taking other people's word for it, I suggest testing with your own data. Some things to notice:

  • Some settings, like using the instructions parameter instead of an input role, might break caching on the Responses API.
  • The Responses API is currently erratic with caching: it can fail outright or take a few minutes to kick in, so it is not very deterministic right now. The Chat Completions API currently offers more stable caching results. You will find several threads about this around here.
  • If saving costs matters to you, it is worth logging your requests (at least timestamp, request id and usage), so that you can check later whether caching failed because of something you missed or because of one of the non-deterministic issues mentioned above (see the sketch after this list).
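To make that concrete, here is a minimal sketch of the kind of test harness I mean, using the Python SDK with Chat Completions (the model name, prompt contents and question strings are placeholders for your own data; the cached token count is read from usage.prompt_tokens_details):

```python
# Minimal sketch of a caching test harness, not an official example.
# Sends requests with a large static prefix and logs what usage reports.
import time
from openai import OpenAI

client = OpenAI()

# Your ~100k-200k token context; keep it byte-identical between calls
# so the prefix can actually be matched by the cache.
LARGE_STATIC_PREFIX = "..."

def ask(question: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-5-nano",
        messages=[
            # Big, unchanging part first; anything that varies goes after it.
            {"role": "system", "content": LARGE_STATIC_PREFIX},
            {"role": "user", "content": question},
        ],
    )
    usage = completion.usage
    cached = (
        usage.prompt_tokens_details.cached_tokens
        if usage.prompt_tokens_details else 0
    )
    # Log timestamp, response id and usage so you can audit cache hits later.
    print(f"{time.time():.0f} id={completion.id} "
          f"prompt_tokens={usage.prompt_tokens} cached_tokens={cached}")
    return completion.choices[0].message.content

# The first call warms the cache (cached_tokens should be 0); a second call
# shortly afterwards should report most of the prefix as cached. Spacing the
# calls out lets you probe the inactivity window question from the OP.
ask("What does section 3 say?")
ask("Summarise section 7.")
```

Writing those log lines to a file instead of stdout makes it easy to correlate cache misses with request timing after the fact.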

Edit: I’ve noticed caching has deteriorated lately even with Chat Completions, so I’ve run some more detailed tests and opened a separate thread to keep track of this.
