Can I cache large chunks on gpt-5-nano? Does each cache-read request reset the cache's inactive time? Do large caches affect cache overflow limits?

Welcome to the community.

a) Yes. b) Probably, but it isn't guaranteed. c) It's driven by last activity.

But rather than taking other people's word on this, I suggest testing with your own data. A few things to watch for:

  • Some settings, like using the instructions parameter instead of an input role, may break caching on the Responses API
  • Caching on the Responses API is currently erratic: it can fail outright or take a few minutes to take effect, and it isn't very deterministic right now. The Chat Completions API currently gives more stable caching results. You will find several threads about this around here.
  • If saving costs is important to you, it may be worth logging your requests (at least timestamp, request ID, and usage), so that you can monitor later whether caching failed because of something you missed, or because of the non-deterministic service behavior mentioned above.
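To make the logging suggestion above concrete, here is a minimal sketch of a helper that records the cache-relevant fields from a Chat Completions response. It assumes the response has already been converted to a plain dict; the field path `usage.prompt_tokens_details.cached_tokens` matches the current API reference, but verify it against your SDK version before relying on it.

```python
import json
import time


def log_usage(response: dict, logfile: str = "cache_log.jsonl") -> dict:
    """Append the fields worth keeping for cache monitoring to a JSONL log."""
    usage = response.get("usage", {})
    entry = {
        "timestamp": time.time(),
        "request_id": response.get("id"),
        "prompt_tokens": usage.get("prompt_tokens"),
        # cached_tokens reports how many prompt tokens were served from cache;
        # 0 means the cache missed (or the prompt was below the caching minimum)
        "cached_tokens": usage.get("prompt_tokens_details", {}).get("cached_tokens", 0),
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


# Example with a mocked response payload (shape as returned by Chat Completions)
sample = {
    "id": "chatcmpl-abc123",
    "usage": {
        "prompt_tokens": 2048,
        "prompt_tokens_details": {"cached_tokens": 1920},
    },
}
entry = log_usage(sample)
print(entry["cached_tokens"])  # 1920
```

Scanning the log later for entries where `cached_tokens` is 0 despite a repeated large prefix is a quick way to spot the silent cache failures mentioned above.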

Edit: I’ve noticed caching has deteriorated lately even with Chat Completions, so I’ve run some more detailed tests and opened a separate new thread to keep track of this.
