Hey there,
The docs (https://platform.openai.com/docs/guides/prompt-caching#page-top) mention that if more than roughly 15 requests per minute share the same prefix, the cache may overflow.
I am using gpt-5-nano, and my use case involves a large cacheable prefix of about 100k-200k tokens.
I wanted to confirm whether sending such a large prefix (100k-200k tokens) with gpt-5-nano:
a) is allowed for caching (i.e., can 200k tokens be cached without issues)?
b) follows the same ~15-requests-per-minute overflow limit as any other cached prefix, or, being such a large chunk, might it be evicted earlier?
c) how is the inactivity period measured? The docs say the cache persists through 5-10 minutes of inactivity during peak load, and up to 1 hour off-peak. Say I send a request at 5:00 PM UTC that caches the input, and a follow-up at 5:04 PM UTC that hits the same cache. Assuming peak load, does the inactivity window reset at 5:04 PM UTC, or is the 5-10 minutes counted from 5:00 PM UTC?
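To make (c) concrete, here is a toy sketch of the two interpretations I'm asking about. This is purely illustrative: `cache_hits`, the fixed 5-minute TTL, and both eviction policies are my assumptions for the example, not documented behavior.

```python
from datetime import datetime, timedelta

def cache_hits(timestamps, ttl, reset_on_access):
    """Given sorted request timestamps, return which requests would be
    cache hits under a given eviction policy. The first request always
    misses (it creates the cache entry)."""
    hits = []
    expires = None  # expiry time of the current cache entry, if any
    for t in timestamps:
        hit = expires is not None and t <= expires
        hits.append(hit)
        if not hit:
            expires = t + ttl       # cache (re)created on a miss
        elif reset_on_access:
            expires = t + ttl       # interpretation 1: each hit resets the timer
        # interpretation 2: on a hit, expiry stays fixed from creation time

    return hits

# Requests at 5:00, 5:04, and 5:08 PM UTC, with a hypothetical 5-minute TTL.
requests = [
    datetime(2025, 1, 1, 17, 0),
    datetime(2025, 1, 1, 17, 4),
    datetime(2025, 1, 1, 17, 8),
]
ttl = timedelta(minutes=5)

print(cache_hits(requests, ttl, reset_on_access=True))   # [False, True, True]
print(cache_hits(requests, ttl, reset_on_access=False))  # [False, True, False]
```

If the timer resets on access, my 5:08 request would still hit the cache; if the 5-10 minutes is counted from creation, it might miss. That's the distinction I'd like confirmed.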