Why does prompt caching require at least 1,024 tokens?

The documentation says the following. I am wondering why this limit is required.

API calls to supported models will automatically benefit from Prompt Caching on prompts longer than 1,024 tokens.

1 Like

OpenAI doesn’t publicly explain why, but if the prompt is shorter than 1,024 tokens, the potential savings (in latency and compute) might be too small to justify the overhead of caching. Most likely, the 1,024-token rule is a practical engineering cutoff: it keeps the caching system efficient, fair, and worth the effort only when it actually saves time and money.
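To illustrate that trade-off, here is a rough back-of-envelope sketch. All of the constants are made-up assumptions for illustration, not OpenAI figures; the point is only that a fixed per-request cache overhead is easily larger than the prefill compute a short prompt would save.

```python
# Hypothetical break-even model for prompt caching (illustrative only).
# Both constants are invented assumptions, not published OpenAI numbers.

CACHE_OVERHEAD_MS = 15.0      # assumed fixed cost: hash the prefix, look it up, restore KV state
PREFILL_MS_PER_TOKEN = 0.02   # assumed prefill compute cost per prompt token

def caching_worthwhile(prompt_tokens: int) -> bool:
    """Return True if reusing a cached prefix would beat simply recomputing the prefill."""
    savings_ms = prompt_tokens * PREFILL_MS_PER_TOKEN
    return savings_ms > CACHE_OVERHEAD_MS

for n in (10, 256, 1024, 4096):
    print(n, caching_worthwhile(n))
# With these made-up constants, short prompts lose: the fixed overhead
# outweighs the small amount of prefill compute that would be skipped.
```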

2 Likes

Thanks Paul, but my understanding is that this cache is the content of the KV cache, which I thought is always created during the prefill phase of inference regardless of the size of the prompt. Maybe I have some gaps in my understanding.

If your input is “A helpful AI/hello”, which is going to be faster: hashing that, looking up a local context-window cache, and loading the hidden state and embeddings of a previous run onto the GPU to resume, or simply running the AI on 10 tokens of input? A conceptual sketch of that distinction follows below.
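Here is a minimal sketch of the distinction, with names and structure that are my own invention rather than OpenAI’s implementation: the per-request KV cache is always built during prefill, but reusing it across requests adds hashing, lookup, and state-restore steps that only pay off when the prefix is long.

```python
import hashlib

# Toy cross-request prefix cache (conceptual sketch, not OpenAI's implementation).
# The per-request KV cache is always built during prefill; this extra layer is
# about persisting that state and restoring it on a *later* request.
prefix_cache: dict[str, bytes] = {}

def prefix_key(prompt_tokens: list[int]) -> str:
    # Hash the token prefix to form a lookup key.
    return hashlib.sha256(str(prompt_tokens).encode("utf-8")).hexdigest()

def prefill(prompt_tokens: list[int]) -> bytes:
    # Stand-in for running the model's prefill pass and producing KV state.
    return b"kv-state-for-%d-tokens" % len(prompt_tokens)

def get_kv_state(prompt_tokens: list[int]) -> bytes:
    key = prefix_key(prompt_tokens)
    if key in prefix_cache:
        return prefix_cache[key]     # cache hit: skip prefill, but pay lookup/restore cost
    state = prefill(prompt_tokens)   # cache miss: compute prefill as usual
    prefix_cache[key] = state        # persist for future requests
    return state
```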

The size of the model, its embeddings, and its generation rate are the variables that go into budgeting latency and optimizing GPU compute. Simply giving you a token threshold provides a predictable discount while hiding proprietary methods. (What is not predictable is when they discount less than they should, by 128, 192, 256 or more tokens.)
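For what it’s worth, the billing behavior can be sketched as a simple rounding rule. This is a small sketch assuming the behavior as documented (no caching below 1,024 tokens, then cache hits in 128-token increments), which is where that “less than they should” shortfall comes from:

```python
def cached_tokens(prompt_tokens: int) -> int:
    """Estimate the cacheable prefix length, assuming the documented behavior:
    no caching below 1,024 tokens, then cache hits in 128-token increments."""
    if prompt_tokens < 1024:
        return 0
    return (prompt_tokens // 128) * 128

for n in (900, 1024, 1100, 1500):
    print(n, cached_tokens(n))
# 900 -> 0, 1024 -> 1024, 1100 -> 1024, 1500 -> 1408
```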

1 Like