I experience the same behavior:
I had to increase my initial prompt to about 1,200 tokens before caching kicked in; the next round-trip to gpt-4o-2024-08-06 then returned a chat completion with "prompt_tokens_details": {"cached_tokens": 1024}. It looks like gpt-4o and gpt-4o-2024-08-06 tokenize differently, although according to the docs both should use the same o200k_base encoding.
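For anyone debugging the same thing, here is a minimal sketch of how I check for cache hits. It just inspects the usage block of a parsed response dict, following the "prompt_tokens_details" shape quoted above; the sample payload and the helper name are my own, not from the API docs.

```python
# Sketch: read the cached-token count out of a chat-completion response
# (as a plain dict, e.g. after response.model_dump() or json parsing).
# Field names follow the "prompt_tokens_details" shape shown above.

def cached_tokens(response: dict) -> int:
    """Return how many prompt tokens were served from cache (0 = cache miss)."""
    usage = response.get("usage", {})
    details = usage.get("prompt_tokens_details", {})
    return details.get("cached_tokens", 0)

# Hypothetical payload matching what I saw in my logs:
sample = {
    "usage": {
        "prompt_tokens": 1226,
        "completion_tokens": 42,
        "total_tokens": 1268,
        "prompt_tokens_details": {"cached_tokens": 1024},
    }
}

print(cached_tokens(sample))  # 1024 -> the first 1024-token prefix was cached
```

Note that cached_tokens comes back in multiples of 128, which is why a ~1,200-token prompt reports exactly 1024 cached tokens.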
Cached prefixes generally remain active for 5 to 10 minutes of inactivity. However, during off-peak periods, caches may persist for up to one hour.
In most cases the prompt is cached as expected, and my logs show that even new sessions started within 10 minutes reuse the cache. But sometimes the initial prompt of a session is not cached, and a follow-up round-trip within 30 seconds still doesn't pick up the cache. I don't understand why.