We are seeing what looks like a prompt-caching issue specific to gpt-5.4-nano.
According to the OpenAI docs, Prompt Caching is automatic for recent models and applies to prompts of 1024 tokens or more. The gpt-5.4-nano model page also lists cached input pricing ($0.02 / 1M tokens), so we expected non-zero cached_tokens in the reported usage.
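For clarity on what we are measuring: our gateway's cached_prompt_input_tokens / cache_hit_rate metrics are read from the cached-token fields in the API usage objects. A minimal sketch of where those fields live, per the OpenAI docs (the short "ping" prompt is just to show the reporting path; it is below the 1024-token caching minimum, so 0 is expected here):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Chat Completions: cache hits are reported under usage.prompt_tokens_details.
chat = client.chat.completions.create(
    model="gpt-5.4-nano",  # the model under test in this report
    messages=[{"role": "user", "content": "ping"}],
)
ptd = chat.usage.prompt_tokens_details
print("chat cached_tokens:", ptd.cached_tokens if ptd else 0)

# Responses API: the equivalent field is usage.input_tokens_details.
resp = client.responses.create(model="gpt-5.4-nano", input="ping")
print("responses cached_tokens:", resp.usage.input_tokens_details.cached_tokens)
```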
However, in our tests, gpt-5.4-nano consistently shows zero cache hits, even with long, highly repeated prefixes, while control models on the same gateways do show cache hits.
- Model: gpt-5.4-nano
- Repeated the same mood benchmark 3 times with the same long shared prefix (a minimal reproduction sketch follows the run results)
- Average prompt input per request: 1212.95 tokens
- Run 1: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%
- Run 2: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%
- Run 3: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%
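For reference, a minimal sketch of our run loop (standard `openai` Python SDK; `LONG_SHARED_PREFIX` is a placeholder for our stable ~1200-token benchmark prefix, and the user turn is illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder for our stable benchmark prefix (~1200 tokens, above the
# documented 1024-token caching minimum).
LONG_SHARED_PREFIX = "..."

for run in range(1, 4):
    resp = client.chat.completions.create(
        model="gpt-5.4-nano",  # swap in gpt-5-nano as the control
        messages=[
            {"role": "system", "content": LONG_SHARED_PREFIX},
            {"role": "user", "content": "Classify the mood of: 'What a day.'"},
        ],
    )
    details = resp.usage.prompt_tokens_details
    cached = details.cached_tokens if details else 0
    print(f"run {run}: prompt_tokens={resp.usage.prompt_tokens} cached_tokens={cached}")
```

Swapping gpt-5-nano into the same loop produces non-zero cached_tokens on the repeated runs; gpt-5.4-nano stays at 0 every time.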
So this does not look like a generic prompt-formatting issue on our side:
- prompts are above 1024 tokens
- shared prefixes are stable
- the same gateways show caching for gpt-5-nano
- only gpt-5.4-nano is consistently at 0 cached input in our runs
Is prompt caching intentionally disabled for gpt-5.4-nano, or is there a known issue with cache routing / cached-token reporting for this model?