gpt-5.4-nano appears to return zero prompt-cache hits despite >1024-token shared prefixes

We are seeing what looks like a prompt-caching issue specific to gpt-5.4-nano.

According to the OpenAI docs, prompt caching is automatic for recent models and applies to prompts of 1024 tokens or longer. The gpt-5.4-nano model page also lists cached-input pricing ($0.02 / 1M tokens), so we expected non-zero cached_tokens in the usage response.

However, in our tests, gpt-5.4-nano consistently shows zero cache hits, even with long, highly repeated prefixes, while control models on the same gateways do show cache hits.

  • Model: gpt-5.4-nano

  • Ran the same mood benchmark 3 times, with every request sharing the same long prefix

  • Average prompt input per request: 1212.95 tokens (above the 1024-token caching minimum)

  • Run 1: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%

  • Run 2: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%

  • Run 3: cached_prompt_input_tokens = 0, cache_hit_rate = 0.00%
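For reference, the numbers above are aggregated from the `usage` block of each raw API response. A minimal sketch of that aggregation, assuming the response JSON follows the standard Chat Completions usage schema (`usage.prompt_tokens` and `usage.prompt_tokens_details.cached_tokens`); the helper name is ours:

```python
def cache_stats(usages):
    """Aggregate prompt-cache stats from raw Chat Completions `usage` payloads.

    `usages` is a list of dicts shaped like the JSON `usage` object, e.g.
    {"prompt_tokens": 1213, "prompt_tokens_details": {"cached_tokens": 0}}.
    Missing fields are treated as zero cached tokens.
    """
    prompt = sum(u.get("prompt_tokens", 0) for u in usages)
    cached = sum(
        u.get("prompt_tokens_details", {}).get("cached_tokens", 0) for u in usages
    )
    rate = (cached / prompt * 100.0) if prompt else 0.0
    return {
        "prompt_tokens": prompt,
        "cached_tokens": cached,
        "cache_hit_rate_pct": round(rate, 2),
    }
```

With gpt-5.4-nano this returns a 0.00% hit rate on every run; the same aggregation over the control models shows non-zero cached_tokens.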

So this does not look like a generic prompt-formatting issue on our side:

  • prompts are above 1024 tokens

  • shared prefixes are stable

  • the same gateways show caching for gpt-5-nano

  • only gpt-5.4-nano is consistently at 0 cached input in our runs
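To back the "stable prefixes" claim above, we fingerprint the serialized shared prefix across runs: since caching matches on exact prefixes, any drift (timestamps, request IDs, reordered tool definitions) would defeat it. A rough sketch of that check; the function name is ours, not from any SDK:

```python
import hashlib
import json


def prefix_fingerprint(messages, prefix_len):
    """Hash the first `prefix_len` messages byte-for-byte.

    Identical fingerprints across runs mean the shared prefix really is
    stable; even a whitespace difference would change the hash.
    """
    blob = json.dumps(messages[:prefix_len], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Across all three runs the fingerprints of the shared prefix were identical, so the zero cached_tokens figures are not explained by prefix drift.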

Is prompt caching intentionally disabled for gpt-5.4-nano, or is there a known issue with cache routing / cached-token reporting for this model?