prompt_cache_key seems inconsistent -- works better on GPT-4o than GPT-5

I’ve been testing prompt caching with the Responses API and noticed prompt_cache_key behaves inconsistently. With GPT-4o, I often get many cached tokens on repeated runs, but with GPT-5-mini I rarely see any, even with identical prompts and the same cache key.

Here’s a minimal example:

import hashlib
import time

from django.conf import settings
from django.core.management.base import BaseCommand
from openai import OpenAI
from openai.types.responses import (
    ResponseCompletedEvent,
    ResponseInputParam,
    ResponseTextDeltaEvent,
)


class Command(BaseCommand):
    def handle(self, *args, **options) -> None:
        client = OpenAI(api_key=settings.OPENAI_API_KEY)

        # Large repeated block (roughly 10k tokens, well above the ~1024-token
        # minimum required for prompt caching to kick in)
        large_block = "abcdefghijkl1 " * 2048

        input_list: ResponseInputParam = [
            {"role": "system", "content": large_block},
        ]

        # Derive the cache key from the prompt content
        # (a sha256 hexdigest is already 64 chars, so the slice is a no-op)
        prompt_cache_key = hashlib.sha256(large_block.encode("utf-8")).hexdigest()[:64]
        input_list.append({"role": "user", "content": "Hello"})

        cached_tokens_1 = self.call_openai(client, input_list, prompt_cache_key)

        print(f"First call cached tokens: {cached_tokens_1}")

        time.sleep(2)
        input_list.append({"role": "user", "content": "Hello again"})

        cached_tokens_2 = self.call_openai(client, input_list, prompt_cache_key)
        print(f"Second call cached tokens: {cached_tokens_2}")

    def call_openai(
        self,
        client: OpenAI,
        input_list: ResponseInputParam,
        prompt_cache_key: str,
    ) -> int:
        accumulated_assistant = ""
        cached_tokens = 0

        stream = client.responses.create(
            model="gpt-4o-2024-11-20",
            # model="gpt-5-mini",
            input=input_list,
            stream=True,
            prompt_cache_key=prompt_cache_key,
        )

        for event in stream:
            if isinstance(event, ResponseTextDeltaEvent):
                accumulated_assistant += event.delta

            elif isinstance(event, ResponseCompletedEvent):
                response = event.response
                usage = response.usage.model_dump() if response.usage else {}
                cached_tokens = usage.get("input_tokens_details", {}).get(
                    "cached_tokens", 0
                )
                break

        if accumulated_assistant.strip():
            input_list.append({"role": "assistant", "content": accumulated_assistant})

        return cached_tokens

Example (GPT-4o):

First call cached tokens: 0
Second call cached tokens: 10112

If I run it again, most of the time I get full cached tokens on both calls.
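
For completeness, this is the kind of extra logging I'd bolt onto the ResponseCompletedEvent branch to confirm the prompt is actually eligible for caching (as far as I know, caching only applies once the prompt exceeds roughly 1,024 input tokens). It's just a sketch; log_usage is an illustrative helper, not part of the command above:

    def log_usage(self, event: ResponseCompletedEvent) -> None:
        # Print the full token accounting so input_tokens can be compared
        # against input_tokens_details.cached_tokens for each call.
        if event.response.usage is None:
            return
        usage = event.response.usage.model_dump()
        print(
            f"input_tokens={usage.get('input_tokens')}, "
            f"cached_tokens={usage.get('input_tokens_details', {}).get('cached_tokens')}"
        )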

However, when running this with a GPT-5 model, the result is almost always zero cached tokens on both calls (note that I changed large_block so that a new prompt_cache_key is generated when switching models; not sure if that matters, though):

Example (GPT-5-mini):

First call cached tokens: 0
Second call cached tokens: 0

Is this a known issue? Are GPT-5 models just more inconsistent, or do they have weaker support for prompt caching?

It's worth mentioning that GPT-4o isn't 100% consistent either; sometimes I also get 0 cached tokens there. But on average, GPT-4o gives much better results than the GPT-5 models.
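
In case it helps, here's a rough sketch of how I've been thinking about quantifying the difference: repeat the same two-call pattern a few times per model and count how often the second call reports any cached tokens. It reuses the imports from the command above, and cache_hit_rate is just a hypothetical helper for illustration, nothing scientific:

def cache_hit_rate(client: OpenAI, model: str, runs: int = 5) -> float:
    # Repeat the two-call pattern from the command above and report the
    # fraction of runs where the second call sees any cached tokens.
    large_block = "abcdefghijkl1 " * 2048
    # Include the model in the key so switching models yields a fresh cache key.
    cache_key = hashlib.sha256(f"{model}:{large_block}".encode("utf-8")).hexdigest()
    hits = 0
    for _ in range(runs):
        messages: ResponseInputParam = [
            {"role": "system", "content": large_block},
            {"role": "user", "content": "Hello"},
        ]
        client.responses.create(
            model=model,
            input=messages,
            prompt_cache_key=cache_key,
        )
        time.sleep(2)
        # The assistant turn is omitted here for brevity; caching is prefix-based,
        # so the large shared system block is what should get cached.
        messages.append({"role": "user", "content": "Hello again"})
        second = client.responses.create(
            model=model,
            input=messages,
            prompt_cache_key=cache_key,
        )
        usage = second.usage.model_dump() if second.usage else {}
        if usage.get("input_tokens_details", {}).get("cached_tokens", 0) > 0:
            hits += 1
    return hits / runs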

Thanks!