How is prompt_cache_key actually used in API calls?

Written here two days ago:

Essentially: use it, or there is little hope of getting a discount. I have seen plenty of undiscounted gpt-5-mini calls, even when re-running the same input.

It is a top-level API parameter, alongside “model”.
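A minimal sketch of where the parameter goes, assuming a recent openai Python SDK that exposes `prompt_cache_key` (the padded system message is only there to cross the ~1024-token minimum for caching, and "demo-app-1" is a placeholder key):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[
        # pad the static prefix past the ~1024-token caching minimum
        {"role": "system", "content": "You are a helpful assistant." + " pad" * 400},
        {"role": "user", "content": "Say hello."},
    ],
    prompt_cache_key="demo-app-1",  # top-level: a sibling of model and messages
)
print(response.choices[0].message.content)
```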


TEST: gpt-4.1-nano, chat completions.

The nonce message was 434 characters long.

```
input tokens: 1440   output tokens:  9
uncached:     1440   non-reasoning:  9
cached:          0   reasoning:      0
```

============== RESTART:
The nonce message was 604 characters long.

```
input tokens: 1440   output tokens:  9
uncached:      160   non-reasoning:  9
cached:       1280   reasoning:      0
```
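Numbers like those come straight from the usage object returned with each response. A sketch of how they can be read, reusing `response` from the call above (field names per the current Chat Completions usage schema):

```python
u = response.usage
pt = u.prompt_tokens_details       # has cached_tokens
ct = u.completion_tokens_details   # has reasoning_tokens

print("input tokens:", u.prompt_tokens, "  output tokens:", u.completion_tokens)
print("uncached:", u.prompt_tokens - pt.cached_tokens,
      "  non-reasoning:", u.completion_tokens - ct.reasoning_tokens)
print("cached:", pt.cached_tokens, "  reasoning:", ct.reasoning_tokens)
```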

When I just tested, the cache persistence was extremely short: not enough time for someone to finish typing a serious prompt by hand. (The AI also can't correctly count the length of a message once 1400 extra characters are added.)
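To put a rough number on that persistence, one could resend the same cacheable prefix after increasing delays and watch where cached_tokens drops back to zero. A sketch, reusing the client, messages, and placeholder key from above; the delay values are arbitrary:

```python
import time

def cached_tokens_after(delay_s: float) -> int:
    """Wait, resend the identical prompt, and report how much of it was cached."""
    time.sleep(delay_s)
    r = client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[
            {"role": "system", "content": "You are a helpful assistant." + " pad" * 400},
            {"role": "user", "content": "Say hello."},
        ],
        prompt_cache_key="demo-app-1",
    )
    return r.usage.prompt_tokens_details.cached_tokens

# Cumulative delays; watch where the discount disappears.
print([cached_tokens_after(d) for d in (0, 30, 120, 600)])
```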