Written here two days ago:
Essentially: use it, or there is little hope of getting a discount. I've seen plenty of undiscounted gpt-5-mini calls, even when running the same input.
It is a top-level API parameter, alongside “model”.
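For reference, a minimal sketch of sending it on a Chat Completions call, assuming the parameter being discussed is `prompt_cache_key`; the key value, padding, and client setup here are just for illustration, not the script I actually ran:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The cache key is a top-level request parameter, next to "model", not part of
# the messages. Recent openai-python versions accept it directly; on older ones
# it can be sent via extra_body={"prompt_cache_key": ...}.
response = client.chat.completions.create(
    model="gpt-4.1-nano",
    prompt_cache_key="my-stable-prefix-v1",  # hypothetical key; reuse it across calls
    messages=[
        # padding so the prompt clears the ~1024-token minimum for caching
        {"role": "system", "content": "You are a terse assistant. " + "pad " * 1500},
        {"role": "user", "content": "Say OK."},
    ],
)
print(response.usage)
```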
TEST: gpt-4.1-nano, chat completions.
The nonce message was 434 characters long.
| input tokens: 1440 | output tokens: 9 |
|---|---|
| uncached: 1440 | non-reasoning: 9 |
| cached: 0 | reasoning: 0 |
============== RESTART (script run again): ==============
The nonce message was 604 characters long.
| input tokens: 1440 | output tokens: 9 |
|---|---|
| uncached: 160 | non-reasoning: 9 |
| cached: 1280 | reasoning: 0 |
The cache persistence was extremely short when I just tested it, not enough time to be typing a serious prompt. (The AI also can't correctly count the message with its 1,400 extra characters.)
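For context, the usage tables above can be produced from the Chat Completions usage object along these lines; the helper and its formatting are a sketch, not the exact script used for the test:

```python
def print_usage_table(usage) -> None:
    """Print an input/output token breakdown in the format of the tables above."""
    in_details = usage.prompt_tokens_details
    out_details = usage.completion_tokens_details
    cached = in_details.cached_tokens if in_details else 0
    reasoning = out_details.reasoning_tokens if out_details else 0
    print(f"| input tokens: {usage.prompt_tokens} | output tokens: {usage.completion_tokens} |")
    print("|---|---|")
    print(f"| uncached: {usage.prompt_tokens - cached} | non-reasoning: {usage.completion_tokens - reasoning} |")
    print(f"| cached: {cached} | reasoning: {reasoning} |")

print_usage_table(response.usage)  # `response` from a chat.completions.create call
```

The "uncached" row is just total input tokens minus `cached_tokens`, which matches the numbers shown: 1440 − 0 on the first run, 1440 − 1280 = 160 on the second.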