Written here two days ago:
Essentially: use it, or there is little hope of getting a discount. I've seen plenty of undiscounted gpt-5-mini calls, even when running the same input.
It is a top-level API parameter, alongside “model”.
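For reference, a minimal sketch of sending it on a Chat Completions call, assuming the parameter being discussed is `prompt_cache_key`; the key value, padding, and client setup here are just for illustration, not the script I actually ran:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The cache key is a top-level request parameter, next to "model", not part of
# the messages. Recent openai-python versions accept it directly; on older ones
# it can be sent via extra_body={"prompt_cache_key": ...}.
response = client.chat.completions.create(
    model="gpt-4.1-nano",
    prompt_cache_key="my-stable-prefix-v1",  # hypothetical key; reuse it across calls
    messages=[
        # padding so the prompt clears the ~1024-token minimum for caching
        {"role": "system", "content": "You are a terse assistant. " + "pad " * 1500},
        {"role": "user", "content": "Say OK."},
    ],
)
print(response.usage)
```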
TEST: gpt-4.1-nano, chat completions.
The nonce message was 434 characters long.
| input tokens: 1440 | output tokens: 9 |
|---|---|
| uncached: 1440 | non-reasoning: 9 |
| cached: 0 | reasoning: 0 |
============== RESTART (script run again): ==============
The nonce message was 604 characters long.
| input tokens: 1440 | output tokens: 9 |
|---|---|
| uncached: 160 | non-reasoning: 9 |
| cached: 1280 | reasoning: 0 |
The cache persistence was extremely short when I just tested it, not enough time to be typing a serious prompt. (The AI also can't correctly count the message with its 1,400 extra characters.)
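For context, the usage tables above can be produced from the Chat Completions usage object along these lines; the helper and its formatting are a sketch, not the exact script used for the test:

```python
def print_usage_table(usage) -> None:
    """Print an input/output token breakdown in the format of the tables above."""
    in_details = usage.prompt_tokens_details
    out_details = usage.completion_tokens_details
    cached = in_details.cached_tokens if in_details else 0
    reasoning = out_details.reasoning_tokens if out_details else 0
    print(f"| input tokens: {usage.prompt_tokens} | output tokens: {usage.completion_tokens} |")
    print("|---|---|")
    print(f"| uncached: {usage.prompt_tokens - cached} | non-reasoning: {usage.completion_tokens - reasoning} |")
    print(f"| cached: {cached} | reasoning: {reasoning} |")

print_usage_table(response.usage)  # `response` from a chat.completions.create call
```

The "uncached" row is just total input tokens minus `cached_tokens`, which matches the numbers shown: 1440 − 0 on the first run, 1440 − 1280 = 160 on the second.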