GPT-5 prompt caching is inconsistent across identical requests
Identical requests to the GPT-5 API show inconsistent prompt caching. On the first call `usage.input_tokens_details.cached_tokens` is 0, the second call shows roughly half of the total input tokens cached, and subsequent calls fluctuate. Sometimes no caching occurs; other times more tokens are cached than in the previous call. `usage.input_tokens` does not consistently reflect the expected reduction from caching.
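To make the pattern concrete, here is a small sketch of the fluctuation described above. The token counts are made up for illustration; only the field names (`input_tokens`, `input_tokens_details.cached_tokens`) match the Responses API usage object.

```python
# Hypothetical usage payloads from four identical chained calls.
# The numbers are illustrative, not real API output.
calls = [
    {"input_tokens": 1500, "input_tokens_details": {"cached_tokens": 0}},     # first call: no cache
    {"input_tokens": 1500, "input_tokens_details": {"cached_tokens": 768}},   # ~half cached
    {"input_tokens": 1500, "input_tokens_details": {"cached_tokens": 0}},     # cache missed again
    {"input_tokens": 1500, "input_tokens_details": {"cached_tokens": 1280}},  # more cached than before
]

def cache_hit_fraction(usage):
    """Fraction of input tokens served from the prompt cache."""
    return usage["input_tokens_details"]["cached_tokens"] / usage["input_tokens"]

for i, u in enumerate(calls, 1):
    print(f"call {i}: {cache_hit_fraction(u):.0%} of input tokens cached")
```

With a stable cache one would expect the fraction to climb toward a high value and stay there after the first call, not oscillate as sketched here.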
How to reproduce:
Call the API once:
```shell
curl --request POST \
  --url https://api.openai.com/v1/responses \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-5",
    "input": "(Text exceeding 1024 tokens to allow caching)",
    "reasoning": {
      "effort": "minimal"
    }
  }'
```
Then add `"previous_response_id": "<the previous response.id>"` to the request body and call the API several more times, updating `previous_response_id` to the latest `response.id` before each call.
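The reproduction steps above can be automated. This is a minimal sketch using the openai Python SDK (1.x) Responses interface, with the same model, input placeholder, and reasoning effort as the curl example; the exact SDK call shape and the `OPENAI_API_KEY` environment variable are assumptions, not part of the original report.

```python
def run(client, n_calls=5, model="gpt-5"):
    """Issue n_calls chained Responses API requests and collect the
    cached_tokens figure reported for each call."""
    cached = []
    previous_id = None
    for _ in range(n_calls):
        kwargs = {
            "model": model,
            "input": "(Text exceeding 1024 tokens to allow caching)",
            "reasoning": {"effort": "minimal"},
        }
        if previous_id is not None:
            # Chain each call to the prior response, as in the repro steps.
            kwargs["previous_response_id"] = previous_id
        resp = client.responses.create(**kwargs)
        cached.append(resp.usage.input_tokens_details.cached_tokens)
        previous_id = resp.id
    return cached

if __name__ == "__main__":
    from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment
    for i, n in enumerate(run(OpenAI()), 1):
        print(f"call {i}: cached_tokens={n}")
```

Printing `cached_tokens` per call makes the fluctuation directly visible across a run; with working caching, every call after the first should report a stable, nonzero value.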