Issues with GPT-5 caching

GPT-5 prompt caching is inconsistent across identical requests

Identical requests to the GPT-5 API show inconsistent prompt caching. On the first call `usage.input_tokens_details.cached_tokens` is 0, the second call reports roughly half of the total input tokens as cached, and subsequent calls fluctuate: sometimes no tokens are cached, other times more tokens are cached than on the previous call. `usage.input_tokens` also does not consistently reflect the expected reduction from caching.
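The fluctuation is easiest to see by pulling the relevant fields out of each response's `usage` object. A minimal sketch (field names follow the Responses API `usage` object; the sample payload below is made up for illustration):

```python
import json

def cache_stats(usage: dict) -> tuple[int, int, float]:
    """Return (input_tokens, cached_tokens, cached_fraction) from a usage dict."""
    input_tokens = usage["input_tokens"]
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    fraction = cached / input_tokens if input_tokens else 0.0
    return input_tokens, cached, fraction

# Sample usage payload shaped like the API's (values are illustrative only)
sample = json.loads(
    '{"input_tokens": 2048,'
    ' "input_tokens_details": {"cached_tokens": 1024},'
    ' "output_tokens": 50}'
)
print(cache_stats(sample))  # (2048, 1024, 0.5)
```

Printing this triple for every call makes the run-to-run inconsistency obvious.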

How to reproduce:

Call the API once:

```bash
curl --request POST \
  --url https://api.openai.com/v1/responses \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-5",
    "input": "(Text exceeding 1024 tokens to allow caching)",
    "reasoning": {
      "effort": "minimal"
    }
  }'
```

then add

"previous_response_id": "previousresponse.id"

to the request body, and call the API several more times, updating `previous_response_id` each time to the latest `response.id`.
