GPT-5 prompt caching is inconsistent across identical requests
Identical requests to the GPT-5 API show inconsistent prompt caching. On the first call `usage.input_tokens_details.cached_tokens` is 0, the second call shows roughly half of the total input tokens cached, and subsequent calls fluctuate. Sometimes no caching occurs; other times more tokens are cached than in the previous call. `usage.input_tokens` does not consistently reflect the expected reduction from caching.
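To make the pattern concrete, here is a small sketch of the fluctuation described above. The token counts are made up for illustration; only the field names (`input_tokens`, `input_tokens_details.cached_tokens`) match the Responses API usage object.

```python
# Hypothetical usage payloads from four identical chained calls.
# The numbers are illustrative, not real API output.
calls = [
    {"input_tokens": 1500, "input_tokens_details": {"cached_tokens": 0}},     # first call: no cache
    {"input_tokens": 1500, "input_tokens_details": {"cached_tokens": 768}},   # ~half cached
    {"input_tokens": 1500, "input_tokens_details": {"cached_tokens": 0}},     # cache missed again
    {"input_tokens": 1500, "input_tokens_details": {"cached_tokens": 1280}},  # more cached than before
]

def cache_hit_fraction(usage):
    """Fraction of input tokens served from the prompt cache."""
    return usage["input_tokens_details"]["cached_tokens"] / usage["input_tokens"]

for i, u in enumerate(calls, 1):
    print(f"call {i}: {cache_hit_fraction(u):.0%} of input tokens cached")
```

With a stable cache one would expect the fraction to climb toward a high value and stay there after the first call, not oscillate as sketched here.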
How to reproduce:
Call the API once:
```shell
curl --request POST \
  --url https://api.openai.com/v1/responses \
  --header "Authorization: Bearer $OPENAI_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-5",
    "input": "(Text exceeding 1024 tokens to allow caching)",
    "reasoning": {
      "effort": "minimal"
    }
  }'
```
Then add `"previous_response_id": "<the previous response.id>"` to the request body and call the API several more times, updating `previous_response_id` to the latest `response.id` before each call.
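The reproduction steps above can be automated. This is a minimal sketch using the openai Python SDK (1.x) Responses interface, with the same model, input placeholder, and reasoning effort as the curl example; the exact SDK call shape and the `OPENAI_API_KEY` environment variable are assumptions, not part of the original report.

```python
def run(client, n_calls=5, model="gpt-5"):
    """Issue n_calls chained Responses API requests and collect the
    cached_tokens figure reported for each call."""
    cached = []
    previous_id = None
    for _ in range(n_calls):
        kwargs = {
            "model": model,
            "input": "(Text exceeding 1024 tokens to allow caching)",
            "reasoning": {"effort": "minimal"},
        }
        if previous_id is not None:
            # Chain each call to the prior response, as in the repro steps.
            kwargs["previous_response_id"] = previous_id
        resp = client.responses.create(**kwargs)
        cached.append(resp.usage.input_tokens_details.cached_tokens)
        previous_id = resp.id
    return cached

if __name__ == "__main__":
    from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment
    for i, n in enumerate(run(OpenAI()), 1):
        print(f"call {i}: cached_tokens={n}")
```

Printing `cached_tokens` per call makes the fluctuation directly visible across a run; with working caching, every call after the first should report a stable, nonzero value.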