Hi all, I recently moved my script from Chat Completions (with GPT-4o) to the Responses API (with GPT-4.1 and now GPT-5).
With Chat Completions, prompt caching worked perfectly, and my code also logged the cached input tokens on every call.
I used both system and user messages, the system message being a fixed ~14k-token block that was cached consistently.
When I moved to Responses, I put all the system instructions (the fixed string of about 14k tokens) into the instructions parameter, leaving the user/assistant conversation in input (I manage the conversation state manually).
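For context, the input I pass is just a list of role/content messages that I extend turn by turn, roughly like this (simplified sketch, contents are placeholders):

# Simplified sketch of the manually managed conversation state passed as "input".
# The actual message contents are placeholders here.
messages = [
    {"role": "user", "content": "first user message..."},
    {"role": "assistant", "content": "previous assistant reply..."},
    {"role": "user", "content": "latest user message..."},
]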
I’m using a fixed prompt_cache_key parameter for this script.
I always pass the same json_schema for the output.
The session_id (used for metadata logging) changes from chat to chat, but is fixed within a single chat.
Example payload for my Responses call:
payload = {
    "model": GPT_MODEL,
    "reasoning": {"effort": GPT_REASONING_EFFORT},
    "instructions": instructions,
    "input": messages,
    "metadata": {
        "session_id": session_id
    },
    "store": True,
    "prompt_cache_key": prompt_cache_key,
    "text": {
        "format": {
            "type": "json_schema",
            "name": schema_name,
            "strict": True,
            "schema": schema
        }
    }
}
With this setup I cannot get any input tokens into the prompt cache, even though I know the instructions parameter is always identical and, at about 14k tokens, long enough to trigger the cache.
I verified this both from cached_tokens = 0 in the response usage and from the Dashboard usage statistics, which show only input and output tokens, nothing cached.
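For completeness, this is roughly how I send the payload and read the counters (simplified sketch; client is the standard OpenAI Python client):

from openai import OpenAI

client = OpenAI()

# Send the payload shown above through the Responses API.
response = client.responses.create(**payload)

# With Chat Completions I read usage.prompt_tokens_details.cached_tokens;
# here I read the Responses equivalent, and it is always 0 for me.
usage = response.usage
print("input tokens: ", usage.input_tokens)
print("cached tokens:", usage.input_tokens_details.cached_tokens)
print("output tokens:", usage.output_tokens)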
Can someone give me some advice to correct this behavior?
Thank you!
C