Caching rate drop after switching to Responses API

I recently switched my app backend to the Responses API and migrated from gpt-5 to gpt-5.1. Since the change, my cache rate (% of cached input tokens) has dropped from 40% to 0%.

What am I doing wrong? Does caching work differently for Responses API vs Chat Completions API?

Hi folks,

Just to update the thread in case it’s helpful to someone else: the caching issue turned out to be a bug in the prompt itself, not in the Responses API.

In the prompt, I used to have the date as follows at the end of the system prompt:

- Today's date is Thu, 8 Jan 2026

A recent release changed this to include the time of day (to help the AI personalise the greeting and time-of-day references better in the response):

- Today’s date is Thu, 8 Jan 2026, 00:51:42.

This, of course, invalidated the cache entirely: including the exact time made the prompt prefix unique on every single request, so no prefix could ever match a cached one.

I addressed this by injecting a coarser-grained reference to the time of day instead, like this:

- Today’s date is Thu, 8 Jan 2026, Evening

And now everything works fine.
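The fix above can be sketched in Python (a minimal illustration, assuming a Python backend; the bucket boundaries and helper names are my own, not the poster's actual code). The key point is that the prompt line now only changes a few times per day instead of every second, so the cached prefix survives across requests:

```python
from datetime import datetime

def time_of_day(dt: datetime) -> str:
    # Coarse bucket: changes at most 4 times per day, so the system
    # prompt prefix stays cache-friendly (illustrative boundaries).
    hour = dt.hour
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 22:
        return "Evening"
    return "Night"

def date_line(dt: datetime) -> str:
    # Renders e.g. "Today's date is Thu, 8 Jan 2026, Evening"
    # instead of a second-precision timestamp.
    return (
        f"Today's date is {dt.strftime('%a')}, "
        f"{dt.day} {dt.strftime('%b %Y')}, {time_of_day(dt)}"
    )
```

Any bucketing scheme works as long as the rendered string is stable for long stretches; even an hourly bucket would keep the cache warm far better than a per-second timestamp.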


OpenAI already injects today’s date into these models on the API, along with other unwanted, “we know best” system-message content placed before your own — often including false information about your user’s location and about your API application.

Be sure to frame your own date information in a non-conflicting, information-building manner, e.g. “True user locale date {time}{UTC offset} - always converse using this date”.

You can also place a “developer message” as an immediate pre-prompt to the user message; then you only lose one turn of cache (or get no cache hit at all if you keep repeating it without removing the earlier copy from history). The gpt-5.x models already misuse and repeat back dates excessively, so this should only be employed when correctness really matters — e.g. when scheduling “tomorrow” must be right.
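That placement can be sketched as follows (a hypothetical helper, assuming a Python backend; the function name and message shapes are illustrative). Keeping the volatile date line out of the long-lived system prompt and injecting it just before the newest user turn means only the tail of the request changes, so the cached prefix — system prompt plus earlier history — stays intact:

```python
def build_input(system_prompt, history, user_message, date_line):
    # history: prior turns, unchanged between requests, so they sit
    # inside the cacheable prefix. The volatile developer message is
    # appended last, just before the new user turn.
    return (
        [{"role": "system", "content": system_prompt}]
        + history
        + [
            {"role": "developer", "content": date_line},
            {"role": "user", "content": user_message},
        ]
    )
```

A list built this way can be passed as the `input` of a Responses API call; the tradeoff the reply describes applies — re-injecting a fresh date each turn without pruning the old copy from history gives up the cache hit again.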
