Hello everyone,
I want to report something I've been testing over the last few days, because it may be helpful for other devs too.
I ran a big batch of tests (around 30 runs) using different models: gpt-5.1, gpt-5-nano, and gpt-5-mini.
Right now, only GPT-5.1 is showing consistent caching behavior. With GPT-5-mini and GPT-5-nano, the cache almost never hits.
For example:
I sent 20 requests with the exact same system input, and only 2 of them got cached successfully.
The rest came back with cached_tokens = 0, as if the model didn't even try to use the cache at all.
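For anyone who wants to reproduce the measurement, here is roughly how I'm counting hits. This is a minimal sketch: the helper name and the placeholder token counts are mine, but `usage.prompt_tokens_details.cached_tokens` is the field the Chat Completions response exposes in my runs.

```python
# Summarize the cached_tokens values collected from a batch of responses.
# Each entry is response.usage.prompt_tokens_details.cached_tokens.

def cache_hit_rate(cached_token_counts):
    """Fraction of requests that reported any cached tokens at all."""
    hits = sum(1 for c in cached_token_counts if c > 0)
    return hits / len(cached_token_counts)

# The run described above: 20 identical requests, only 2 with a nonzero
# cached_tokens value (the 1024s are illustrative placeholders).
counts = [0] * 18 + [1024, 1024]
print(f"hit rate: {cache_hit_rate(counts):.0%}")  # hit rate: 10%
```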
I was expecting these smaller models to use caching a lot more, since the announcement said they should be more optimized. But in practice, something doesn't seem to be working correctly.
Not sure if this is a general bug or if it's only happening for some users. If anyone else is facing the same issue, it would be nice to know.
Also, for people who are NOT having this issue:
did you change something in your API config, or add some parameter that makes the cache hit more often?
Just trying to understand if I'm missing something in my setup.
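One thing I'm double-checking on my side, in case it helps others: from what I understand of the docs, prompt caching matches exact prompt prefixes and only activates once the prompt passes a minimum length (around 1024 tokens), so any variable content should come after the static part. A small sketch of what I mean (the names and prompt text here are just placeholders):

```python
# Keep the cacheable prefix byte-identical across requests: static system
# content first, variable content (the user question) last.
import json

STATIC_SYSTEM = "You are a support assistant. " + "Policy text... " * 100

def build_messages(user_question):
    return [
        {"role": "system", "content": STATIC_SYSTEM},  # identical every call
        {"role": "user", "content": user_question},    # variable part last
    ]

a = json.dumps(build_messages("How do I reset my password?"))
b = json.dumps(build_messages("What is your refund policy?"))

# The serialized requests share the long static prefix exactly:
print(a[:500] == b[:500])  # True
```

If the static part drifts even slightly between requests (whitespace, a timestamp in the system prompt), the prefix match fails and cached_tokens stays at 0.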
And if the @OpenAI_Support team can check this behavior, it would help a lot, because for production apps the cache is super important (especially for nano/mini, where cost and speed matter).