Hi,
I’m trying to get my fine-tuned model on the platform to use prompt caching, but even though I have a long, static system prompt (around 1,200 tokens), requests never hit the cache and I pay the full input price unnecessarily. I’m keeping my request rate under 15 RPM, since that’s what it recommends, but even with 10 seconds between each query, caching still doesn’t work.
And the documentation is really vague: it mentions a prompt_cache_key parameter to increase the chances of a cache hit, but I can’t find anywhere to pass that argument when I’m using structured outputs parsed with Pydantic via the Responses API (see the sketch below).
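To make this concrete, here is roughly what my calls look like. The schema, model ID, and prompt text are placeholders, but the structure (static instructions, Pydantic parsing through the Responses API) matches what I’m running:

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

# Placeholder schema, just to illustrate the structured output I'm parsing
class TicketSummary(BaseModel):
    title: str
    priority: str

# Static system prompt, ~1,200 tokens, identical on every request
SYSTEM_PROMPT = "You are a support assistant. ..."

response = client.responses.parse(
    model="ft:gpt-4o-2024-08-06:my-org::abc123",  # placeholder fine-tuned model ID
    instructions=SYSTEM_PROMPT,
    input="Summarise this ticket: ...",
    text_format=TicketSummary,
    # prompt_cache_key="static-prefix-v1",  # this is where I'd expect to pass it,
    #                                       # but parse() doesn't seem to accept it
)

print(response.output_parsed)
print(response.usage)  # cached input tokens stay at 0 with the fine-tuned model
```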
Is the documentation intentionally vague? I ran across multiple threads, but none of them had a conclusive answer, so I wanted to open a new topic.
Interestingly, if I use the regular gpt-4o instead of my fine-tuned model, caching works correctly. Is caching simply not available for fine-tuned models? If so, why doesn’t the documentation say so anywhere? (Usage snippet below shows how I’m checking.)
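For reference, this is how I’m checking whether a request hit the cache (field names are from the Responses API usage object, as far as I can tell):

```python
# Inspect the usage block returned by the Responses API
usage = response.usage
print(usage.input_tokens, usage.input_tokens_details.cached_tokens)
# with model="gpt-4o" cached_tokens climbs after the first request;
# with my fine-tuned model it stays at 0
```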
Thanks for your help!