Hey, I've seen many discussions about prompt caching, but when I try to use a common prefix to get cache hits, it seems to fail every time.
I'm not quite sure what I'm doing wrong; below are the cost insights from two gpt-5 API responses.
I prepended the same ~7000-token context at the start of the system prompt.
I also ran the requests through a diff checker, which confirms the first system message shares a very long common prefix across requests.
I have also tried different models (gpt-4.1, gpt-4.1-mini, gpt-5); none of them hit the cache.
I think I must be doing something wrong. Does anyone have the same issue, or a suggestion?
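For reference, this is how I'm checking for hits: I read `usage.prompt_tokens_details.cached_tokens` out of the chat completions response. The payloads below are fabricated just to show the shape I expect (in my real runs `cached_tokens` is always 0):

```python
def cached_tokens(usage: dict) -> int:
    """Return the number of prompt tokens served from cache (0 = cache miss)."""
    return usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)

# Fabricated usage blocks from two back-to-back requests sharing a long prefix.
first = {
    "prompt_tokens": 7200,
    "completion_tokens": 150,
    "prompt_tokens_details": {"cached_tokens": 0},      # cold cache
}
second = {
    "prompt_tokens": 7200,
    "completion_tokens": 150,
    "prompt_tokens_details": {"cached_tokens": 7168},   # what a prefix hit should look like
}

print(cached_tokens(first), cached_tokens(second))  # → 0 7168
```

My understanding is that the shared prefix has to be over 1024 tokens (mine is ~7000) and byte-identical across requests, so I'd expect the second request to show a nonzero `cached_tokens` like above.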
