I’m a bit confused. On the prompt caching page, there is no mention of o3-mini, yet I have seen a forum post which confirms that prompt caching is enabled for o3-mini.
Furthermore, I’m using the Responses API via the JavaScript SDK (v4.88.0), and I’m consistently seeing cache hits of 0 tokens.
I confirmed from the OpenAI API logs and a diff checker that the first ~3000 tokens are exactly the same between two runs.
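For reference, this is roughly how I’m reproducing it. The `sharedPrefix` string here is a placeholder for my actual ~3000-token instruction block, and I’m reading the cache counter from `usage.input_tokens_details.cached_tokens`, which is where the Responses API usage object appears to report it (if I’m reading the docs correctly):

```js
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Placeholder for my real ~3000-token instructions; byte-identical on every run.
const sharedPrefix = "...long static instructions...";

async function run(label) {
  const response = await client.responses.create({
    model: "o3-mini",
    input: sharedPrefix + "\n\nShort question here.",
  });
  // The Responses API reports cache usage under input_tokens_details.
  console.log(label, response.usage.input_tokens_details);
}

await run("first run: "); // cached_tokens: 0, as expected for a cold cache
await run("second run:"); // expected cached_tokens > 0; I still get 0
```

Both runs are sent back-to-back, well within the documented cache retention window, and the prompt is well over the 1024-token minimum, so I would expect the second run to show a non-zero cached token count.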
I think prompt caching is currently having issues even with models that are explicitly mentioned on that page (e.g. gpt-4o-mini): 4o input not being cached - #40 by bento
If prompt caching is advertised as supported for these models, then a broad failure to apply the discount to qualifying repeated inputs amounts to overbilling for as long as that pricing is displayed.