Non-deterministic caching of vision inputs

I am using the `chat.completions` API with a sequence of images in the context. Looking at `result.usage.prompt_tokens_details`, the cache hit ratio is much lower than I'd expect. In the simplest repro, running inference on the exact same input images multiple times sometimes gets a cache hit and sometimes does not. I believe the issue is some sort of non-determinism in the image preprocessing (perhaps the resizing or normalization). Have others seen this issue?
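For reference, here is a minimal sketch of the repro I am describing, using the Python SDK. The model name and image URL are placeholders; the `summarize_cache_hits` helper just tallies the `cached_tokens` values reported across identical runs:

```python
import os


def summarize_cache_hits(cached_token_counts):
    """Summarize cached_tokens values observed across identical runs."""
    hits = [c for c in cached_token_counts if c > 0]
    return {
        "runs": len(cached_token_counts),
        "runs_with_cache_hit": len(hits),
        "distinct_cached_counts": sorted(set(cached_token_counts)),
    }


def run_repro(n_runs=5, image_url="https://example.com/test.png"):
    # Lazy import so the summary helper above works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()
    counts = []
    for _ in range(n_runs):
        # Identical request every iteration: same text, same image URL.
        result = client.chat.completions.create(
            model="gpt-4o",  # placeholder model
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }],
        )
        counts.append(result.usage.prompt_tokens_details.cached_tokens)
    return summarize_cache_hits(counts)


if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    print(run_repro())
```

With a deterministic pipeline I would expect every run after the first to report the same non-zero `cached_tokens`; instead, `distinct_cached_counts` comes back with a mix of zero and non-zero values.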