I am facing an issue with prompt caching on Azure OpenAI.
I am using the OpenAI model version gpt-4o-mini-2024-07-18 and the Azure API version 2024-10-21.
According to Azure’s documentation, both the model version and the API version are eligible for prompt caching.
My flow is a static system prompt followed by a dynamic user prompt. The system prompt is about 2,000 tokens, well above the 1,024-token minimum for caching, so the cache should kick in.
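Since caching matches on an exact prefix, the static system prompt has to come first and be byte-identical on every call. A minimal sketch of the request shape I am assuming (prompt contents are placeholders):

```python
def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    """Build a chat request where only the trailing user message varies.

    The static system prompt is the cacheable prefix: it must be first
    and identical across calls for the server-side cache to match it.
    """
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

# Same static prefix, different user questions across calls.
first = build_messages("You are a helpful assistant. <~2000 tokens of rules>", "Question A")
second = build_messages("You are a helpful assistant. <~2000 tokens of rules>", "Question B")
```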
Over 50+ calls (not all concurrent), the OpenAI API reported about 70% cached tokens, whereas Azure OpenAI reported only about 0.1%. Is this a known issue, and is anyone else encountering it?
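For reference, this is roughly how I compute the cached-token percentage from each response's `usage` object (`prompt_tokens` and `prompt_tokens_details.cached_tokens`); the numbers below are hypothetical, just to show the calculation:

```python
def cache_hit_rate(usages: list[dict]) -> float:
    """Fraction of prompt tokens served from cache across a batch of calls.

    Each item mirrors the `usage` field of a chat completions response:
    total `prompt_tokens` plus `prompt_tokens_details.cached_tokens`
    (cached_tokens is 0 when nothing was reused).
    """
    prompt_total = sum(u["prompt_tokens"] for u in usages)
    cached_total = sum(u["prompt_tokens_details"]["cached_tokens"] for u in usages)
    return cached_total / prompt_total if prompt_total else 0.0

# Hypothetical numbers: a ~2,000-token static prefix, cold on the
# first call, then reused on subsequent calls.
usages = [
    {"prompt_tokens": 2100, "prompt_tokens_details": {"cached_tokens": 0}},
    {"prompt_tokens": 2100, "prompt_tokens_details": {"cached_tokens": 1920}},
    {"prompt_tokens": 2100, "prompt_tokens_details": {"cached_tokens": 1920}},
]
print(f"cache hit rate: {cache_hit_rate(usages):.0%}")
```

With OpenAI this metric lands around 70% for my workload; with Azure it stays near zero.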