Prompt_tokens_details always returns None

I wanted to confirm cached_tokens in prompt_tokens_details, but it always returns None. My fixed system prompt is more than 2,000 tokens, so I think it should be cached, but I could not confirm it.

I am using async_client.beta.chat.completions.parse to get the response, with structured output.
The model is gpt-4o-2024-08-06 on Azure OpenAI.
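For reference, here is roughly how I am reading the field (a minimal sketch; the endpoint, key, deployment name, and API version are placeholders, and the Answer model stands in for my real structured-output schema):

```python
import asyncio
from openai import AsyncAzureOpenAI
from pydantic import BaseModel

FIXED_SYSTEM_PROMPT = "..."  # my fixed system prompt, >2000 tokens

# Placeholders: substitute your own resource endpoint, key, and deployment name.
async_client = AsyncAzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-KEY",
    api_version="2024-10-01-preview",
)

class Answer(BaseModel):  # stand-in for the real schema
    text: str

async def main():
    completion = await async_client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",  # Azure deployment name
        messages=[
            {"role": "system", "content": FIXED_SYSTEM_PROMPT},
            {"role": "user", "content": "..."},
        ],
        response_format=Answer,
    )
    details = completion.usage.prompt_tokens_details
    print(details.cached_tokens if details else None)  # always prints None for me

asyncio.run(main())
```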

Does anybody have an idea?

Follow the Azure instructions, including the API version in the network request. You may need to redeploy to a supported datacenter.

The libraries you are using must also capture this usage field, which means being up to date with the current response methods.
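As a sanity check, you can bypass the SDK and hit the deployment directly to see whether Azure itself returns the field (a sketch with placeholder endpoint, key, and deployment; the api-version shown is an assumption, use whichever current version the Azure docs list):

```python
import json
import requests  # raw request bypasses the SDK's response models

# Hypothetical values; fill in your own resource, deployment, and key.
url = ("https://YOUR-RESOURCE.openai.azure.com/openai/deployments/"
       "YOUR-DEPLOYMENT/chat/completions?api-version=2024-10-01-preview")
resp = requests.post(
    url,
    headers={"api-key": "YOUR-KEY", "Content-Type": "application/json"},
    json={"messages": [{"role": "user", "content": "ping"}]},
)
# If "prompt_tokens_details" is absent here, the problem is the deployment
# or API version, not the client library.
print(json.dumps(resp.json().get("usage"), indent=2))
```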

Thank you for your reply.
Yes, I’ve checked the model version and the API version. Those requirements are all satisfied.

You may need to redeploy to a supported datacenter.

By this, do you mean that even when the model and API versions satisfy the requirements, some datacenters are still not supported? Is that right?

Rather, I suspect that with Azure you get a snapshot frozen in time at the moment of deployment, so arbitrary changes to AI models cannot damage your application.
Thus redeployment, after researching in Azure's region/feature grid where models can be deployed with prompt caching, should be the next avenue for you to pursue. Caching kicks in when calls are repeated in a short time window with nothing changing within the first 1024+ tokens of the prompt.
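A quick way to test that after redeploying (a sketch assuming the async_client and FIXED_SYSTEM_PROMPT from your snippet above; on a caching-enabled deployment the second call should report nonzero cached_tokens):

```python
import asyncio

async def probe_cache():
    # Send the identical request twice in a row; the first call warms the
    # cache, the second should hit it if the deployment supports caching.
    for attempt in (1, 2):
        completion = await async_client.chat.completions.create(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system", "content": FIXED_SYSTEM_PROMPT},
                {"role": "user", "content": "Say hi."},
            ],
        )
        details = completion.usage.prompt_tokens_details
        print(f"call {attempt}: cached_tokens =",
              details.cached_tokens if details else None)

asyncio.run(probe_cache())
```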