I am facing an issue with prompt caching on Azure OpenAI.
I am using the OpenAI model version gpt-4o-mini-2024-07-18 and the Azure API version 2024-10-21.
According to Azure’s documentation, both the model version and the API version are eligible for prompt caching.
My flow is a static system prompt followed by a dynamic user prompt. The system prompt is about 2,000 tokens, well above the 1,024-token minimum for caching, so the cache should kick in.
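Since caching matches on an exact prefix, the static system prompt has to come first and be byte-identical on every call. A minimal sketch of the request shape I am assuming (prompt contents are placeholders):

```python
def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    """Build a chat request where only the trailing user message varies.

    The static system prompt is the cacheable prefix: it must be first
    and identical across calls for the server-side cache to match it.
    """
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

# Same static prefix, different user questions across calls.
first = build_messages("You are a helpful assistant. <~2000 tokens of rules>", "Question A")
second = build_messages("You are a helpful assistant. <~2000 tokens of rules>", "Question B")
```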
Over 50+ calls (not all concurrent), the OpenAI API reported about 70% cached tokens, whereas Azure OpenAI reported only about 0.1%. Is this a known issue, and is anyone else encountering it?
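For reference, this is roughly how I compute the cached-token percentage from each response's `usage` object (`prompt_tokens` and `prompt_tokens_details.cached_tokens`); the numbers below are hypothetical, just to show the calculation:

```python
def cache_hit_rate(usages: list[dict]) -> float:
    """Fraction of prompt tokens served from cache across a batch of calls.

    Each item mirrors the `usage` field of a chat completions response:
    total `prompt_tokens` plus `prompt_tokens_details.cached_tokens`
    (cached_tokens is 0 when nothing was reused).
    """
    prompt_total = sum(u["prompt_tokens"] for u in usages)
    cached_total = sum(u["prompt_tokens_details"]["cached_tokens"] for u in usages)
    return cached_total / prompt_total if prompt_total else 0.0

# Hypothetical numbers: a ~2,000-token static prefix, cold on the
# first call, then reused on subsequent calls.
usages = [
    {"prompt_tokens": 2100, "prompt_tokens_details": {"cached_tokens": 0}},
    {"prompt_tokens": 2100, "prompt_tokens_details": {"cached_tokens": 1920}},
    {"prompt_tokens": 2100, "prompt_tokens_details": {"cached_tokens": 1920}},
]
print(f"cache hit rate: {cache_hit_rate(usages):.0%}")
```

With OpenAI this metric lands around 70% for my workload; with Azure it stays near zero.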