I have a fixed system prompt of 436 tokens and a dynamic user prompt of between 1,840 and 2,100 tokens. To test, I made three requests to the API, and all three returned zero cached_tokens.
What’s the reason?
The total input prompt is more than 1024 tokens, so it should have cached the tokens for at least two of the requests.
Why isn't it doing that? Is there a specific format the requests should follow?
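For anyone reproducing this, here is a minimal sketch of the test, assuming the official OpenAI Python SDK. The model name and the exact layout of `usage.prompt_tokens_details` are assumptions and may vary by SDK version:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = "..."  # fixed ~436-token system prompt, identical every call

def ask(user_prompt: str) -> int:
    """Send one chat request and report the cached_tokens count."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; caching only applies to supported models
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # static part first
            {"role": "user", "content": user_prompt},      # dynamic part last
        ],
    )
    details = response.usage.prompt_tokens_details
    cached = details.cached_tokens if details else 0
    print(f"prompt_tokens={response.usage.prompt_tokens}, cached_tokens={cached}")
    return cached

# Three requests sharing the same system prompt; the expectation was that
# requests 2 and 3 would report cached_tokens > 0.
for prompt in ["variant one ...", "variant two ...", "variant three ..."]:
    ask(prompt)
```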
Thanks
I didn't get it to work either, but some people in the community did.
The original announcement was here: Prompt caching (automatic!) (where some claimed to get it working).
Not sure if this helps, but if you do get it working, and if you can, please report back on what was going wrong; I haven't found the time to figure this out myself!
Prompt caching has worked for me, but only in cases where the fixed part of the prompt was at least 1024 tokens on its own.
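In other words, the 1024-token minimum seems to apply to the identical prefix, not the total prompt. A sketch of the workaround that follows from this observation; the padding text and model name are illustrative, not from any official guidance:

```python
from openai import OpenAI

client = OpenAI()

# The cacheable prefix must be byte-identical across requests AND reach
# 1024 tokens on its own, so move enough static instructions, schemas,
# or few-shot examples into it to cross that threshold.
STATIC_PREFIX = (
    "You are a helpful assistant.\n"
    "Detailed instructions, output schema, and few-shot examples go here, "
    "expanded until the static portion alone exceeds 1024 tokens.\n"
)

def ask(dynamic_part: str):
    return client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": STATIC_PREFIX},  # never changes
            {"role": "user", "content": dynamic_part},     # all variation lives here
        ],
    )
```

Note that any change inside the static prefix, even whitespace, breaks the shared prefix and forces a cold (uncached) prompt.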