I have a fixed system prompt of 436 tokens and a dynamic user prompt of between 1,840 and 2,100 tokens. To test, I made three requests to the API, and all three returned zero cached_tokens.
What’s the reason?
The total input prompt is more than 1024 tokens, so it should have cached the tokens for at least two of the requests.
Why isn't it doing that? Is there a specific format the requests should follow?
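For anyone reproducing this, here is a minimal sketch of the test, assuming the official OpenAI Python SDK. The model name and the exact layout of `usage.prompt_tokens_details` are assumptions and may vary by SDK version:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = "..."  # fixed ~436-token system prompt, identical every call

def ask(user_prompt: str) -> int:
    """Send one chat request and report the cached_tokens count."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; caching only applies to supported models
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # static part first
            {"role": "user", "content": user_prompt},      # dynamic part last
        ],
    )
    details = response.usage.prompt_tokens_details
    cached = details.cached_tokens if details else 0
    print(f"prompt_tokens={response.usage.prompt_tokens}, cached_tokens={cached}")
    return cached

# Three requests sharing the same system prompt; the expectation was that
# requests 2 and 3 would report cached_tokens > 0.
for prompt in ["variant one ...", "variant two ...", "variant three ..."]:
    ask(prompt)
```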
Thanks
I didn't get it to work either, but some people in the community did.
The original announcement was here: Prompt caching (automatic!) (where some claimed to get it working).
Not sure if this helps, but if you do get it working, and if you can, please report back on what was going wrong; I haven't found the time to figure this out myself!
Prompt caching has worked for me, but only in cases where the fixed part of the prompt was at least 1024 tokens on its own.
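In other words, the 1024-token minimum seems to apply to the identical prefix, not the total prompt. A sketch of the workaround that follows from this observation; the padding text and model name are illustrative, not from any official guidance:

```python
from openai import OpenAI

client = OpenAI()

# The cacheable prefix must be byte-identical across requests AND reach
# 1024 tokens on its own, so move enough static instructions, schemas,
# or few-shot examples into it to cross that threshold.
STATIC_PREFIX = (
    "You are a helpful assistant.\n"
    "Detailed instructions, output schema, and few-shot examples go here, "
    "expanded until the static portion alone exceeds 1024 tokens.\n"
)

def ask(dynamic_part: str):
    return client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": STATIC_PREFIX},  # never changes
            {"role": "user", "content": dynamic_part},     # all variation lives here
        ],
    )
```

Note that any change inside the static prefix, even whitespace, breaks the shared prefix and forces a cold (uncached) prompt.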