We could make great use of this feature, but our prompts are only 700 tokens. Is there any way to enable the feature manually?
Hi @leeflix and welcome to the community!
There is no way to enable caching manually for prompts shorter than 1024 tokens. For reference, Gemini (Google) requires a minimum of 32k tokens before caching is activated, and Anthropic requires a minimum of 1k tokens for Opus and 2k tokens for Haiku.
So I would say the threshold is a tuned optimization: for prompts with fewer tokens, the latency payoff is negligible.
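To make the comparison concrete, here is a minimal sketch that checks a prompt length against the minimums quoted above. The table `MIN_CACHE_TOKENS` and the helper `is_cache_eligible` are hypothetical names for illustration, not part of any provider's SDK, and the exact value used for "32k" is an assumption. The numbers are the ones mentioned in this thread, so check each provider's current documentation before relying on them.

```python
# Minimum prompt length (in tokens) before caching applies, as quoted in this thread.
# Hypothetical lookup table for illustration only; not read from any API.
MIN_CACHE_TOKENS = {
    "openai": 1024,
    "gemini": 32_768,          # "32k" in the post; the exact figure is an assumption
    "anthropic-opus": 1024,    # "1k" in the post
    "anthropic-haiku": 2048,   # "2k" in the post
}


def is_cache_eligible(prompt_tokens: int, provider: str) -> bool:
    """Return True if the prompt is long enough to meet the provider's minimum."""
    return prompt_tokens >= MIN_CACHE_TOKENS[provider]


if __name__ == "__main__":
    # A 700-token prompt falls below every minimum listed above.
    for provider in MIN_CACHE_TOKENS:
        print(provider, is_cache_eligible(700, provider))
```

With a 700-token prompt, every provider in the table reports not eligible, which matches the answer above: the prompt would need to grow past the minimum before caching can apply at all.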