The documentation still leaves me unclear on how prompt caching decides what matches.
Scenario 1: I send three prompts: system_prompt + user_1_message, system_prompt + user_2_message, and system_prompt + user_3_message.
Assume my system prompt is 980 tokens and each user message is about 400 tokens.
In that case, caching should kick in because each prompt is over 1024 tokens, but will anything actually be cached, given that fewer than 1024 tokens are common to all of them?
The documentation also says that caching matches on a prefix. How long is the prefix that gets matched?
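To make Scenario 1 concrete, here is the toy model I currently have in my head, written in Python and reading the system prompt size as 980 tokens. It assumes, and this is the part I would like confirmed, that only an exact token prefix can be reused, that the reused prefix must be at least 1024 tokens, and that hits are counted in 128-token steps. MIN_PREFIX, STEP and the helper functions are my own names, not part of any API.

```python
# Toy model of prefix-based prompt caching. NOT the provider's implementation:
# MIN_PREFIX and STEP are assumed values used only for illustration.
MIN_PREFIX = 1024  # assumed minimum reusable prefix, in tokens
STEP = 128         # assumed granularity of cache hits, in tokens


def common_prefix_len(a: list[int], b: list[int]) -> int:
    """Number of leading tokens that a and b share exactly."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reusable_tokens(cached: list[int], new: list[int]) -> int:
    """Tokens of `new` that could be served from `cached` under this toy model."""
    shared = common_prefix_len(cached, new)
    if shared < MIN_PREFIX:
        return 0
    # Round down to the assumed hit granularity.
    return (shared // STEP) * STEP


# Scenario 1: a 980-token system prompt followed by ~400-token user messages.
# Token IDs are made up; only whether they are equal matters here.
system = list(range(980))
user_1 = [10_000 + i for i in range(400)]
user_2 = [20_000 + i for i in range(400)]

prompt_1 = system + user_1  # 1380 tokens, above MIN_PREFIX
prompt_2 = system + user_2  # shares only the 980 system tokens with prompt_1

print(common_prefix_len(prompt_1, prompt_2))  # 980
print(reusable_tokens(prompt_1, prompt_2))    # 0, because 980 < MIN_PREFIX
print(reusable_tokens(prompt_1, prompt_1))    # 1280, an exact repeat rounded down
```

If this model is right, the shared 980-token system prompt alone would never produce a hit, and only re-sending an identical (or near-identical) prompt would.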
Scenario 2: Suppose I have two system prompts, system_prompt_1 and system_prompt_2, each 1200 tokens long, whose first 600 tokens are identical.
- Will anything be cached in this case?
- If so, would there be two entries in the cache? And when a cached entry is looked up, what prefix length would be matched?
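For Scenario 2, this is the behaviour I would naively expect, again as a toy Python model built on my own assumptions: each distinct prompt becomes a separate entry, a lookup reuses the entry with the longest exact shared prefix, and the same assumed 1024-token minimum and 128-token step apply. ToyPrefixCache and its store/lookup methods are hypothetical; if the real cache behaves differently, that difference is what I am asking about.

```python
# Toy cache with multiple entries and longest-prefix lookup.
# Assumed behaviour for illustration only, not the provider's implementation.
MIN_PREFIX = 1024  # assumed minimum reusable prefix, in tokens
STEP = 128         # assumed granularity of cache hits, in tokens


def common_prefix_len(a: list[int], b: list[int]) -> int:
    """Number of leading tokens that a and b share exactly."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class ToyPrefixCache:
    def __init__(self) -> None:
        self.entries: list[list[int]] = []

    def store(self, prompt: list[int]) -> None:
        # Only prompts long enough to be cacheable are stored.
        if len(prompt) >= MIN_PREFIX:
            self.entries.append(list(prompt))

    def lookup(self, prompt: list[int]) -> int:
        """Tokens reusable from the best-matching entry (0 if none qualifies)."""
        best = 0
        for entry in self.entries:
            shared = common_prefix_len(entry, prompt)
            if shared >= MIN_PREFIX:
                best = max(best, (shared // STEP) * STEP)
        return best


# Scenario 2: two 1200-token system prompts whose first 600 tokens are identical.
shared_head = list(range(600))
system_prompt_1 = shared_head + [1_000 + i for i in range(600)]
system_prompt_2 = shared_head + [2_000 + i for i in range(600)]

cache = ToyPrefixCache()
cache.store(system_prompt_1)
print(cache.lookup(system_prompt_2))  # 0: only 600 tokens shared, below MIN_PREFIX

cache.store(system_prompt_2)
print(len(cache.entries))             # 2 separate entries
print(cache.lookup(system_prompt_2))  # 1152: full 1200-token match, rounded down
```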
I would like to understand these details so that we can design our prompts accordingly.