How does the Prompt Caching prefix match work?

It is still not clear to me from the documentation how prompt caching matching works.

Scenario 1: I have prompts of the form system_prompt + user_1_message, system_prompt + user_2_message, system_prompt + user_3_message.
Assume my system prompt is 980 tokens and my user messages are about 400 tokens each.
In that case, caching will kick in because each prompt has more than 1024 tokens, but will anything actually be cached, given that fewer than 1024 tokens are common between the prompts?

The documentation also mentions that it matches a prefix. Up to what length is the prefix matched?

Scenario 2: Suppose I have two system prompts, system_prompt_1 and system_prompt_2, both of which are 1200 tokens and have the first 600 tokens exactly the same.

  1. Will anything be cached in this case?
  2. If yes, would there be two entries in the cache? And when the cache is looked up, what prefix length would be matched?

I would like to know these details so that we can build accordingly.

Hi yashwantk,

Welcome to the forum :slight_smile:

Scenario 1: You must have at least 1024 consecutive identical tokens, so no

UNLESS, with 980 + 400, the first 44 tokens of each user_message are the same (980 + 44 = 1024)

Scenario 2: You must have at least 1024 consecutive identical tokens, so no
(cache miss at token 601 of the system prompt)

Also, it is an exact matching prefix, so if the first character is different and all the rest are the same, you still have a miss.

It is system + user combined, i.e. the first 1024 tokens of the full prompt, not of each part separately.
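To make the rule concrete, here is a minimal sketch of the matching logic as described above (this is a conceptual model, not OpenAI's actual implementation; the 1024-token minimum is from the docs, and the helper names are my own):

```python
# Conceptual model of the prompt-caching prefix rule:
# a cache hit requires an EXACT token-level prefix match
# of at least MIN_CACHE_TOKENS tokens across the full prompt
# (system + user combined).

MIN_CACHE_TOKENS = 1024

def common_prefix_len(a, b):
    """Length of the exact common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def cached_prefix_tokens(prev_prompt, new_prompt):
    """Tokens reusable from cache: the exact common prefix if it
    reaches the minimum, otherwise 0 (a cache miss)."""
    n = common_prefix_len(prev_prompt, new_prompt)
    return n if n >= MIN_CACHE_TOKENS else 0

# Scenario 1: 980 shared system tokens + 400 differing user tokens.
system = ["s"] * 980
p1 = system + ["u1"] * 400
p2 = system + ["u2"] * 400
print(cached_prefix_tokens(p1, p2))  # 0 -- only 980 tokens match

# Scenario 2: two 1200-token system prompts sharing the first 600.
sp1 = ["a"] * 600 + ["b"] * 600
sp2 = ["a"] * 600 + ["c"] * 600
print(cached_prefix_tokens(sp1, sp2))  # 0 -- mismatch at token 601

# Hit case: an identical prompt matches its full length (>= 1024).
print(cached_prefix_tokens(p1, p1))  # 1380
```

In both of your scenarios the common prefix falls short of 1024 tokens, so nothing is served from cache, which is exactly the "so no" in the answers above.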

“Cache hits are only possible for exact prefix matches within a prompt.”
https://platform.openai.com/docs/guides/prompt-caching